Conference Paper

Towards Open Natural Language Feedback Generation for Novice Programmers using Large Language Models

... Much research has been done on the automatic generation of formative feedback and reference solutions for code produced by students (Keuning et al. 2018, Koutcheme 2022, Ta et al. 2022). Such feedback and reference solutions could be generated by current pre-trained LLMs to help novice programmers know how to proceed when they face coding issues. ...
Article
In introductory programming courses, students, as novice programmers, would benefit from frequent practice on exercises whose difficulty level and concepts suit their skills and knowledge. However, creating many good programming exercises for individual learners is very time-consuming for instructors. In this work, we propose an automated exercise generation system, named ExGen, which leverages recent advances in pre-trained large language models (LLMs) to automatically create customized, ready-to-use programming exercises for individual students on demand. The system integrates seamlessly with Visual Studio Code, a popular development environment for computing students and software engineers. ExGen does the following: 1) maintains a set of seed exercises in a personalized database stored locally for each student; 2) constructs appropriate prompts from the seed exercises to send to a cloud-based LLM deployment that generates candidate exercises; and 3) applies a novel combination of filtering checks to automatically select only ready-to-use exercises for a student to work on. Extensive evaluation using more than 600 Python exercises demonstrates the effectiveness of ExGen in generating customized, ready-to-use programming exercises for new computing students.
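The following is a minimal sketch of an ExGen-style pipeline, written under assumptions of my own: a local SQLite table named exercises, a generic LLM completion callable supplied by the caller, and a single "does the reference solution run" check standing in for the paper's combination of filtering checks. None of the identifiers below come from the actual ExGen implementation.

# Sketch: seed exercises -> prompt -> LLM candidate -> readiness filter.
import sqlite3
import subprocess
import sys
import tempfile

def load_seed_exercises(db_path="seeds.db"):
    # Hypothetical per-student seed store; the schema is assumed, not taken from the paper.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT prompt, solution FROM exercises").fetchall()
    return [{"prompt": p, "solution": s} for p, s in rows]

def build_prompt(seed):
    # Few-shot style prompt asking for a similar exercise plus a reference solution.
    return (
        "Example Python exercise:\n"
        f"{seed['prompt']}\n\n"
        "Write a new exercise on the same concept at a similar difficulty level, "
        "then a reference solution after a line containing exactly '# Reference solution'."
    )

def solution_runs(solution_code, timeout_s=5):
    # One possible readiness check: keep only candidates whose solution executes cleanly.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def generate_exercise(llm_complete, db_path="seeds.db"):
    # llm_complete is any callable mapping a prompt string to generated text.
    for seed in load_seed_exercises(db_path):
        candidate = llm_complete(build_prompt(seed))
        exercise, _, solution = candidate.partition("# Reference solution")
        if solution and solution_runs(solution):
            return exercise.strip(), solution.strip()
    return None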
... What makes our result particularly interesting is that this language model is relatively small, which makes running it on custom resources cheaper. In other domains, even smaller language models are reaching impressive performance, and research into making open language models perform better for educational purposes is on the rise [16][17][18]. New methods such as quantization and CPU acceleration allow such models to run on modest consumer laptops such as those of educators. On top of this, LLM deployment is becoming less of a barrier thanks to open-source hosting services such as HuggingFace [41]. ...
... Research has been performed on how large language models might be used to generate more useful error messages that are more easily understood by novice programmers, so they can fix their code as they are learning [16]. Others have looked at ways to use large language models to automatically provide formative feedback to students when they submit code, rather than relying solely on correctness checking via unit tests [14]. Chen et al. [7] took it one step further and embedded a programming tutor capable of providing code explanations directly in Visual Studio Code, in the form of a ChatGPT-powered extension. ...
Conference Paper
Full-text available
We introduce the Explorotron Visual Studio Code extension for guided and independent code exploration and learning. Explorotron is a continuation of earlier work carried out to explore how we can enable small organisations with limited resources to provide pedagogically sound learning experiences in programming. We situate Explorotron in the field of Computing Education Research (CER) and envision it initiating a discussion around different topics, including how to balance the optimisation between the researcher-student-teacher trifecta that is inherent in CER, how to ethically and responsibly use large language models (LLMs) in independent learning and exploration by students, and how to define better learning sessions over coding content that students obtained on their own. We further reflect on the question raised by Begel and Ko of whether technology should structure learning for learners or whether learners should be taught how to structure their own independent learning outside of the classroom.
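Picking up on the two citing-context excerpts above (small open models that run on modest hardware, and LLM-generated error explanations for novices), here is a minimal, purely illustrative sketch. The model name is a placeholder for "some small open code model on the Hugging Face Hub", the prompt wording is mine rather than from the cited studies, and quantized backends such as llama.cpp GGUF builds would be an alternative route on CPU-only laptops.

# Illustrative sketch, not from the cited papers: run a small open model locally
# and ask it to rephrase a Python error for a novice programmer.
from transformers import pipeline

# Placeholder model: any ~1B-parameter open code model from the Hugging Face Hub.
generator = pipeline("text-generation", model="bigcode/starcoderbase-1b")

student_code = "def mean(xs):\n    return sum(xs) / len(x)\n"
raw_error = "NameError: name 'x' is not defined"

# A base code model continues text, so the prompt is framed as code comments.
prompt = (
    "# A student wrote this Python function:\n"
    f"{student_code}"
    f"# Running it raised: {raw_error}\n"
    "# Plain-language explanation of the mistake for a beginner:\n"
    "# "
)
completion = generator(prompt, max_new_tokens=80)[0]["generated_text"]
print(completion[len(prompt):])  # show only the newly generated explanation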
... Recent literature has explored the influence of AI assistants on student learning, such as their use in solving and generating CS problems, and in providing feedback to students [13], [14]. ...
Preprint
The use of AI assistants, along with the challenges they present, has sparked significant debate within the computer science education community. While these tools demonstrate the potential to support students' learning and instructors' teaching, they also raise concerns about enabling unethical use by students. Previous research has suggested various strategies aimed at addressing these issues, but they concentrate on introductory programming courses and focus on one specific type of problem. The present research evaluated the performance of ChatGPT, a state-of-the-art AI assistant, at solving 187 problems spanning three distinct types that were collected from six undergraduate computer science courses. The selected courses covered different topics and targeted different program levels. We then explored methods to modify these problems to adapt them to ChatGPT's capabilities in order to reduce potential misuse by students. Finally, we conducted semi-structured interviews with 11 computer science instructors. The aim was to gather their opinions on our problem modification methods, understand their perspectives on the impact of AI assistants on computer science education, and learn their strategies for adapting their courses to leverage these AI capabilities for educational improvement. The results revealed issues ranging from academic fairness to long-term impact on students' mental models. From our results, we derived design implications and recommended tools to help instructors design and create future course material that could more effectively adapt to AI assistants' capabilities.
Conference Paper
Full-text available
Good explanations are essential to efficiently learning introductory programming concepts. To provide high-quality explanations at scale, numerous systems automate the process by tracing the execution of code, defining terms, giving hints, and providing error-specific feedback. However, these approaches often require manual effort to configure and only explain a single aspect of a given code segment. Large language models (LLMs) are also changing how students interact with code. For example, GitHub's Copilot can generate code for programmers, leading researchers to raise concerns about cheating. Instead, our work focuses on LLMs' potential to support learning by explaining numerous aspects of a given code snippet. This poster features a systematic analysis of the diverse natural language explanations that GPT-3 can generate automatically for a given code snippet. We present a subset of three use cases from our evolving design space of AI Explanations of Code.
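As a sketch of what such a "design space" of explanation types might look like in practice, the templates below show how the same snippet can be prompted for from several angles. The template wording is mine, not the poster authors' prompts.

# Hypothetical explanation-prompt templates; any LLM backend could consume them.
EXPLANATION_PROMPTS = {
    "line_by_line": "Explain what each line of this Python code does, one line at a time:\n{code}",
    "concepts": "List the programming concepts this code uses and define each one briefly:\n{code}",
    "purpose": "In two sentences, describe what this code accomplishes overall:\n{code}",
}

def build_explanation_prompt(kind: str, code: str) -> str:
    # kind selects one explanation style; code is the snippet to explain.
    return EXPLANATION_PROMPTS[kind].format(code=code)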
Conference Paper
Full-text available
In large undergraduate computer science classrooms, student learning on assignments is often gauged only by the work on their final solution, not by their programming process. As a consequence, teachers are unable to give detailed feedback on how students implement programming methodology, and novice students often lack a metacognitive understanding of how they learn. We introduce Pensieve as a drag-and-drop, open-source tool that organizes snapshots of student code as they progress through an assignment. The tool is designed to encourage sit-down conversations between student and teacher about the programming process. The easy visualization of code evolution over time facilitates the discussion of intermediate work and progress towards learning goals, both of which would otherwise be unapparent from a single final submission. This paper discusses the pedagogical foundations and technical details of Pensieve and describes results from a particular 207-student classroom deployment, suggesting that the tool has meaningful impacts on education for both the student and the teacher.
Article
Full-text available
To provide personalized help to students who are working on code-writing problems, we introduce a data-driven tutoring system, ITAP (Intelligent Teaching Assistant for Programming). ITAP uses state abstraction, path construction, and state reification to automatically generate personalized hints for students, even when given states that have not occurred in the data before. We provide a detailed description of the system’s implementation and perform a technical evaluation on a small set of data to determine the effectiveness of the component algorithms and ITAP’s potential for self-improvement. The results show that ITAP is capable of producing hints for almost any given state after being given only a single reference solution, and that it can improve its performance by collecting data over time.
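The following is a deliberately simplified, hypothetical illustration of the data-driven idea behind such systems, not ITAP's actual state abstraction, path construction, and state reification: normalise the student's code, find the closest known correct solution, and surface the first differing line as a next-step hint.

# Toy data-driven hint generation in the spirit of (but much simpler than) ITAP.
import ast
import difflib

def normalise(code: str) -> str:
    # Crude state abstraction: parse and re-dump the AST to ignore formatting details.
    return ast.dump(ast.parse(code))

def next_step_hint(student_code: str, reference_solutions: list[str]) -> str:
    # Pick the reference solution whose abstract state is most similar to the student's.
    target = max(
        reference_solutions,
        key=lambda ref: difflib.SequenceMatcher(
            None, normalise(student_code), normalise(ref)
        ).ratio(),
    )
    # Surface the first line that differs as a concrete next edit.
    for student_line, target_line in zip(student_code.splitlines(), target.splitlines()):
        if student_line.strip() != target_line.strip():
            return f"Consider changing '{student_line.strip()}' towards '{target_line.strip()}'."
    return "Your code matches a reference solution structurally; check any remaining lines."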
Article
Full-text available
Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students. We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.
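Below is a rough sketch of the feedback-propagation step only, under the assumption that each submission has already been encoded as a feature vector (in the article, the flattened learned linear map from the precondition embedding space to the postcondition embedding space). Instructor comments on a few submissions are copied to their nearest neighbours; the function and variable names are mine.

# Nearest-neighbour propagation of human comments over precomputed program features.
import numpy as np

def propagate_feedback(features, commented):
    # features: (n, d) array of per-submission feature vectors.
    # commented: dict mapping a submission index to its human-written comment.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    labelled_idx = np.array(sorted(commented))
    sims = normed @ normed[labelled_idx].T            # cosine similarity to labelled submissions
    nearest = labelled_idx[np.argmax(sims, axis=1)]   # closest commented submission for each row
    return {i: commented[int(j)] for i, j in enumerate(nearest)}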
Article
Theories on learning show that formative feedback that is immediate, specific, corrective, and positive is essential to improve novice students' motivation and learning. However, most prior work on programming feedback focuses on highlighting students' mistakes, or on detecting failed test cases after they submit a solution. In this article, we present our adaptive immediate feedback (AIF) system, which uses a hybrid data-driven feedback generation algorithm to provide students with information on their progress, code correctness, and potential errors, as well as encouragement, in the middle of programming. We also present an empirical controlled study using the AIF system across several programming tasks in a CS0 classroom. Our results show that the AIF system improved students' performance and the proportion of students who fully completed the programming assignments, indicating increased persistence. Our results suggest that the AIF system has the potential to scalably support students by giving them real-time formative feedback and the encouragement they need to complete assignments.
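As a loose illustration of the kind of in-progress feedback described above (progress, correctness, and encouragement before submission), here is a toy sketch; the subgoal checks and messages are placeholders, not the AIF system's hybrid data-driven algorithm.

# Toy immediate-feedback message built from simple subgoal checks over unfinished code.
def progress_feedback(code: str, subgoal_checks: dict) -> str:
    # subgoal_checks maps a subgoal name to a predicate over the current code text.
    passed = [name for name, check in subgoal_checks.items() if check(code)]
    remaining = [name for name in subgoal_checks if name not in passed]
    message = f"{len(passed)}/{len(subgoal_checks)} subgoals look complete."
    if remaining:
        message += f" Next, try working on: {remaining[0]}."
    else:
        message += " Great work, everything looks in place!"
    return message

# Example usage with trivial text-based checks (real checks would run tests or analyse the AST).
checks = {
    "define the function": lambda c: "def mean(" in c,
    "return a value": lambda c: "return" in c,
}
print(progress_feedback("def mean(xs):\n    total = sum(xs)\n", checks))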
Article
Feedback is one of the most powerful influences on learning and achievement, but this impact can be either positive or negative. Its power is frequently mentioned in articles about learning and teaching, but surprisingly few recent studies have systematically investigated its meaning. This article provides a conceptual analysis of feedback and reviews the evidence related to its impact on learning and achievement. This evidence shows that although feedback is among the major influences, the type of feedback and the way it is given can be differentially effective. A model of feedback is then proposed that identifies the particular properties and circumstances that make it effective, and some typically thorny issues are discussed, including the timing of feedback and the effects of positive and negative feedback. Finally, this analysis is used to suggest ways in which feedback can be used to enhance its effectiveness in classrooms.
John Hattie and Helen Timperley. 2007. The Power of Feedback. Review of Educational Research 77, 1 (2007), 81-112.
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020). arXiv:2005.14165 https://arxiv.org/abs/2005.14165
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. 2020. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155 (2020).
Kelly Rivers and Kenneth R. Koedinger. 2013. Automatic generation of programming feedback: A data-driven approach. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013), Vol. 50.
Mike Wu, Noah Goodman, Chris Piech, and Chelsea Finn. 2021. ProtoTransformer: A meta-learning approach to providing student feedback. arXiv preprint arXiv:2107.14035 (2021).
Samiha Marwan, Bita Akram, Tiffany Barnes, and Thomas W. Price. Adaptive Immediate Feedback for Block-Based Programming: Design and Evaluation.