Juho LeinonenAalto University · Department of Computer Science
Juho Leinonen
PhD
About
146
Publications
17,409
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
3,099
Citations
Introduction
Additional affiliations
January 2023 - December 2023
March 2022 - December 2022
September 2021 - March 2022
Education
February 2017 - December 2019
Publications
Publications (146)
Generative AI (GenAI) is advancing rapidly, and the literature in computing education is expanding almost as quickly. Initial responses to GenAI tools were mixed between panic and utopian optimism. Many were fast to point out the opportunities and challenges of GenAI. Researchers reported that these new tools are capable of solving most introductor...
Generative AI (GenAI) is advancing rapidly, and the literature in computing education is expanding almost as quickly. Initial responses to GenAI tools were mixed between panic and utopian optimism. Many were fast to point out the opportunities and challenges of GenAI. Researchers reported that these new tools are capable of solving most introductor...
Non-native English speakers (NNES) face multiple barriers to learning programming. These barriers can be obvious, such as the fact that programming language syntax and instruction are often in English, or more subtle, such as being afraid to ask for help in a classroom full of native English speakers. However, these barriers are frustrating because...
Motivation: Students learning to program often reach states where they are stuck and can make no forward progress. An automatically generated next-step hint can help them make forward progress and support their learning. It is important to know what makes a good hint or a bad hint, and how to generate good hints automatically in novice programming...
Computing educators and researchers have used programming process data to understand how programs are constructed and what sorts of problems students struggle with. Although such data shows promise for using it for feedback, fully automated programming process feedback systems have still been an under-explored area. The recent emergence of large la...
There is a great need for data in computing education research. Data is needed to understand how students behave, to train models of student behavior to optimally support students, and to develop and validate new assessment tools and learning analytics techniques. However, relatively few computing education datasets are shared openly, often due to...
Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold start problem when developing education tools, and synthetic user data when access to authentic data is restricted due to priv...
We warmly invite you to attend the 24th Koli Calling International Conference on Computing Education Research (Koli Calling 2024), to be held 14-17 November 2024 in the beautiful Koli National Forest in Eastern Finland. While the submission deadline for full papers and discussion papers has passed, you can still submit posters and demo papers by 20...
Introductory programming courses often emphasize mastering syntax and basic constructs before progressing to more complex and interesting programs. This bottom-up approach can be frustrating for novices, shifting the focus away from problem solving and potentially making computing less appealing to a broad range of students. The rise of generative...
Generative AI (GenAI) and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain...
Introducing students to new concepts in computer science can often be challenging, as these concepts may differ significantly from their existing knowledge and conceptual understanding. To address this, we employed analogies to help students connect new concepts to familiar ideas. Specifically, we generated analogies using large language models (LL...
Generative AI (GenAI) has seen great advancements in the past two years and the conversation around adoption is increasing. Widely available GenAI tools are disrupting classroom practices as they can write and explain code with minimal student prompting. While most acknowledge that there is no way to stop students from using such tools, a consensus...
The emergence of large language models (LLMs) has transformed research and practice in a wide range of domains. Within the computing education research (CER) domain, LLMs have received plenty of attention especially in the context of learning programming. Much of the work on LLMs in CER has however focused on applying and evaluating proprietary mod...
Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students' interests and cultural backgrounds. Prior research in educational psychology has demonstrated that context personalization of exercises stimulates learners' situational interests and positively affec...
In the present study, we provided students an unfiltered access to a state-of-the-art large language model (LLM) chatbot. The chatbot was intentionally designed to mimic proprietary commercial chatbots such as ChatGPT where the chatbot has not been tailored for the educational context; the underlying engine was OpenAI GPT-4. The chatbot was integra...
We warmly invite you to attend the 24th Koli Calling International Conference on Computing Education Research (Koli Calling 2024), to be held 14-17 November 2024 in the beautiful Koli National Forest in Eastern Finland. The submission deadline for full papers and discussion papers is 21 June 2024.
Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Previous research has shown that novices can encounter multiple metacognitive difficulties while programming. Novices are typically unaware of how these difficulties are hindering their progress. Meanwhile, many novices are...
In a previous Birds of a Feather discussion, we delved into the nascent applications of generative AI, contemplating its potential and speculating on future trajectories. Since then, the landscape has continued to evolve revealing the capabilities and limitations of these models. Despite this progress, the computing education research community sti...
Grasping complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between unfamiliar concepts and familiar ones, providing an engaging way to aid understanding. However, creating effective educational analog...
The emergence of publicly accessible large language models (LLMs) such as ChatGPT poses unprecedented risks of new types of plagiarism and cheating where students use LLMs to solve exercises for them. Detecting this behavior will be a necessary component in introductory computer science (CS1) courses, and educators should be well-equipped with dete...
The computing education community has a rich history of pedagogical innovation with many efforts, especially at the introductory level, focused on helping students learn how to program. Recent advances in artificial intelligence have led to large language models that can produce source code from natural language problem descriptions with impressive...
Computing education plays a significant role in the calibre of computing professionals; hence, improving its quality is a valuable endeavour. A promising means for such an endeavour is the harnessing of student data from version control systems. This has previously been used to predict academic performance, but a gap lies in its usage for learning...
Recent advancements in artificial intelligence (AI) and specifically generative AI (GenAI) are threatening to fundamentally reshape computing and society. Largely driven by large language models (LLMs), many tools are now able to interpret and generate both natural language instructions and source code. These capabilities have sparked urgent questi...
Identifying and resolving logic errors can be one of the most frustrating challenges for novice programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior---in other cases, the issue might be about how a problem statement...
Recent developments in deep learning have resulted in code-generation models that produce source code from natural language and code-based prompts with high accuracy. This is likely to have profound effects in the classroom, where novices learning to code can now use free tools to automatically suggest solutions to programming exercises and assignm...
https://arxiv.org/abs/2307.16364
With their remarkable ability to generate code, large language models (LLMs) are a transformative technology for computing education practice. They have created an urgent need for educators to rethink pedagogical approaches and teaching strategies for newly emerging skill sets. Traditional approaches to learning pr...
The emergence of ChatGPT has raised concerns about students potentially using it for cheating. Computer Science (CS) educators are becoming worried because of the potential short and long-term adverse effects it might have on students. However, it is unclear to what extent ChatGPT-generated code can be distinguished from student-written code in int...
The recent advent of highly accurate and scalable large language models (LLMs) has taken the world by storm. From art to essays to computer code, LLMs are producing novel content that until recently was thought only humans could produce. Recent work in computing education has sought to understand the capabilities of LLMs for solving tasks such as w...
In educational settings, automated program repair techniques serve as a feedback mechanism to guide students working on their programming assignments. Recent work has investigated using large language models (LLMs) for program repair. In this area, most of the attention has been focused on using proprietary systems accessible through APIs. However,...
As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale, reducing the burden on instructors...
Background and Context: Over the past year, large language models (LLMs) have taken the world by storm. In computing education, like in other walks of life, many opportunities and threats have emerged as a consequence. Objectives: In this article, we explore such opportunities and threats in a specific area: responding to student programmers' help...
https://arxiv.org/abs/2306.02608
The computing education community has a rich history of pedagogical innovation designed to support students in introductory courses, and to support teachers in facilitating student learning. Very recent advances in artificial intelligence have resulted in code generation models that can produce source code from nat...
Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction...
Recent developments in deep learning have resulted in code-generation models that produce source code from natural language and code-based prompts with high accuracy. This is likely to have profound effects in the classroom, where novices learning to code can now use free tools to automatically suggest solutions to programming exercises and assignm...
Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advanc...
The introduction of Large Language Models (LLMs) has generated a significant amount of excitement both in industry and among researchers. Recently, tools that leverage LLMs have made their way into the classroom where they help students generate code and help instructors generate learning materials. There are likely many more uses of these tools --...
Advances in natural language processing have resulted in large language models (LLMs) that can generate code and code explanations. In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. Three different types of explanations -- a...
Learnersourcing is a common task in modern computing classrooms, where it is used, for example, for the creation of educational resources such as multiple-choice questions and programming exercises. One less studied type of learnersourced artefact is SQL exercises. In this work, we explore how well different SQL topics are covered by learnersourced...
Crowdsourcing is a general term that describes the practice of many individuals working collectively to achieve a common goal or complete a task, often involving the generation of content. In an educational context, crowdsourcing of learning materials – where students create resources that can be used by other learners – offers several benefits. St...
Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advanc...
In this article, we introduce and evaluate the concept of robosourcing for creating educational content. Robosourcing lies in the intersection of crowdsourcing and large language models, where instead of a crowd of humans, requests to large language models replace some of the work traditionally performed by the crowd. Robosourcing includes a human-...
Advances in natural language processing have resulted in large language models (LLMs) that are capable of generating understandable and sensible written text. Recent versions of these models, such as OpenAI Codex and GPT-3, can generate code and code explanations. However, it is unclear whether and how students might engage with such explanations....
A key part of learning to program is learning to understand programming error messages. They can be hard to interpret and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then us...
New knowledge tracing models are continuously being proposed, even at a pace where state-of-theart
models cannot be compared with each other at the time of publication. This leads to a situation
where ranking models is hard, and the underlying reasons of the models’ performance – be it architectural
choices, hyperparameter tuning, performance metri...