Juho Leinonen

Aalto University · Department of Computer Science

PhD

About

146
Publications
17,409
Reads
3,099
Citations
Additional affiliations
January 2023 - December 2023
University of Auckland
Position
  • Postdoctoral Researcher
March 2022 - December 2022
Aalto University
Position
  • Postdoctoral Researcher
September 2021 - March 2022
University of Helsinki
Position
  • Postdoctoral Researcher
Description
  • Research on digital learning tools.
Education
February 2017 - December 2019
University of Helsinki
Field of study
  • Computer Science

Publications

Publications (146)
Preprint
Full-text available
Generative AI (GenAI) is advancing rapidly, and the literature in computing education is expanding almost as quickly. Initial responses to GenAI tools were mixed between panic and utopian optimism. Many were fast to point out the opportunities and challenges of GenAI. Researchers reported that these new tools are capable of solving most introductor...
Preprint
Full-text available
Non-native English speakers (NNES) face multiple barriers to learning programming. These barriers can be obvious, such as the fact that programming language syntax and instruction are often in English, or more subtle, such as being afraid to ask for help in a classroom full of native English speakers. However, these barriers are frustrating because...
Preprint
Full-text available
Motivation: Students learning to program often reach states where they are stuck and can make no forward progress. An automatically generated next-step hint can help them make forward progress and support their learning. It is important to know what makes a good hint or a bad hint, and how to generate good hints automatically in novice programming...
Preprint
Full-text available
Computing educators and researchers have used programming process data to understand how programs are constructed and what sorts of problems students struggle with. Although such data shows promise for using it for feedback, fully automated programming process feedback systems have still been an under-explored area. The recent emergence of large la...
Preprint
Full-text available
There is a great need for data in computing education research. Data is needed to understand how students behave, to train models of student behavior to optimally support students, and to develop and validate new assessment tools and learning analytics techniques. However, relatively few computing education datasets are shared openly, often due to...
Preprint
Full-text available
Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold start problem when developing education tools, and synthetic user data when access to authentic data is restricted due to priv...
Article
We warmly invite you to attend the 24th Koli Calling International Conference on Computing Education Research (Koli Calling 2024), to be held 14-17 November 2024 in the beautiful Koli National Forest in Eastern Finland. While the submission deadline for full papers and discussion papers has passed, you can still submit posters and demo papers by 20...
Preprint
Full-text available
Introductory programming courses often emphasize mastering syntax and basic constructs before progressing to more complex and interesting programs. This bottom-up approach can be frustrating for novices, shifting the focus away from problem solving and potentially making computing less appealing to a broad range of students. The rise of generative...
Preprint
Full-text available
Generative AI (GenAI) and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain...
Poster
Full-text available
Introducing students to new concepts in computer science can often be challenging, as these concepts may differ significantly from their existing knowledge and conceptual understanding. To address this, we employed analogies to help students connect new concepts to familiar ideas. Specifically, we generated analogies using large language models (LL...
Conference Paper
Generative AI (GenAI) has seen great advancements in the past two years and the conversation around adoption is increasing. Widely available GenAI tools are disrupting classroom practices as they can write and explain code with minimal student prompting. While most acknowledge that there is no way to stop students from using such tools, a consensus...
Preprint
Full-text available
The emergence of large language models (LLMs) has transformed research and practice in a wide range of domains. Within the computing education research (CER) domain, LLMs have received plenty of attention especially in the context of learning programming. Much of the work on LLMs in CER has however focused on applying and evaluating proprietary mod...
Preprint
Full-text available
Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students' interests and cultural backgrounds. Prior research in educational psychology has demonstrated that context personalization of exercises stimulates learners' situational interests and positively affec...
Preprint
Full-text available
In the present study, we provided students an unfiltered access to a state-of-the-art large language model (LLM) chatbot. The chatbot was intentionally designed to mimic proprietary commercial chatbots such as ChatGPT where the chatbot has not been tailored for the educational context; the underlying engine was OpenAI GPT-4. The chatbot was integra...
Article
We warmly invite you to attend the 24th Koli Calling International Conference on Computing Education Research (Koli Calling 2024), to be held 14-17 November 2024 in the beautiful Koli National Forest in Eastern Finland. The submission deadline for full papers and discussion papers is 21 June 2024.
Preprint
Full-text available
Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Previous research has shown that novices can encounter multiple metacognitive difficulties while programming. Novices are typically unaware of how these difficulties are hindering their progress. Meanwhile, many novices are...
Conference Paper
In a previous Birds of a Feather discussion, we delved into the nascent applications of generative AI, contemplating its potential and speculating on future trajectories. Since then, the landscape has continued to evolve revealing the capabilities and limitations of these models. Despite this progress, the computing education research community sti...
Preprint
Full-text available
Grasping complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between unfamiliar concepts and familiar ones, providing an engaging way to aid understanding. However, creating effective educational analog...
Conference Paper
Full-text available
The emergence of publicly accessible large language models (LLMs) such as ChatGPT poses unprecedented risks of new types of plagiarism and cheating where students use LLMs to solve exercises for them. Detecting this behavior will be a necessary component in introductory computer science (CS1) courses, and educators should be well-equipped with dete...
Article
The computing education community has a rich history of pedagogical innovation with many efforts, especially at the introductory level, focused on helping students learn how to program. Recent advances in artificial intelligence have led to large language models that can produce source code from natural language problem descriptions with impressive...
Article
Full-text available
Computing education plays a significant role in the calibre of computing professionals; hence, improving its quality is a valuable endeavour. A promising means for such an endeavour is the harnessing of student data from version control systems. This has previously been used to predict academic performance, but a gap lies in its usage for learning...
Conference Paper
Recent advancements in artificial intelligence (AI) and specifically generative AI (GenAI) are threatening to fundamentally reshape computing and society. Largely driven by large language models (LLMs), many tools are now able to interpret and generate both natural language instructions and source code. These capabilities have sparked urgent questi...
Conference Paper
Full-text available
Identifying and resolving logic errors can be one of the most frustrating challenges for novice programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior---in other cases, the issue might be about how a problem statement...
Article
Recent developments in deep learning have resulted in code-generation models that produce source code from natural language and code-based prompts with high accuracy. This is likely to have profound effects in the classroom, where novices learning to code can now use free tools to automatically suggest solutions to programming exercises and assignm...
Preprint
https://arxiv.org/abs/2307.16364 With their remarkable ability to generate code, large language models (LLMs) are a transformative technology for computing education practice. They have created an urgent need for educators to rethink pedagogical approaches and teaching strategies for newly emerging skill sets. Traditional approaches to learning pr...
Conference Paper
Full-text available
The emergence of ChatGPT has raised concerns about students potentially using it for cheating. Computer Science (CS) educators are becoming worried because of the potential short and long-term adverse effects it might have on students. However, it is unclear to what extent ChatGPT-generated code can be distinguished from student-written code in int...
Conference Paper
The recent advent of highly accurate and scalable large language models (LLMs) has taken the world by storm. From art to essays to computer code, LLMs are producing novel content that until recently was thought only humans could produce. Recent work in computing education has sought to understand the capabilities of LLMs for solving tasks such as w...
Chapter
In educational settings, automated program repair techniques serve as a feedback mechanism to guide students working on their programming assignments. Recent work has investigated using large language models (LLMs) for program repair. In this area, most of the attention has been focused on using proprietary systems accessible through APIs. However,...
Preprint
Full-text available
As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale, reducing the burden on instructors...
Preprint
Full-text available
Background and Context: Over the past year, large language models (LLMs) have taken the world by storm. In computing education, like in other walks of life, many opportunities and threats have emerged as a consequence. Objectives: In this article, we explore such opportunities and threats in a specific area: responding to student programmers' help...
Preprint
https://arxiv.org/abs/2306.02608 The computing education community has a rich history of pedagogical innovation designed to support students in introductory courses, and to support teachers in facilitating student learning. Very recent advances in artificial intelligence have resulted in code generation models that can produce source code from nat...
Preprint
Full-text available
Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction...
Preprint
Full-text available
Recent developments in deep learning have resulted in code-generation models that produce source code from natural language and code-based prompts with high accuracy. This is likely to have profound effects in the classroom, where novices learning to code can now use free tools to automatically suggest solutions to programming exercises and assignm...
Conference Paper
Full-text available
Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advanc...
Conference Paper
Full-text available
The introduction of Large Language Models (LLMs) has generated a significant amount of excitement both in industry and among researchers. Recently, tools that leverage LLMs have made their way into the classroom where they help students generate code and help instructors generate learning materials. There are likely many more uses of these tools --...
Conference Paper
Full-text available
Advances in natural language processing have resulted in large language models (LLMs) that can generate code and code explanations. In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. Three different types of explanations -- a...
Preprint
Full-text available
Learnersourcing is a common task in modern computing classrooms, where it is used, for example, for the creation of educational resources such as multiple-choice questions and programming exercises. One less studied type of learnersourced artefact is SQL exercises. In this work, we explore how well different SQL topics are covered by learnersourced...
Article
Full-text available
Crowdsourcing is a general term that describes the practice of many individuals working collectively to achieve a common goal or complete a task, often involving the generation of content. In an educational context, crowdsourcing of learning materials – where students create resources that can be used by other learners – offers several benefits. St...
Preprint
Full-text available
Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advanc...
Preprint
Full-text available
In this article, we introduce and evaluate the concept of robosourcing for creating educational content. Robosourcing lies in the intersection of crowdsourcing and large language models, where instead of a crowd of humans, requests to large language models replace some of the work traditionally performed by the crowd. Robosourcing includes a human-...
Preprint
Full-text available
Advances in natural language processing have resulted in large language models (LLMs) that are capable of generating understandable and sensible written text. Recent versions of these models, such as OpenAI Codex and GPT-3, can generate code and code explanations. However, it is unclear whether and how students might engage with such explanations....
Preprint
Full-text available
A key part of learning to program is learning to understand programming error messages. They can be hard to interpret and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then us...
Article
New knowledge tracing models are continuously being proposed, even at a pace where state-of-the-art models cannot be compared with each other at the time of publication. This leads to a situation where ranking models is hard, and the underlying reasons for the models' performance – be it architectural choices, hyperparameter tuning, performance metri...