Figure - available from: Empirical Software Engineering
Structure of results analysis. Data were analyzed on three levels, ranging from the purposes of individual prompts to the intentions of whole conversations

Source publication
Article
Full-text available
Context Chatbots based on large language models are becoming an important tool in modern software development, yet little is known about how programming beginners interact with this new technology to write code and acquire new knowledge. Thus, we are missing key ingredients to develop guidelines on how to adopt chatbots for becoming productive at p...

Similar publications

Preprint
Full-text available
We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It supports both Chinese and English, engages in real-time voice conversations, and varies vocal nuances such as emotion, intonation, speech rate, and dialect according to user instructions. GLM-4-Voice uses an ultra-low bitrate (175bps), single-codebook speech token...
Article
Full-text available
Introduction Artificial Intelligences (AIs) are changing the way information is accessed and consumed globally. This study aims to evaluate the information quality provided by AIs ChatGPT4 and Claude2 concerning reconstructive surgery for head and neck cancer. Methods Thirty questions on reconstructive surgery for head and neck cancer were directe...
Article
Full-text available
ABSTRACT The development of digital technology, particularly chatbots, has changed the way humans interact with digital services. Chatbots, as artificial-intelligence-based systems, offer convenience in communication but also pose challenges regarding the security and privacy of user data. This study aims to explore the perceptions of...
Conference Paper
Full-text available
This article aims to explore the potential of using Artificial Intelligence (AI) to support analysis methodologies. The conversational process with data can reveal conceptual themes that might elude human interpretation. In the complex contexts of the post-digital era it is timely to reconceptualize educational research as a hybrid system of intera...
Conference Paper
Full-text available
The 21st century has ushered in a remarkable era of Artificial Intelligence (AI), presenting a fertile ground for educators to explore innovative approaches within the realm of education. This study aimed to discuss students' perceptions of the use of ChatGPT in teaching writing in EFL classrooms. In total 43 university students took part in a ques...

Citations

... Thus the initial research in the area suggests that novices struggle to directly use LLMs in an effective manner. This chimes with other research considering the pedagogical implications: Xue et al. [82] and Kazemitabaar et al. [32] found that direct use of LLMs did not produce any significant effect on learning (although the latter suggest that students with higher prior knowledge may have received greater benefits from using the generator than students with less prior knowledge), while Mailach et al. [49] concluded that "we cannot just give vanilla [LLM] chatbots to students as tools to learn programming, but we additionally need to give proper guidance on how to use them-otherwise, students tend to use it mainly for code generation without further reflection on or evaluation of generated code." ...
Preprint
Full-text available
Motivation: Students learning to program often reach states where they are stuck and can make no forward progress. An automatically generated next-step hint can help them make forward progress and support their learning. It is important to know what makes a good hint or a bad hint, and how to generate good hints automatically in novice programming tools, for example using Large Language Models (LLMs).

Method and participants: We recruited 44 Java educators from around the world to participate in an online study. We used a set of real student code states as hint-generation scenarios. Participants used a technique known as comparative judgement to rank a set of candidate next-step Java hints, which were generated by Large Language Models (LLMs) and by five experienced human educators. Participants ranked the hints without being told how they were generated.

Findings: We found that LLMs varied considerably in generating high-quality next-step hints for programming novices, with GPT-4 outperforming the other models tested. When used with a well-designed prompt, GPT-4 outperformed human experts in generating pedagogically valuable hints. A multi-stage prompt was the most effective LLM prompt. We found that the two most important factors of a good hint were length (80–160 words being best) and reading level (US grade 9 or below being best). Offering alternative approaches to solving the problem was considered bad, and we found no effect of sentiment.

Conclusions: Automatic generation of these hints is immediately viable, given that LLMs outperformed humans, even when the students' task is unknown. The fact that only the best prompts achieve this outcome suggests that students on their own are unlikely to be able to produce the same benefit. The prompting task, therefore, should be embedded in an expert-designed tool.