Preprint

Can Developers Prompt? A Controlled Experiment for Code Documentation Generation


Abstract

Large language models (LLMs) bear great potential for automating tedious development tasks such as creating and maintaining code documentation. However, it is unclear to what extent developers can effectively prompt LLMs to create concise and useful documentation. We report on a controlled experiment with 20 professionals and 30 computer science students tasked with code documentation generation for two Python functions. The experimental group freely entered ad-hoc prompts in a ChatGPT-like extension of Visual Studio Code, while the control group executed a predefined few-shot prompt. Our results reveal that professionals and students were unaware of or unable to apply prompt engineering techniques. Especially students perceived the documentation produced from ad-hoc prompts as significantly less readable, less concise, and less helpful than documentation from prepared prompts. Some professionals produced higher quality documentation by just including the keyword Docstring in their ad-hoc prompts. While students desired more support in formulating prompts, professionals appreciated the flexibility of ad-hoc prompting. Participants in both groups rarely assessed the output as perfect. Instead, they understood the tools as support to iteratively refine the documentation. Further research is needed to understand which prompting skills and preferences developers have and which support they need for certain tasks.
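To make the experimental setup concrete, the sketch below contrasts the two prompting styles on a hypothetical Python function; the prompts and the function are illustrative and not the exact material used in the study.

```python
# Hypothetical illustration of the two prompting styles compared in the study
# (the actual functions and prompts from the experiment are not reproduced here).

FUNCTION_SOURCE = '''\
def normalize_scores(scores):
    total = sum(scores)
    return [s / total for s in scores]
'''

# Ad-hoc prompt, as a participant in the experimental group might type it.
# Merely including the keyword "Docstring" helped some professionals obtain
# higher quality documentation.
ad_hoc_prompt = f"Write a Docstring for this function:\n{FUNCTION_SOURCE}"

# Prepared few-shot prompt, as executed by the control group: an example
# function with the desired documentation style precedes the actual request.
few_shot_prompt = f"""\
Function:
def add(a, b):
    return a + b
Documentation:
\"\"\"Return the sum of a and b.\"\"\"

Function:
{FUNCTION_SOURCE}
Documentation:
"""

print(ad_hoc_prompt)
print(few_shot_prompt)  # either prompt would be sent to the LLM-backed IDE extension
```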


References
Article
With the advances in machine learning, there is a growing interest in AI-enabled tools for autocompleting source code. GitHub Copilot, also referred to as the “AI Pair Programmer”, has been trained on billions of lines of open source GitHub code and is one such tool that has been increasingly used since its launch in June 2021. However, little effort has been devoted to understanding the practices, challenges, and expected features of using Copilot for auto-completing source code from the point of view of practitioners. To this end, we conducted an empirical study by collecting and analyzing data from Stack Overflow (SO) and GitHub Discussions. More specifically, we searched and manually collected 303 SO posts and 927 GitHub discussions related to the usage of Copilot. We identified the programming languages, Integrated Development Environments (IDEs), technologies used with Copilot, functions implemented, benefits, limitations, and challenges when using Copilot. The results show that when practitioners use Copilot: (1) the major programming languages used with Copilot are JavaScript and Python, (2) the main IDE used with Copilot is Visual Studio Code, (3) the most commonly used technology with Copilot is Node.js, (4) the leading function implemented by Copilot is data processing, (5) the main purpose of users using Copilot is to help generate code, (6) the most significant benefit of using Copilot is useful code generation, (7) the main limitation encountered by practitioners when using Copilot is the difficulty of integration, and (8) the most commonly expected feature is that Copilot can be integrated with more IDEs. Our results suggest that using Copilot is like a double-edged sword, requiring developers to carefully consider various aspects when deciding whether or not to use it. Our study provides empirically grounded foundations that could inform software developers and practitioners, as well as provide a basis for future investigations on the role of Copilot as an AI pair programmer in software development.
Article
Artificial intelligence-powered generative language models (GLMs), such as ChatGPT, Perplexity AI, and Google Bard, have the potential to provide personalized learning, unlimited practice opportunities, and interactive engagement 24/7, with immediate feedback. However, to fully utilize GLMs, properly formulated instructions are essential. Prompt engineering is a systematic approach to effectively communicating with GLMs to achieve the desired results. Well-crafted prompts yield good responses from the GLM, while poorly constructed prompts will lead to unsatisfactory responses. Besides the challenges of prompt engineering, significant concerns are associated with using GLMs in medical education, including ensuring accuracy, mitigating bias, maintaining privacy, and avoiding excessive reliance on technology. Future directions involve developing more sophisticated prompt engineering techniques, integrating GLMs with other technologies, creating personalized learning pathways, and researching the effectiveness of GLMs in medical education.
Article
Code commenting plays an important role in program comprehension. Automatic comment generation helps improve software maintenance efficiency. The code comments that annotate a method mainly include header comments and snippet comments. The header comment aims to describe the functionality of the entire method, thereby providing a general comment at the beginning of the method. The snippet comment appears at multiple code segments in the body of a method, where a code segment is called a code snippet. Both of them help developers quickly understand code semantics, thereby improving code readability and code maintainability. However, existing automatic comment generation models mainly focus on header comments because there are public datasets to validate their performance. By contrast, it is challenging to collect datasets for snippet comments because it is difficult to determine their scope. Even worse, code snippets are often too short to capture complete syntax and semantic information. To address this challenge, we propose a novel Snippet Comment Generation approach called SCGen. First, we utilize the context of the code snippet to expand the syntax and semantic information. Specifically, 600,243 snippet code-comment pairs are collected from 959 Java projects. Then, we capture variables from code snippets and extract variable-related statements from the context. After that, we devise an algorithm to parse and traverse the abstract syntax tree (AST) information of code snippets and the corresponding context. Finally, SCGen generates snippet comments after inputting the source code snippet and corresponding AST information into a sequence-to-sequence-based model. We conducted extensive experiments on the dataset we collected to evaluate SCGen. Our approach obtains 18.23 in BLEU-4, 18.83 in METEOR, and 23.65 in ROUGE-L, outperforming state-of-the-art comment generation models.
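As a loose illustration of the context-expansion step described above (SCGen itself operates on Java), the sketch below uses Python's standard ast module to pull in the earlier statement that defines a variable read by the snippet to be commented; the code and names are invented for the example.

```python
# Loose illustration of SCGen's context-expansion idea, shown on Python code
# with the standard ast module (SCGen targets Java); the function is invented.
import ast

SOURCE = """
def report(values):
    total = sum(values)
    mean = total / len(values)
    label = "mean=" + str(round(mean, 2))
    print(label)
"""

body = ast.parse(SOURCE).body[0].body   # statements of the function
snippet = body[3]                       # snippet to be commented: print(label)

# Variable names the snippet reads.
used = {node.id for node in ast.walk(snippet) if isinstance(node, ast.Name)}

# Earlier statements assigning any of those names form the snippet's context.
# A full implementation would also follow the chain transitively
# (label depends on mean, which depends on total).
context = [
    stmt for stmt in body[:3]
    if any(isinstance(target, ast.Name) and target.id in used
           for node in ast.walk(stmt) if isinstance(node, ast.Assign)
           for target in node.targets)
]
print([ast.unparse(stmt) for stmt in context])
```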
Article
Generative artificial intelligence (AI) tools, such as Bard, ChatGPT, and CoPilot, have rapidly gained widespread usage. They also have the potential to boost software engineering productivity. In this article, we elaborate on technologies for and usage of generative AI in the software industry. We address questions such as: How does generative AI improve software productivity? How to connect generative AI to software development, and what are the risks? Which technologies have what sorts of benefits? Practitioner guidance and case studies are shared from our industry context. I look forward to hearing from you about this column and the technologies that matter most for your work.—Christof Ebert
Article
Code summarization is a task that is often employed by software developers for fixing or reusing code. Software documentation is essential when it comes to software maintenance. The highest cost in software development goes to maintenance because of the difficulty of code modification. To help reduce the cost and time spent on software development and maintenance, we introduce an automated comment summarization and commenting technique using state-of-the-art techniques in summarization. We use deep neural networks, specifically bidirectional long short-term memory (Bi-LSTM), combined with an attention model to enhance performance. In this study, we propose two different scenarios: one that uses the code text and the structure of the code represented in an abstract syntax tree (AST) and another that uses only code text. For the first scenario, we propose two encoder-based models that encode the code text and the AST independently. Previous works have used different techniques in deep neural networks to generate comments. This study's proposed methodologies scored higher than previous works based on the gated recurrent unit encoder. We conducted our experiment on a dataset of 2.1 million pairs of Java methods and comments. Additionally, we showed that the code structure is beneficial for methods whose signatures feature unclear words.
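For readers unfamiliar with the encoder described above, the fragment below shows a bare-bones bidirectional LSTM over a token sequence in PyTorch; dimensions are toy values, and the attention model and the separate AST encoder of the two-encoder variant are omitted.

```python
# Minimal sketch of a Bi-LSTM encoder over code tokens (PyTorch, toy sizes);
# the attention module and the AST encoder described above are omitted.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 5000, 128, 256
embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

tokens = torch.randint(0, vocab_size, (1, 20))   # one code sequence of 20 tokens
outputs, _ = encoder(embed(tokens))
print(outputs.shape)   # (1, 20, 512): forward and backward states concatenated
```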
Conference Paper
Software developers spend a substantial amount of time reading and understanding code. Research has shown that code comprehension tasks can be expedited by reading the available documentation. However, documentation is expensive to generate and maintain, so the available documentation is often missing or outdated. Thus, automated generation of brief natural language descriptions for source code is desirable and has the potential to play a key role in source code comprehension and development. In particular, recent advances in deep learning have led to sophisticated summary generation techniques. Nevertheless, to the best of our knowledge, no study has fully integrated a state-of-the-art code summarization technique into an integrated development environment (IDE). In hopes of filling this gap, we developed a VS Code extension that allows developers to take advantage of state-of-the-art code summarization from within the IDE. This paper describes Divinator, our IDE-integrated tool for source code summarization.
Article
Code comments are important artifacts in software systems and play a paramount role in many software engineering (SE) tasks related to maintenance and program comprehension. However, while it is widely accepted that high quality matters in code comments just as it matters in source code, assessing comment quality in practice is still an open problem. First and foremost, there is no unique definition of quality when it comes to evaluating code comments. The few existing studies on this topic rather focus on specific attributes of quality that can be easily quantified and measured. Existing techniques and corresponding tools may also focus on comments bound to a specific programming language, and may only deal with comments with specific scopes and clear goals (e.g., Javadoc comments at the method level, or in-body comments describing TODOs to be addressed). In this paper, we present a Systematic Literature Review (SLR) of the last decade of research in SE to answer the following research questions: (i) What types of comments do researchers focus on when assessing comment quality? (ii) What quality attributes (QAs) do they consider? (iii) Which tools and techniques do they use to assess comment quality?, and (iv) How do they evaluate their studies on comment quality assessment in general? Our evaluation, based on the analysis of 2353 papers and the actual review of 47 relevant ones, shows that (i) most studies and techniques focus on comments in Java code, thus may not be generalizable to other languages, and (ii) the analyzed studies focus on four main QAs of a total of 21 QAs identified in the literature, with a clear predominance of checking consistency between comments and the code. We observe that researchers rely on manual assessment and specific heuristics rather than the automated assessment of the comment quality attributes, with evaluations often involving surveys of students and the authors of the original studies but rarely professional developers.
Article
Automatic code documentation generation has been a crucial task in the field of software engineering. It not only relieves developers from writing code documentation but also helps them to understand programs better. Specifically, deep-learning-based techniques that leverage large-scale source code corpora have been widely used in code documentation generation. These works tend to use automatic metrics (such as BLEU, METEOR, ROUGE, CIDEr, and SPICE) to evaluate different models. These metrics compare generated documentation to reference texts by measuring the overlapping words. Unfortunately, there is no evidence demonstrating the correlation between these metrics and human judgment. We conduct experiments on two popular code documentation generation tasks, code comment generation and commit message generation, to investigate the presence or absence of correlations between these metrics and human judgments. For each task, we replicate three state-of-the-art approaches, and the generated documentation is evaluated automatically in terms of BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. We also ask 24 participants to rate the generated documentation considering three aspects (i.e., language, content, and effectiveness). Each participant is given Java methods or commit diffs along with the target documentation to be rated. The results show that the ranking of generated documentation from automatic metrics is different from that produced by human annotators. Thus, these automatic metrics are not reliable enough to replace human evaluation for code documentation generation tasks. In addition, METEOR shows the strongest correlation (with a moderate Pearson correlation r of about 0.7) with the human evaluation metrics. However, this is still much lower than the correlation observed between different annotators (with a high Pearson correlation r of about 0.8) and the correlations reported in the literature for other tasks (e.g., neural machine translation) [39].
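The overlap-based scoring these metrics perform can be illustrated with NLTK's sentence-level BLEU; the reference and generated comments below are invented, and the corpus-level evaluation with METEOR, ROUGE-L, CIDEr, and SPICE from the study is not reproduced.

```python
# Toy illustration of overlap-based scoring with sentence-level BLEU (NLTK);
# the reference/generated pair is invented for demonstration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "returns the sum of the two input numbers".split()
generated = "return the sum of two numbers".split()

score = sentence_bleu([reference], generated,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
# A high word overlap can yield a high score even when the wording reads
# awkwardly, which is one reason such metrics may disagree with human judgment.
```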
Article
Self-admitted technical debt (SATD) consists of annotations, left by developers as comments in the source code or elsewhere, as a reminder about pieces of software manifesting technical debt (TD), i.e., “not being ready yet”. While previous studies have investigated SATD management and its relationship with software quality, there is little understanding of the extent to which, and the circumstances under which, developers admit TD. This paper reports the results of a study in which we asked developers from industry and open source about their practices in annotating source code and other artifacts for self-admitting TD. The study consists of two phases. First, we conducted 10 interviews to gather a first understanding of the phenomenon and to prepare a survey questionnaire. Then, we surveyed 52 industrial developers as well as 49 contributors to open-source projects. Results of the study show that the TD annotation practices, as well as the typical content of SATD comments, are very similar between open source and industry. At the same time, our results highlight that, while open-source code is riddled with comments admitting the need for improvements, SATD in industry may be dictated by organizational guidelines but, at the same time, implicitly discouraged by the fear of admitting responsibility. Results also highlight the need for tools helping developers to achieve better TD awareness.
Conference Paper
During software maintenance, code comments help developers comprehend programs and reduce additional time spent on reading and navigating source code. Unfortunately, these comments are often mismatched, missing or outdated in the software projects. Developers have to infer the functionality from the source code. This paper proposes a new approach named DeepCom to automatically generate code comments for Java methods. The generated comments aim to help developers understand the functionality of Java methods. DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments from learned features. We use a deep neural network that analyzes structural information of Java methods for better comments generation. We conduct experiments on a large-scale Java corpus built from 9,714 open source projects from GitHub. We evaluate the experimental results on a machine translation metric. Experimental results demonstrate that our method DeepCom outperforms the state-of-the-art by a substantial margin.
Article
Many community-based open source software (OSS) projects depend on a continuous influx of newcomers for their survival and continuity; yet, newcomers face many barriers to contributing to a project for the first time, leading in many cases to dropouts. In this paper, we provide guidelines for both OSS communities interested in receiving more external contributions, and newcomers who want to contribute to OSS projects. These guidelines are based on our previous work, which characterized barriers encountered by newcomers and proposed tools to support them in overcoming these barriers. Since newcomers are critical for OSS growth and continuity, our work may help increase contributions to OSS projects, as well as promote a more diverse community.
Article
The comparison of two means is one of the most commonly applied statistical procedures in psychology. The independent samples t-test corrected for unequal variances is commonly known as Welch’s test, and is widely considered to be a robust alternative to the independent samples t-test. The properties of Welch’s test that make it Type I error robust are examined. The degrees of freedom used in Welch’s test are a random variable, the distributions of which are examined using simulation. It is shown how the distribution for the degrees of freedom is dependent on the sample sizes and the variances of the samples. The impact of sample variances on the degrees of freedom, the resultant critical value and the test statistic is considered, and hence gives an insight into why Welch’s test is Type I error robust under normality.
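As a quick reference for the procedure discussed above, the snippet below runs Welch's test with SciPy and computes the Welch-Satterthwaite degrees of freedom explicitly; the data are illustrative, not from the paper.

```python
# Welch's t-test and its Welch-Satterthwaite degrees of freedom
# (illustrative data, not taken from the paper).
import numpy as np
from scipy import stats

a = np.array([5.1, 4.8, 6.2, 5.5, 5.9, 6.1])
b = np.array([4.2, 4.9, 5.0, 4.4, 4.7, 4.6, 5.2, 4.8])

t, p = stats.ttest_ind(a, b, equal_var=False)   # Welch's test

# The degrees of freedom depend on the sample variances and sizes, which is
# why they behave like a random variable across samples.
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
print(f"t = {t:.3f}, p = {p:.4f}, df = {df:.2f}")
```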
Article
The importance of the normal distribution is undeniable, since it is an underlying assumption of many statistical procedures such as t-tests, linear regression analysis, discriminant analysis, and Analysis of Variance (ANOVA). When the normality assumption is violated, interpretation and inferences may not be reliable or valid. The three common procedures for assessing whether a random sample of independent observations of size n comes from a population with a normal distribution are: graphical methods (histograms, boxplots, Q-Q plots), numerical methods (skewness and kurtosis indices), and formal normality tests. This paper compares the power of four formal tests of normality: the Shapiro-Wilk (SW) test, the Kolmogorov-Smirnov (KS) test, the Lilliefors (LF) test, and the Anderson-Darling (AD) test. Power comparisons of these four tests were obtained via Monte Carlo simulation of sample data generated from alternative symmetric and asymmetric distributions. Ten thousand samples of various sample sizes were generated from each of the given alternative distributions. The power of each test was then obtained by comparing the test-of-normality statistics with the respective critical values. Results show that the Shapiro-Wilk test is the most powerful normality test, followed by the Anderson-Darling, Lilliefors, and Kolmogorov-Smirnov tests. However, the power of all four tests is still low for small sample sizes.
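A minimal Monte Carlo sketch in the spirit of this power comparison is shown below, estimating the power of the Shapiro-Wilk test against an exponential alternative; the sample size, repetition count, and choice of alternative are illustrative only.

```python
# Toy power estimate for the Shapiro-Wilk test against an asymmetric
# (exponential) alternative; parameters are illustrative, not from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 30, 2000, 0.05
rejections = sum(
    stats.shapiro(rng.exponential(size=n)).pvalue < alpha for _ in range(reps)
)
print(f"Estimated power of Shapiro-Wilk at n={n}: {rejections / reps:.2f}")
```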
Conference Paper
The goal of the reported construction process was an end-user questionnaire that measures user experience quickly, in a simple and immediate way, while covering a preferably comprehensive impression of the product's user experience. An empirical approach to item selection was used to ensure the practical relevance of the items. Usability experts collected terms and statements on user experience and usability, including ‘hard’ as well as ‘soft’ aspects. These statements were consolidated and transformed into a first questionnaire version containing 80 bipolar items. It was used to measure the user experience of software products in several empirical studies. The data were subjected to a factor analysis, which resulted in the construction of a 26-item questionnaire comprising the six factors Attractiveness, Perspicuity, Efficiency, Dependability, Stimulation, and Novelty. Studies conducted for the original German questionnaire and an English version indicate a satisfactory level of reliability and construct validity.
Article
It is often difficult, particularly when conducting research in psychology, to have access to large normally distributed samples. Fortunately, there are statistical tests for comparing two independent groups that do not require large normally distributed samples. The Mann-Whitney U is one of these tests. In the following work, a summary of this test is presented, together with an explanation of the logic underlying it and its application. Moreover, the strengths and weaknesses of the Mann-Whitney U are discussed. One major limitation of the Mann-Whitney U is that the type I error, or alpha (α), is amplified in situations of heteroscedasticity.
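For completeness, the call below shows the Mann-Whitney U test in SciPy on two invented groups of ordinal ratings, the kind of comparison for which the test is typically used.

```python
# Mann-Whitney U test for two independent groups of ordinal ratings
# (invented data, e.g. Likert-style scores from two study groups).
from scipy import stats

group_a = [2, 3, 3, 4, 2, 3, 5, 4, 3, 2]
group_b = [4, 5, 4, 3, 5, 4, 5, 4, 4, 5]

u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```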
Article
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning”. Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x′ that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained language models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only provide a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website, NLPedia-Pretrain, including a constantly updated survey and paper list.
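The template-and-slot mechanism described above can be sketched with a masked language model; the model name, template, and label mapping below are illustrative choices, not taken from the survey.

```python
# Sketch of prompt-based prediction: wrap the input in a template with an
# unfilled slot, let a masked language model fill it, and map the filled
# token to a label. Model, template, and label words are illustrative.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

x = "the documentation was clear and genuinely helpful"
prompt = f"Review: {x}. Overall it was [MASK]."        # template with one slot

candidates = fill(prompt, targets=["good", "bad"])     # restrict the slot filler
label = "positive" if candidates[0]["token_str"] == "good" else "negative"
print(label)
```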
Article
Code summaries help developers comprehend programs and reduce the time they need to infer program functionalities during software maintenance. Recent efforts resort to deep learning techniques such as sequence-to-sequence models for generating accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer is under-explored in this task domain. In this paper, we propose a novel approach named SG-Trans to incorporate code structural properties into the Transformer. Specifically, we inject local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., the data flow graph) into the self-attention module of the Transformer as an inductive bias. To further capture the hierarchical characteristics of code, the local information and the global structure are designed to distribute across the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows the superior performance of SG-Trans over state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans improves the METEOR score, a metric widely used for measuring generation quality, by 1.4% and 2.0% on two benchmark datasets, respectively.
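The general idea of using structure as an inductive bias in self-attention can be sketched as masking attention logits with a structural adjacency matrix; the snippet below is a loose illustration of that mechanism, not SG-Trans's exact formulation.

```python
# Loose illustration of structure-biased self-attention: attention between
# code tokens is restricted to structural neighbours (e.g. AST or data-flow
# edges). This is not SG-Trans's exact formulation.
import torch
import torch.nn.functional as F

d = 16
tokens = torch.randn(6, d)                  # six code tokens, toy embeddings
q, k, v = tokens, tokens, tokens

adj = torch.eye(6)                          # toy structural graph
adj[0, 1] = adj[1, 0] = 1
adj[2, 3] = adj[3, 2] = 1

scores = q @ k.T / d ** 0.5
scores = scores.masked_fill(adj == 0, float("-inf"))   # keep structural edges only
context = F.softmax(scores, dim=-1) @ v
print(context.shape)                        # torch.Size([6, 16])
```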
Article
Context: Coding is an incremental activity, where a developer may need to understand code before making suitable changes to it. Code documentation is considered one of the best practices in software development but requires significant effort from developers. Recent advances in natural language processing and machine learning have provided enough motivation to devise automated approaches for source code documentation at multiple levels. Objective: The review aims to study current code documentation practices and analyze the existing literature to provide a perspective on its preparedness to address the said problem and the challenges that lie ahead. Methodology: We provide a detailed account of the literature in the area of automated source code documentation at different levels and critically analyze the effectiveness of the proposed approaches. This also allows us to infer gaps and challenges in addressing the problem at different levels. Findings: 1) The research community has focused on method-level summarization. 2) Deep learning has dominated the last five years of this research field. 3) Researchers are regularly proposing bigger corpora for source code documentation. 4) Java and Python are the most widely used programming languages in these corpora. 5) BLEU is the evaluation metric most favored by researchers.
Conference Paper
Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge into 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) with deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., with multiple binary classifiers), the best AUPRC was up to 87%.
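A conventional-ML baseline of the kind compared above can be sketched with scikit-learn as a TF-IDF plus linear SVM binary classifier for a single knowledge type; the toy sentences and labels are invented, whereas the study used 5,574 manually annotated Java and .NET documentation units.

```python
# Toy TF-IDF + linear SVM binary classifier for one knowledge type
# ("Functionality" vs. other); sentences and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "Returns the number of elements in this list.",   # Functionality
    "This implementation is not synchronized.",       # Quality
    "Computes the absolute value of the argument.",   # Functionality
    "Instances of this class are immutable.",         # Structure
]
labels = [1, 0, 1, 0]   # 1 = Functionality, 0 = another knowledge type

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["Gets the value mapped to the given key."]))
```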
Article
In this article, we discuss methodological opportunities related to using a team-based approach for iterative-inductive analysis of qualitative data involving detailed open coding of semistructured interviews and focus groups. Iterative-inductive methods generate rich thematic analyses useful in sociology, anthropology, public health, and many other applied fields. A team-based approach to analyzing qualitative data increases confidence in dependability and trustworthiness, facilitates analysis of large data sets, and supports collaborative and participatory research by including diverse stakeholders in the analytic process. However, it can be difficult to reach consensus when coding with multiple coders. We report on one approach for creating consensus when open coding within an iterative-inductive analytical strategy. The strategy described may be used in a variety of settings to foster efficient and credible analysis of larger qualitative data sets, particularly useful in applied research settings where rapid results are often required.
Conference Paper
Code summarization provides a high-level natural language description of the function performed by code, and it can benefit software maintenance, code categorization, and retrieval. To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework that encodes the code into a hidden space and then decodes it into natural language, suffering from two major drawbacks: a) their encoders only consider the sequential content of code, ignoring the tree structure, which is also critical for the task of code summarization; b) their decoders are typically trained to predict the next word by maximizing the likelihood of the next ground-truth word given the previous ground-truth words, whereas at test time they are expected to generate the entire sequence from scratch. This discrepancy can cause an exposure bias issue, making the learnt decoder suboptimal. In this paper, we incorporate an abstract syntax tree structure as well as the sequential content of code snippets into a deep reinforcement learning framework (i.e., an actor-critic network). The actor network provides the confidence of predicting the next word according to the current state. On the other hand, the critic network evaluates the reward value of all possible extensions of the current state and can provide global guidance for exploration. We employ an advantage reward composed of the BLEU metric to train both networks. Comprehensive experiments on a real-world dataset show the effectiveness of our proposed model when compared with state-of-the-art methods.
Conference Paper
This paper presents the results of a study that compared three think-aloud methods: concurrent think-aloud, retrospective think-aloud, and a hybrid method. The three methods were compared through an evaluation of a library website, which involved four points of comparison: task performance, participants' experiences, usability problems discovered, and the cost of employing the methods. The results revealed that the concurrent method outperformed both the retrospective and the hybrid methods in facilitating successful usability testing. It detected higher numbers of usability problems than the retrospective method, and produced output comparable to that of the hybrid method. The method received average to positive ratings from its users, and no reactivity was observed. Lastly, this method required much less time on the evaluator's part than did the other two methods, which involved double the testing and analysis time.
Chapter
The experiment data from the operation is input to the analysis and interpretation. After collecting experimental data in the operation phase, we want to be able to draw conclusions based on this data. To be able to draw valid conclusions, we must interpret the experiment data.
Article
Research in program comprehension has evolved considerably over the past decades. However, only little is known about how developers practice program comprehension in their daily work. This article reports on qualitative and quantitative research to comprehend the strategies, tools, and knowledge used for program comprehension. We observed 28 professional developers, focusing on their comprehension behavior, strategies followed, and tools used. In an online survey with 1,477 respondents, we analyzed the importance of certain types of knowledge for comprehension and where developers typically access and share this knowledge. We found that developers follow pragmatic comprehension strategies depending on context. They try to avoid comprehension whenever possible and often put themselves in the role of users by inspecting graphical interfaces. Participants confirmed that standards, experience, and personal communication facilitate comprehension. The team size, its distribution, and open-source experience influence their knowledge sharing and access behavior. While face-to-face communication is preferred for accessing knowledge, knowledge is frequently shared in informal comments. Our results reveal a gap between research and practice, as we did not observe any use of comprehension tools and developers seem to be unaware of them. Overall, our findings call for reconsidering the research agendas towards context-aware tool support.
Article
Reading reference documentation is an important part of programming with application programming interfaces (APIs). Reference documentation complements the API by providing information not obvious from the API syntax. To improve the quality of reference documentation and the efficiency with which the relevant information it contains can be accessed, we must first understand its content. We report on a study of the nature and organization of knowledge contained in the reference documentation of the hundreds of APIs provided as a part of two major technology platforms: Java SDK 6 and .NET 4.0. Our study involved the development of a taxonomy of knowledge types based on grounded methods and independent empirical validation. Seventeen trained coders used the taxonomy to rate a total of 5,574 randomly sampled documentation units to assess the knowledge they contain. Our results provide a comprehensive perspective on the patterns of knowledge in API documentation: observations about the types of knowledge it contains and how this knowledge is distributed throughout the documentation. The taxonomy and patterns of knowledge we present in this paper can be used to help practitioners evaluate the content of their API documentation, better organize their documentation, and limit the amount of low-value content. They also provide a vocabulary that can help structure and facilitate discussions about the content of APIs.
Conference Paper
Studies have shown that good comments can help programmers quickly understand what a method does, aiding program comprehension and software maintenance. Unfortunately, few software projects adequately comment the code. One way to overcome the lack of human-written summary comments, and guard against obsolete comments, is to automatically generate them. In this paper, we present a novel technique to automatically generate descriptive summary comments for Java methods. Given the signature and body of a method, our automatic comment generator identifies the content for the summary and generates natural language text that summarizes the method's overall actions. According to programmers who judged our generated comments, the summaries are accurate, do not miss important content, and are reasonably concise.
T. Roehm, R. Tiarks, R. Koschke, and W. Maalej, "How do professional developers comprehend software?" in 2012 34th International Conference on Software Engineering. New York, NY, USA: IEEE, 2012, pp. 255-265.
H. Tian, W. Lu, T. O. Li, X. Tang, S.-C. Cheung, J. Klein, and T. F. Bissyandé, "Is ChatGPT the ultimate programming assistant – how far is it?" 2023.
J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, J. Spencer-Smith, and D. C. Schmidt, "A prompt pattern catalog to enhance prompt engineering with ChatGPT," 2023.