Conference Paper

A Neural Model for Generating Natural Language Summaries of Program Subroutines

Authors:
Alexander LeClair, Siyuan Jiang, Collin McMillan

Abstract

Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independent of the text in code. This process helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1m Java methods. We find improvement over two baseline techniques from the SE literature and one from the NLP literature.
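As a concrete illustration of the two-input design the abstract describes, here is a minimal PyTorch sketch of a dual-encoder model: one GRU encodes the words from code, another encodes a flattened AST sequence, and an attentional decoder attends to both. All layer sizes and names are illustrative assumptions, not the paper's implementation.

```python
# Minimal PyTorch sketch of a dual-encoder summarizer: one GRU encoder for
# code tokens, one for a flattened AST sequence, and a GRU decoder that
# attends over both encoders. Sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class DualEncoderSummarizer(nn.Module):
    def __init__(self, code_vocab, ast_vocab, sum_vocab, emb=256, hid=256):
        super().__init__()
        self.code_emb = nn.Embedding(code_vocab, emb)
        self.ast_emb = nn.Embedding(ast_vocab, emb)
        self.sum_emb = nn.Embedding(sum_vocab, emb)
        self.code_enc = nn.GRU(emb, hid, batch_first=True)
        self.ast_enc = nn.GRU(emb, hid, batch_first=True)
        self.dec = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid * 3, sum_vocab)  # decoder state + two contexts

    @staticmethod
    def attend(dec_states, enc_states):
        # Dot-product attention of decoder states over encoder states.
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))
        return torch.bmm(torch.softmax(scores, dim=-1), enc_states)

    def forward(self, code_ids, ast_ids, sum_ids):
        code_states, _ = self.code_enc(self.code_emb(code_ids))
        ast_states, _ = self.ast_enc(self.ast_emb(ast_ids))
        dec_states, _ = self.dec(self.sum_emb(sum_ids))
        code_ctx = self.attend(dec_states, code_states)  # words-from-code view
        ast_ctx = self.attend(dec_states, ast_states)    # structure view
        combined = torch.cat([dec_states, code_ctx, ast_ctx], dim=-1)
        return self.out(combined)  # logits over the summary vocabulary
```

Because the two encoders are separate inputs, the structural encoder can learn from the AST even when identifier names carry no signal, which is the independence the abstract emphasizes.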


... There are few standard practices, leading to major differences in the reported results in different papers, as discussed in the previous section. For example, the works by LeClair et al. (LeClair and McMillan, 2019) and Hu et al. (Hu et al., 2018a) both modify the CODENN model from Iyer et al. (Iyer et al., 2016) to work on Java methods and comments. LeClair et al. and Hu et al. report very disparate results: a BLEU-4 score of 6.3 for CODENN on one dataset, and 25.3 on another, even though both datasets were generated from Java source code repositories. ...
... The dataset we use in this paper is based on the dataset provided by LeClair et al. (LeClair and McMillan, 2019) in a pre-release. We used this dataset because it is both the largest and the most recent dataset in source code summarization. ...
... Shimonaka et al. (Shimonaka et al., 2016) point out that the typical approach for identifying auto-generated code is a simple case-insensitive text search for the phrase "generated by" in the comments of the Java files. LeClair et al. (LeClair and McMillan, 2019) report that this search turns out to be quite aggressive, catching nearly all auto-generated code in the repository. However, as with RQ1, the effect of this filter is theoretical and has not been measured in practice. ...
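The "generated by" filter described in the excerpt above fits in a few lines. This is a simplified sketch: comment extraction here uses a basic regex rather than a real Java parser, and is not the original tooling.

```python
# Simplified illustration of the auto-generated-code filter: a
# case-insensitive search for "generated by" in a Java file's comments.
import re

_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def looks_auto_generated(java_source: str) -> bool:
    comments = " ".join(_COMMENT_RE.findall(java_source))
    return "generated by" in comments.lower()

print(looks_auto_generated("/* Generated by javacc */ class P {}"))  # True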
Conference Paper
Full-text available
Source Code Summarization is the task of writing short, natural language descriptions of source code. The main use for these descriptions is in software documentation, e.g., the one-sentence Java method descriptions in JavaDocs. Code summarization is rapidly becoming a popular research problem, but progress is restrained by a lack of suitable datasets. In addition, a lack of community standards for creating datasets leads to confusing and unreproducible research results -- we observe swings in performance of more than 33% due only to changes in dataset design. In this paper, we make recommendations for these standards from experimental results. We release a dataset based on prior work of over 2.1m pairs of Java methods and one-sentence method descriptions from over 28k Java projects. We describe the dataset and point out key differences from natural language data, to guide and support future researchers. Dataset available at www.leclair.tech/data/funcom
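The one-sentence descriptions mentioned in this abstract are typically the first sentence of a method's Javadoc. A hypothetical extraction helper might look like the following; it is regex-based and illustrative only, not the dataset's actual pipeline.

```python
# Hypothetical helper that pulls a one-sentence description out of a
# Javadoc block, in the spirit of the method/description pairs above.
import re

def first_javadoc_sentence(javadoc: str) -> str:
    body = javadoc.strip()
    body = re.sub(r"^/\*\*|\*/$", "", body)            # drop /** and */
    body = re.sub(r"^\s*\*\s?", "", body, flags=re.M)  # drop leading asterisks
    body = body.split("@", 1)[0]                       # cut at the first @tag
    match = re.search(r"(.+?[.!?])(\s|$)", body, re.DOTALL)
    return (match.group(1) if match else body).strip()

print(first_javadoc_sentence("/** Returns the user id. @param x unused */"))
# -> Returns the user id.
```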
Article
A smart contract is a software program executed on a blockchain, designed to facilitate functionalities such as contract execution, asset administration, and identity validation within a secure and decentralized ecosystem. Summarizing the code of Solidity smart contracts helps developers promptly grasp essential functionalities, thereby enhancing the security posture of Ethereum-based projects. Existing smart contract code summarization works mainly use traditional information retrieval and single code features, resulting in suboptimal performance. In this study, we propose a fusing multiple code features (FMCF) approach based on the Transformer for Solidity summarization. First, FMCF performs contract integrity modeling and state immutability modeling in the data preprocessing stage to process and filter data that meets security conditions. At the same time, FMCF retains the self-attention mechanism to construct a Graph Attention Network (GAT) encoder and a CodeBERT encoder, which respectively extract multiple feature vectors of the code to preserve the integrity of the source code information. Furthermore, FMCF uses a weighted summation method to feed these two types of feature vectors into the feature fusion module, and inputs the fused feature vectors into the Transformer decoder to obtain the final smart contract code summarization. Experimental results show that FMCF outperforms the standard baseline methods by 12.45% in BLEU score and maximally preserves the semantic information and syntactic structure of the source code. These results demonstrate that FMCF provides a good direction for future research on smart contract code summarization, helping developers enhance the security of their projects.
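A hedged sketch of the weighted-summation fusion step this abstract describes: the two input tensors stand in for the GAT and CodeBERT encoder outputs, and the learned scalar weight is an assumption rather than FMCF's exact fusion rule.

```python
# Sketch of a weighted-summation fusion module: two feature tensors
# (stand-ins for GAT and CodeBERT encoder outputs) are mixed with a
# learned scalar weight before decoding. The exact rule is assumed.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learned mixing weight
        self.proj = nn.Linear(dim, dim)

    def forward(self, gat_feats: torch.Tensor, bert_feats: torch.Tensor):
        # Both inputs: (batch, seq_len, dim); output feeds the decoder.
        fused = self.alpha * gat_feats + (1.0 - self.alpha) * bert_feats
        return self.proj(fused)
```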
Article
To automatically generate issue titles, researchers formulated the problem as a one-sentence summarization task and proposed an effective method, iTAPE. However, after analyzing the quality of the titles generated by iTAPE, measured by ROUGE-L, we find that only 42.7% of the titles achieve a ROUGE-L score above 0.3. This means the quality of the generated titles is not satisfactory, which can limit the practicability of iTAPE. Therefore, we propose a quality prediction-based filter, TitleGen-FL, built on two prediction modules: if both modules predict that iTAPE cannot generate a high-quality title, TitleGen-FL automatically filters the issue and returns a warning message. To evaluate the effectiveness of our proposed filter, we select a benchmark dataset gathered from real-world open-source projects as our experimental subject. Both automatic evaluation and a human study show that TitleGen-FL can effectively filter out the issues for which iTAPE cannot generate high-quality titles.
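The filter-then-generate control flow reduces to a short decision procedure. In this sketch the two quality predictors and the title generator are hypothetical stand-ins, not the actual iTAPE or TitleGen-FL interfaces.

```python
# Sketch of the filter-then-generate control flow described above.
def title_or_warning(issue_body, predictor_a, predictor_b, generate_title):
    # Filter the issue only when both modules predict a low-quality title.
    if not predictor_a(issue_body) and not predictor_b(issue_body):
        return "WARNING: a high-quality title is unlikely for this issue."
    return generate_title(issue_body)
```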
Article
Code summarization aims to generate high-quality functional summaries of code snippets to improve the efficiency of program development and maintenance. A pressing challenge for code summarization models is to capture more comprehensive code knowledge by integrating the feature correlations between the semantics and syntax of the code. In this paper, we propose GT-SimNet, a code summarization method based on a multi-modal similarity network. It introduces a novel code semantic modelling method based on a local application programming interface (API) dependency graph (Local-ADG), which exhibits an excellent ability to mask irrelevant semantics outside the current code snippet. For code feature fusion, GT-SimNet uses the SimNet network to calculate correlation coefficients between Local-ADG and abstract syntax tree (AST) nodes and performs fusion under the influence of those coefficients. Finally, a generator predicts the target summary. We conduct extensive experiments to evaluate the performance of GT-SimNet on two Java datasets. The results show that GT-SimNet achieves BLEU scores of 38.73% and 41.36% on the two datasets, 1.47%∼2.68% higher than the best existing baseline. Importantly, GT-SimNet's BLEU score drops by 7.28% after removing Local-ADG, indicating that Local-ADG is effective for the semantic representation of the code.
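A minimal sketch of similarity-weighted fusion in this spirit: correlation coefficients between the two node sets decide how much AST information each Local-ADG node absorbs. The real SimNet architecture is not reproduced here; cosine similarity and the fusion rule are assumptions.

```python
# Illustrative similarity-weighted fusion between two node feature sets
# (stand-ins for Local-ADG and AST node embeddings).
import torch

def similarity_fusion(adg_nodes: torch.Tensor, ast_nodes: torch.Tensor):
    # adg_nodes: (n, dim); ast_nodes: (m, dim)
    sim = torch.cosine_similarity(
        adg_nodes.unsqueeze(1), ast_nodes.unsqueeze(0), dim=-1)  # (n, m)
    weights = torch.softmax(sim, dim=-1)
    return adg_nodes + weights @ ast_nodes  # fused (n, dim) features
```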
Article
Background: Code comment generation techniques aim to generate natural language descriptions for source code. There are two orthogonal approaches for this task, i.e., information retrieval (IR) based and neural-based methods. Recent studies have focused on combining their strengths by feeding the input code and its similar code snippets retrieved by the IR-based approach to the neural-based approach, which can enhance the neural-based approach's ability to output low-frequency words and further improve performance.
Aim: Despite this tremendous progress, our pilot study reveals that the current combination is not generalizable and can lead to performance degradation. In this paper, we propose a straightforward but effective approach to tackle this issue in existing combinations of the two comment generation approaches.
Method: Instead of binding IR- and neural-based approaches statically, we combine them dynamically. Specifically, given an input code snippet, we first use an IR-based technique to retrieve a similar code snippet from the corpus. Then we use a Cross-Encoder based classifier to decide which comment generation method to use: if the retrieved code snippet is a true positive (i.e., semantically similar to the input), we directly use the IR-based technique; otherwise, we pass the input to the neural-based model to generate the comment.
Results: We evaluate our approach on a large-scale dataset of Java projects. Experimental results show that our approach achieves a 25.45 BLEU score, improving the state-of-the-art IR-based approach, neural-based approach, and their combination by 41%, 26%, and 7%, respectively.
Conclusions: We propose a straightforward but effective dynamic combination of IR-based and neural-based comment generation, which outperforms state-of-the-art approaches by a substantial margin.
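The dynamic combination is essentially retrieve, classify, then route. A sketch under assumed component interfaces follows; the retriever, cross-encoder, and neural model are hypothetical stand-ins, not the paper's actual models.

```python
# Sketch of the dynamic routing between IR and neural comment generation.
def generate_comment(code, retriever, cross_encoder, neural_model,
                     threshold=0.5):
    similar_code, similar_comment = retriever.most_similar(code)
    # The cross-encoder scores the (input, retrieved) pair jointly; a high
    # score means the retrieval is a true positive (semantically similar).
    if cross_encoder.score(code, similar_code) >= threshold:
        return similar_comment           # reuse the retrieved comment
    return neural_model.summarize(code)  # otherwise generate neurally
```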
Article
Context: Stack Overflow is very helpful for software developers seeking answers to programming problems. Previous studies have shown that a growing number of questions are of low quality and thus receive less attention from potential answerers. Gao et al. proposed an LSTM-based model (i.e., BiLSTM-CC) to automatically generate question titles from code snippets to improve question quality. However, using only the code snippets in the question body cannot provide sufficient information for title generation, and LSTMs cannot capture long-range dependencies between tokens.
Objective: This paper proposes CCBERT, a novel deep-learning-based model that enhances question title generation by making full use of the bi-modal information in the entire question body.
Method: CCBERT follows the encoder–decoder paradigm, using CodeBERT to encode the question body into hidden representations, a stacked Transformer decoder to generate predicted tokens, and an additional copy attention layer to refine the output distribution. Both the encoder and decoder perform multi-head self-attention to better capture long-range dependencies. We build a dataset of around 200,000 high-quality questions filtered from the data officially published by Stack Overflow to verify the effectiveness of the CCBERT model.
Results: CCBERT outperforms all the baseline models on the dataset. Experiments on both code-only and low-resource datasets show the superiority of CCBERT with less performance degradation. A human evaluation also shows the excellent performance of CCBERT on both readability and correlation criteria.
Conclusion: CCBERT automatically captures the bi-modal semantic information of the entire question body and models long-range dependencies to achieve better performance. Therefore, CCBERT is an effective approach for generating Stack Overflow question titles.
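A rough sketch of the encoder-decoder wiring this abstract describes. The CodeBERT checkpoint name is a real Hugging Face identifier, but the dimensions are assumptions, and the copy-attention layer and causal target mask are omitted for brevity.

```python
# Sketch: CodeBERT encoder feeding a stacked Transformer decoder.
import torch.nn as nn
from transformers import AutoModel

class TitleGenerator(nn.Module):
    def __init__(self, vocab_size, hid=768, layers=6, heads=12):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("microsoft/codebert-base")
        self.tok_emb = nn.Embedding(vocab_size, hid)
        layer = nn.TransformerDecoderLayer(d_model=hid, nhead=heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, input_ids, attention_mask, target_ids):
        # Encode the bi-modal question body (text + code) with CodeBERT.
        memory = self.encoder(input_ids,
                              attention_mask=attention_mask).last_hidden_state
        states = self.decoder(self.tok_emb(target_ids), memory)
        return self.out(states)  # logits over the title vocabulary
```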
Article
Context: Code summarization aims to automatically generate natural language descriptions for code and has become a rapidly expanding research area. Data-driven code summarization models based on neural networks have proliferated in recent years.
Objective: Almost all existing neural models are built at the granularity of tokens or AST nodes. This has several drawbacks: a) code summarization requires high-level knowledge of code, while token representations provide only a limited global view; b) such approaches can hardly model the hierarchy of code; c) long input code challenges these models with long-range dependencies due to the large number of tokens and AST nodes.
Method: To address these issues, we propose a novel framework that uses hierarchical representations of code to generate better summaries. We consider two levels of code hierarchy: token-level and statement-level. Our framework contains a pair of customized encoder-decoder models, for the tokens and the AST of the code respectively. Each has a hierarchical encoder that extracts both token- and statement-level code features, and an attentional decoder that can attend to those different levels of representation during decoding. The two models are then combined to predict summaries via ensemble learning.
Results: We conduct extensive experiments to evaluate our models on a large Java corpus. The experimental results show that our approach outperforms several state-of-the-art baselines by a substantial margin.
Conclusion: Our approach better learns global information about code and shifts attention between important statements during summary generation. With the help of hierarchical attention, the models can locate keywords more accurately in a top-down way. Ensemble learning also proves to be an effective way to benefit from multiple input sources.
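The two-level hierarchy can be illustrated with a small encoder that first summarizes each statement's tokens and then runs over the statement vectors. Shapes, sizes, and names here are illustrative assumptions, not the paper's models.

```python
# Two-level hierarchical encoder: a token-level GRU summarizes each
# statement, then a statement-level GRU runs over the statement vectors.
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab, emb=128, hid=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.token_gru = nn.GRU(emb, hid, batch_first=True)
        self.stmt_gru = nn.GRU(hid, hid, batch_first=True)

    def forward(self, stmt_token_ids):
        # stmt_token_ids: (batch, n_statements, tokens_per_statement)
        b, s, t = stmt_token_ids.shape
        tokens = self.emb(stmt_token_ids).view(b * s, t, -1)
        _, stmt_vecs = self.token_gru(tokens)       # final state per statement
        stmt_vecs = stmt_vecs.squeeze(0).view(b, s, -1)
        stmt_states, _ = self.stmt_gru(stmt_vecs)   # statement-level features
        return stmt_states  # a decoder could attend here and at token level
```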
Article
Full-text available
Deep learning is used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the need for manual feature engineering, the construction of the source code model significantly affects performance on those tasks. Most recent work has focused on complementing AST-based source code models with contextual dependencies extracted from the CFG. However, little attention has been paid to the representation of basic blocks, which are the basis of those contextual dependencies. In this paper, we integrate the AST and CFG into a novel source code model embedded with hierarchical dependencies, and we design a neural network that relies on the graph attention mechanism. Specifically, we introduce the syntactic structure of each basic block, i.e., its corresponding AST, into the source code model to provide sufficient information and fill the gap. We have evaluated this model on three practical software engineering tasks and compared it with other state-of-the-art methods. The results show that our model significantly improves performance: for example, compared to the best-performing baseline, it reduces the parameter scale by 50% and achieves a 4% accuracy improvement on the program classification task.
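A minimal single-head graph-attention layer over CFG basic blocks, in the spirit of the mechanism this abstract builds on. The block vectors would come from per-block AST encoders in such a model; the names and scoring function below are assumptions.

```python
# Single-head graph attention over basic blocks, restricted to CFG edges.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockGraphAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim, bias=False)
        self.score = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, block_vecs, adj):
        # block_vecs: (n, dim); adj: (n, n) 0/1 matrix of CFG edges
        h = self.w(block_vecs)
        n = h.size(0)
        adj = adj + torch.eye(n, device=adj.device)  # self-loops: no empty rows
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.score(pairs)).squeeze(-1)  # (n, n) raw scores
        e = e.masked_fill(adj == 0, float("-inf"))       # attend along edges only
        return torch.softmax(e, dim=-1) @ h              # attended block features
```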
Article
A code comment generation system summarizes the semantic information of source code and generates a natural language description, which can help developers comprehend programs and reduce the time spent on software maintenance. Most state-of-the-art approaches use RNN (Recurrent Neural Network)-based encoder-decoder networks. However, such methods may not generate high-quality descriptions when summarizing information spread across code blocks that are far from each other (i.e., the long-dependency problem). In this paper, we propose SeCNN, a novel semantic CNN parser for code comment generation. In particular, we use a CNN (Convolutional Neural Network) to alleviate the long-dependency problem and design several novel components, including a source-code-based CNN and an AST-based CNN, to capture the semantic information of the source code. The evaluation is conducted on a widely-used large-scale dataset of 87,136 Java methods. Experimental results show that SeCNN achieves better performance (44.69% in terms of BLEU and 26.88% in terms of METEOR) and incurs lower execution time than five state-of-the-art baselines.
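The two-CNN design can be sketched as a pair of 1-D convolutional encoders, one over code tokens and one over serialized AST nodes, concatenated for a downstream decoder. Kernel sizes and dimensions below are illustrative, not SeCNN's actual configuration.

```python
# Toy two-stream CNN encoder over code tokens and serialized AST nodes.
import torch
import torch.nn as nn

class TwoStreamCNNEncoder(nn.Module):
    def __init__(self, code_vocab, ast_vocab, emb=128):
        super().__init__()
        self.code_emb = nn.Embedding(code_vocab, emb)
        self.ast_emb = nn.Embedding(ast_vocab, emb)
        # Fixed-width convolution windows sidestep the RNN long-dependency
        # problem the abstract mentions.
        self.code_cnn = nn.Conv1d(emb, emb, kernel_size=3, padding=1)
        self.ast_cnn = nn.Conv1d(emb, emb, kernel_size=3, padding=1)

    def forward(self, code_ids, ast_ids):
        code = self.code_cnn(self.code_emb(code_ids).transpose(1, 2))
        ast = self.ast_cnn(self.ast_emb(ast_ids).transpose(1, 2))
        return torch.cat([code, ast], dim=2).transpose(1, 2)  # (b, len, emb)
```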