LAMNER: Code Comment Generation Using Character
Language Model and Named Entity Recognition
Rishab Sharma
University of British Columbia
Canada
rishab.sharma@alumni.ubc.ca
Fuxiang Chen
University of British Columbia
Canada
fuxiang.chen@ubc.ca
Fatemeh Fard
University of British Columbia
Canada
fatemeh.fard@ubc.ca
ABSTRACT
Code comment generation is the task of generating a high-level
natural language description for a given code method/function.
Although researchers have been studying multiple ways to generate
code comments automatically, previous work mainly considers
representing a code token in its entirety semantics form only (e.g.,
a language model is used to learn the semantics of a code token),
and additional code properties such as the tree structure of a code
are included as an auxiliary input to the model. There are two
limitations: 1) Learning the code token in its entirety form may
not be able to capture information succinctly in source code, and 2)
The code token does not contain additional syntactic information,
inherently important in programming languages.
In this paper, we present LAnguage Model and Named Entity
Recognition (LAMNER), a code comment generator capable of en-
coding code constructs effectively and capturing the structural
property of a code token. A character-level language model is used
to learn the semantic representation to encode a code token. For the
structural property of a token, a Named Entity Recognition model is
trained to learn the different types of code tokens. These representa-
tions are then fed into an encoder-decoder architecture to generate
code comments. We evaluate the generated comments from LAM-
NER and other baselines on a popular Java dataset with four com-
monly used metrics. Our results show that LAMNER is effective and
improves over the best baseline model in BLEU-1, BLEU-2, BLEU-3,
BLEU-4, ROUGE-L, METEOR, and CIDEr by 14.34%, 18.98%, 21.55%,
23.00%, 10.52%, 1.44%, and 25.86%, respectively. Additionally, we
fused LAMNER’s code representation with the baseline models, and
the fused models consistently showed improvement over the non-
fused models. The human evaluation further shows that LAMNER
produces high-quality code comments.
CCS CONCEPTS
• Computing methodologies → Artificial intelligence.
KEYWORDS
code comment generation, code summarization, character language
model, named entity recognition
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICPC ’22, May 16–17, 2022, Virtual Event, USA
©2022 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
ACM Reference Format:
Rishab Sharma, Fuxiang Chen, and Fatemeh Fard. 2022. LAMNER: Code
Comment Generation Using Character Language Model and Named Entity
Recognition. In 30th International Conference on Program Comprehension
(ICPC ’22), May 16–17, 2022, Virtual Event, USA. ACM, New York, NY, USA,
12 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Maintaining the source code documentation is an important software engineering activity. It has been reported that software developers spend more than half of their time trying to understand code during the software maintenance cycle [79]. Although program comprehension is the main activity for software developers [68], reading code requires additional mental effort as it is not a natural practice for humans [18]. Well-commented code aids in easier code comprehension and assists in the efficient maintenance of software projects [69]. Despite the importance of well-documented code, previous work has reported that only a small percentage of the methods in software projects are commented [49]. Moreover, comments are mostly absent or outdated as the software evolves [28]. As a result, researchers have studied how to generate natural language code comments from a given code method/function automatically [1, 36, 38, 40]. The code comments aim to explain what the code is doing so that developers do not have to inspect every line of code to infer how it works. This often happens when new developers join a new code repository, or when developers revisit a code repository that has been inactive for some time.
Since 2016, multiple studies have leveraged deep learning-based encoder-decoder Neural Machine Translation (NMT) models for comment generation [36-38, 40, 46, 76, 79, 83]. The encoder in a general NMT model takes a sequence of tokens from a language as input (e.g., English), and the decoder generates the translation of the input sequence into another language (e.g., German) [46]. The code comment generation problem can be seen as a translation task between the code (programming language) and the natural language text, which maps an input code snippet to a comment in English as output [46]. However, programming languages and natural languages have several dissimilar features [18, 29]. Programming languages are repetitive, mainly due to their syntax, and they can have an infinite vocabulary based on the developers' naming choices for identifiers [42, 45]. Therefore, the NMT techniques used for the translation of natural languages require specific techniques to handle the differences in programming languages [45].
Existing work studied multiple ways to generate code comments
automatically. Early neural models for code comment generation
[36, 40] used the Long Short Term Memory (LSTM) based encoder-decoder architecture. They mainly consider representing a code token in its entirety semantics form (e.g., a language model is used to learn the semantics of a code token). Subsequent works [8, 36, 47, 72] further improved the performance by incorporating additional code properties such as Abstract Syntax Trees (AST) to represent the syntactical structure of code. There are two limitations: 1) Learning the code token in its entirety form may not be able to capture information succinctly in source code, e.g., when developers use camel case or snake case identifiers. Writing camel case or snake case identifiers is a common practice for developers to combine multiple elements together, because identifier names cannot contain spaces [14]. For example, writing camel case identifiers is the convention used in Java code. Many existing code comment generation models do not distinguish these identifiers properly. 2) The embedding of the code token does not contain additional structural information, which is inherently important in programming languages. For example, developers need to write code that conforms to a certain structure for it to be compiled and executed successfully. A code token may be an access modifier, an identifier name, or another code construct. Code constructs represent the syntactic/structural meaning of each token. Code constructs are reported to be useful in code comprehension and are used in software engineering tasks such as bug detection and program repair [7, 19, 59, 64]. We note that the AST, which provides a tree structure for code, is another way to represent structural information for code. Unlike code constructs, the AST cannot encode token-level information.
In this paper, we present LAnguage Model and Named Entity Recognition (LAMNER), a code comment generator capable of effectively encoding identifiers and capturing a code token's structural property. To encode a code token that includes a type of identifier, e.g., camel case identifier, snake case identifier, etc., a character-level language model is used to learn the semantic representation. For the structural property of a token, a Named Entity Recognition (NER) model is trained to understand the code constructs. These representations are then fed into an encoder-decoder architecture to generate code comments. We note here that a code token encoded by our encoder will contain semantic and syntactic information. We evaluate the generated comments from LAMNER and other baselines against the ground truth on a popular Java dataset with four commonly used metrics: BLEU, ROUGE-L, METEOR, and CIDEr. Our results on code comment generation show that LAMNER is effective and improves over the best baseline model in BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, METEOR, and CIDEr by 14.34%, 18.98%, 21.55%, 23.00%, 10.52%, 1.44%, and 25.86%, respectively. Additionally, we fused LAMNER's code representation with baseline models, and the fused models consistently showed improvement over the non-fused models. We also conducted a qualitative study to evaluate the generated comments from LAMNER. The qualitative results show that the comments generated by LAMNER describe the functionality of the given method correctly, are grammatically correct, and are highly readable by humans.
The primary significance of this work over existing work includes the following:
(1) The learning of the semantics and syntax of code is effective. Our proposed character-based (semantic) embeddings can understand Java code better. Our ablation study on semantic embeddings shows that the model does not perform better when our proposed character-based embeddings are replaced and trained with state-of-the-art code embeddings (CodeBERT); we observed a degradation in performance. When our proposed character-based embeddings are used in other models (e.g., RENCOSLAMNER-Embeds), they improve the performance. Separately, the encoding of syntax within the embeddings is rarely studied. Using the syntax embeddings for code summarization is effective, and our experiments showed that the learned syntax embeddings help in generating better summaries.
(2) LAMNER has high adaptability. Our experiments show that the pre-trained embeddings of LAMNER can be combined with the pre-trained embeddings of other approaches to improve their performance, e.g., RENCOS combined with LAMNER forms RENCOSLAMNER-Embeds. Although exploring other tasks is not the primary focus of this study, the embeddings from LAMNER can be deployed in different tasks, such as bug detection and code clone detection. These tasks may benefit from more information on a programming language's semantics and syntax. We note here that further study is still required for a more comprehensive evaluation.
The contributions of our work are as follows:
• A novel code comment generation model, LAMNER, that encodes both the semantic and syntactical information of a code token. We propose LAMNER, which leverages a bidirectional character-level language model and a NER model for learning the semantic and the syntactical knowledge of code tokens, respectively.
• Empirical evaluation of the contribution of the different components in LAMNER. We perform an ablation study to evaluate different variations of LAMNER, such as generating code comments using only the proposed semantic embeddings or only the proposed syntactic embeddings.
• Fusing and evaluating the embeddings learned from LAMNER with baseline models. We show the adaptability of LAMNER by combining the embeddings learned from LAMNER with baseline models.
• The trained models are open sourced^1 for replication of the results and usage of the pre-trained embeddings and the LAM and NER models in the community.
We stress that the main novelty of our work is not to develop a new deep learning architecture but to propose a novel pre-trained embedding that captures both the semantic and syntactic knowledge of the code. Although the role of identifiers and their importance for source code analysis and comprehension is well known, there has been no known technique to represent this combined knowledge as embeddings for code summarization. We show that our proposed pre-trained embedding can enhance the results of other approaches.
The rest of this paper is structured as follows. Our approach
is described in Section 2 and we explain the experimental setup
in Section 3. Quantitative and qualitative results are presented in
Sections 4 and 6, respectively. We point out the threats to validity
1https://github.com/fardfh-lab/LAMNER
in Section 7 and review the related works in Section 8. Finally, we
conclude the paper in Section 9.
2 PROPOSED APPROACH
Figure 1 illustrates the overview of our proposed approach, LAMNER. First, a bidirectional character-based language model and a NER model (left box in Figure 1) are trained separately on a code corpus to generate the input code embeddings for our Semantic-Syntax encoder-decoder architecture (right box in Figure 1). Specifically, the embeddings learned from the language model and the NER model are concatenated for each token. The extracted embeddings from both models are concatenated without any pre-processing. For example, if each embedding is a vector of size 256, the concatenated embedding will be a vector of size 512. Thus, the concatenated embedding incorporates both the semantic knowledge (from the language model) and the syntactic knowledge (from the NER model). We call this the Semantic-Syntax embedding. The input to the Semantic-Syntax encoder is the Semantic-Syntax embedding. The decoder then uses the attention mechanism to decode the input code snippet into code comments.
The details of the character-based language model, the NER model, and the Semantic-Syntax encoder-decoder are described in Sections 2.1, 2.2, and 2.3, respectively.
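Before those details, the fusion step itself is small enough to sketch. The following is a minimal illustration in PyTorch of how a Semantic-Syntax embedding could be assembled from the two pre-trained lookup tables; the table names, vocabulary size, and dimensions are our assumptions for illustration, not the released code.

import torch

# Assumed pre-trained lookup tables, one row per vocabulary token:
# semantic vectors from the character-level LM and syntactic vectors
# from the NER model, each of size 256 in this sketch.
vocab_size, dim = 50_000, 256
semantic_table = torch.randn(vocab_size, dim)   # stand-in for LM embeddings
syntactic_table = torch.randn(vocab_size, dim)  # stand-in for NER embeddings

def semantic_syntax_embedding(token_ids):
    # Concatenate the two views per token: (seq_len,) -> (seq_len, 512)
    return torch.cat([semantic_table[token_ids],
                      syntactic_table[token_ids]], dim=-1)

tokens = torch.tensor([11, 42, 7])  # e.g., ids of "public", "Boolean", "getBoolean2"
print(semantic_syntax_embedding(tokens).shape)  # torch.Size([3, 512])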
2.1 Learning Semantics Representation of
Code Sequences
We utilize the approach of Akbik et al. [3] to train and adapt a character-level language model to learn the semantic representation of a code token. It is reported that this language model can better capture multiple words within a token. We note here that our aim is not to generate character embeddings but to use a character-level language model to generate embeddings for a code token. We first describe the character-level language model (Section 2.1.1) before explaining how it is used to generate embeddings for a code token (Section 2.1.2).
2.1.1 Character-Level Language Model Architecture. Figure 2 illustrates the architecture of the language model, which employs a single-layer bidirectional LSTM (Bi-LSTM) [35]. The Bi-LSTM captures the left and right context of a character in the sequence [39] using a forward and a backward language model, as shown in Figure 2. The input to each LSTM unit is an embedding of a character, initialized randomly. Each LSTM unit processes the embedding to generate the output and the next hidden state for the character. The output of the last LSTM unit is used to select the next output character with maximum likelihood. In Figure 2, the model predicts the character l for the given sequence public Boo, which makes l the last letter of the sequence public Bool.
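As a rough illustration of this component, a forward character-level language model of the kind described can be written compactly in PyTorch. This is a sketch under our own assumptions (only the forward direction is shown; a mirrored model trained on reversed text gives the backward direction), not the authors' implementation.

import torch
import torch.nn as nn

class CharLanguageModel(nn.Module):
    # Predicts the next character from the left context (forward LM).
    def __init__(self, n_chars, emb_dim=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)  # randomly initialized, learned
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden, n_chars)        # next-character logits

    def forward(self, char_ids):                     # char_ids: (batch, seq_len)
        states, _ = self.lstm(self.embed(char_ids))  # (batch, seq_len, hidden)
        return self.out(states), states              # logits and hidden states

# Training minimizes cross-entropy between the logits at position i and the
# character at position i + 1, e.g. predicting 'l' after "public Boo".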
2.1.2 Extracting Semantic Embeddings of Tokens from the Language Model. As shown in Figure 2, the hidden state of each character maintains the contextual information of its previous characters. In the left-to-right model (Forward LSTM Layer), the last character of a token encapsulates the information of all characters of the token based on its left context. Similarly, in the right-to-left model (Backward LSTM Layer), the first character of the token encapsulates the information of all characters of the token based on its right context. To get the embedding of a token, we concatenate these two hidden states.
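A sketch of this extraction step, with tensor shapes assumed by us and offsets following the contextual string embedding scheme of Akbik et al. [3]:

import torch

def token_embedding(fwd_states, bwd_states, start, end):
    # fwd_states / bwd_states: (seq_len, hidden) hidden states of the forward
    # and backward character LMs over the same character sequence.
    # start / end: character offsets of the token (inclusive / exclusive).
    left_context = fwd_states[end - 1]  # forward state at the token's last character
    right_context = bwd_states[start]   # backward state at the token's first character
    return torch.cat([left_context, right_context], dim=-1)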
2.2 Learning Syntactical Structure of Code
Sequences
Encoding the syntactical structure of code in models has been shown to improve performance in previous work [8, 9, 26, 36]. This section presents our approach to generate the syntactical context of a code token. The difference between the syntactical code structure in previous work and ours is that previous work does not consider the syntactical properties of a code token itself. For example, a syntactic code structure such as the AST does not have a direct one-to-one mapping with the code tokens. A code token's syntactical structure is inherently important in programming languages. For example, developers need to write code that conforms to a certain syntactical structure for it to compile and execute successfully.
We employ a NER model, shown in Figure 3, to generate a code token's syntactical embeddings. It takes the contextual embedding of the input code token from the character language model and uses a bidirectional LSTM sequence tagger with a conditional random field (CRF) decoding layer [39] to map each input code token to its syntactic entity type. For example, using the NER model for the given code sequence public Boolean getBoolean2 ..., the token public is labeled as modifier, the token Boolean is labeled as return type, and the token getBoolean2 is labeled as function.
It has been reported that the embeddings learned from the character-based language model perform better [3] than classical word embeddings such as FastText [15] and GloVe [62] when used as the input to a NER model. Thus, the input to our NER model is the semantic representation learned from the trained character language model described previously in Section 2.1. Within the NER model, the input embeddings are fine-tuned using a bidirectional LSTM before the syntactic types are predicted in the decoding layer, as shown in Figure 3.
In the decoding phase, the CRF cell uses the knowledge from the LSTM output states to predict the correct entity type. It is important to note that we do not use the predicted entity types of the NER model in LAMNER. Rather, the extracted syntactical representations of the code tokens learned from the NER model are used. After the NER model is trained, it extracts the syntactic embedding for each code token.
Note that the scope of this work is not to detect these code entities. We believe that parsers can detect the syntactic entities better. In comparison, our work presents a new technique to generate context-sensitive syntactic embeddings of code tokens, which is not available through parsers.
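Because the paper builds on the contextual string embeddings of Akbik et al. [2, 3], one plausible way to realize this BiLSTM-CRF tagger is through the Flair framework. The snippet below is a hedged sketch: the corpus paths, model paths, and hyperparameters are our assumptions, not the authors' published configuration.

from flair.datasets import ColumnCorpus
from flair.embeddings import FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical CoNLL-style files: one code token and its entity type per line.
corpus = ColumnCorpus("data/code_ner", {0: "text", 1: "ner"})
tag_dictionary = corpus.make_label_dictionary(label_type="ner")

# Character LMs pre-trained on the comment-free Java corpus (assumed paths).
embeddings = StackedEmbeddings([
    FlairEmbeddings("models/java-forward.pt"),
    FlairEmbeddings("models/java-backward.pt"),
])

tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type="ner",
                        use_crf=True)  # Bi-LSTM encoder + CRF decoding layer

ModelTrainer(tagger, corpus).train("models/code-ner", max_epochs=10)

After training, the tagger's fine-tuned per-token representations, rather than its predicted labels, are what LAMNER consumes.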
2.3 Semantic-Syntax Model
The pre-trained embeddings from the language and NER models are concatenated to represent the semantic-syntax embedding of the code tokens in a given code snippet. The Semantic-Syntax encoder takes this semantic-syntax embedding of the code tokens as input and models them together to output the semantic-syntax embedding of the code sequence. The attentive decoder, which uses this code sequence embedding, is trained to generate the code comments in natural language. The details of the Semantic-Syntax Encoder and the Semantic-Syntax Decoder are in Sections 2.3.1 and 2.3.2, respectively.
Figure 1: Overview of the LAMNER Framework. The Language and NER models in the left box are pre-trained to provide initial embeddings for the code tokens. The embeddings are then concatenated to serve as the input to the Semantic-Syntax encoder in the right box to generate comments.
2.3.1 Semantic-Syntax Encoder. The Semantic-Syntax encoder processes the input using a single-layer bidirectional GRU. We use a GRU as it is reported to have a faster training time [67] while still preserving the information of long sequences [21].
We denote $E_t$ as the semantic-syntax embedding of token $t$, which is the concatenation of the pre-trained semantic and syntactic embeddings from the language model and the NER model. $E_t$ serves as the input to the GRU. Each GRU state processes the semantic-syntax embedding of the current token, $E_t$, in the code sequence and generates a hidden state $h_t$.
The hidden state of the last token in the sequence, $h_{last}$, contains the sequential information of the complete sequence and is formed by concatenating the internal hidden states, $h_t^{left}$ and $h_t^{right}$, from the left and the right layer. The hidden state of the last token, $h_{last}$, is then fed into a fully connected linear layer. The formal equation for the fully connected layer, $y_{fc}$, is shown below:

$y_{fc} = h_{last} W_t + b$   (1)
Figure 2: Overview of the bidirectional character-level lan-
guage model architecture. The input to the model is every
character within each line of code, and the model is trained
to predict the next character.
Figure 3: Overview of the NER model used to generate the syntactic code embedding for LAMNER. The input to the model is the semantic embeddings of every code token, and they are further fine-tuned to produce the syntactic code embeddings.
where $W_t$ and $b$ are the weight matrix and bias values, respectively. The output of the fully connected layer, $y_{fc}$, is then passed into a $\tanh$ layer to form the final output $h_{final}$ of the Semantic-Syntax encoder. The formal equation is shown below:

$h_{final} = \tanh(y_{fc})$   (2)
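Putting Eqs. (1) and (2) together, the encoder can be sketched as follows; this is our reading of the description in PyTorch, and the layer sizes are assumptions.

import torch
import torch.nn as nn

class SemanticSyntaxEncoder(nn.Module):
    def __init__(self, emb_dim=512, hidden=512):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, num_layers=1,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, hidden)  # Eq. (1): y_fc = h_last W_t + b

    def forward(self, E):                  # E: (batch, seq_len, emb_dim)
        outputs, h_n = self.gru(E)         # h_n: (2, batch, hidden), one per direction
        h_last = torch.cat([h_n[0], h_n[1]], dim=-1)  # left and right states
        h_final = torch.tanh(self.fc(h_last))         # Eq. (2)
        return outputs, h_final            # per-token states + sequence summary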
2.3.2 Semantic-Syntax Decoder. The Semantic-Syntax decoder implements the popular Bahdanau attention mechanism on a unidirectional GRU, which is reported to have good performance [11]. It uses the Semantic-Syntax encoder's final hidden state to pay attention to the input sequence's important tokens. The attention mechanism prevents the decoder from generating the natural language comments based on a single final context vector. Instead, it calculates the attention weights for each input token and pays attention to the tokens with larger attention values during decoding. To predict a token, the decoder looks up the semantic embeddings of the comments learned during the training of the NMT model. The decoder predicts the next token until it reaches the maximum sequence length or the end-of-sentence token eos.
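A compact sketch of one decoding step with Bahdanau-style additive attention is shown below; the scoring function and dimensions are our assumptions, consistent with the encoder sketch above.

import torch
import torch.nn as nn

class AttentiveDecoderStep(nn.Module):
    # One step of a unidirectional GRU decoder with additive attention.
    def __init__(self, emb_dim, hidden, vocab):
        super().__init__()
        self.attn = nn.Linear(3 * hidden, hidden)  # scores [s_prev; h_i] pairs
        self.v = nn.Linear(hidden, 1, bias=False)
        self.gru = nn.GRU(emb_dim + 2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, y_prev, s_prev, enc_outputs):
        # y_prev: (batch, 1, emb_dim) embedding of the previous comment token
        # s_prev: (1, batch, hidden); enc_outputs: (batch, src_len, 2*hidden)
        src_len = enc_outputs.size(1)
        s_rep = s_prev.permute(1, 0, 2).repeat(1, src_len, 1)
        energy = self.v(torch.tanh(self.attn(
            torch.cat([s_rep, enc_outputs], dim=-1)))).squeeze(-1)
        alpha = torch.softmax(energy, dim=-1)             # attention weights
        context = torch.bmm(alpha.unsqueeze(1), enc_outputs)
        output, s_t = self.gru(torch.cat([y_prev, context], dim=-1), s_prev)
        return self.out(output.squeeze(1)), s_t           # logits, new state

Greedy decoding repeats this step, feeding back the predicted token, until eos is emitted or the maximum comment length (30 tokens, Section 3.2) is reached.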
3 EXPERIMENTAL SETUP
In this section, we describe how we evaluate our approach for its effectiveness, including the research questions, the dataset used, the model training details, and the baseline models and the variations of our proposed model for comparison, in Sections 3.1, 3.2, 3.3, and 3.4, respectively.
3.1 Research Questions
To evaluate our approach, we seek to address the following research
questions in our experiment:
RQ1 (Performance of LAMNER): How does our proposed model perform compared to other baselines? In this research question, we evaluate the performance of LAMNER against the baseline models using common and popular metrics for comment generation: BLEU-n ($n \in [1, 4]$), ROUGE-L, METEOR, and CIDEr.
RQ2 (Contribution of the components in LAMNER, e.g., the semantic component, the syntactic component): Which parts of our model contribute more to the performance? Rather than using our proposed semantic-syntax embeddings, here, we perform an ablation study to evaluate our model using its different variants.
RQ3 (Effect of fusing LAMNER with other models): What is the effect of combining LAMNER with other models for comment generation? We investigate the adaptability of LAMNER by integrating its embeddings with existing embeddings and other models.
3.2 Dataset
We use the widely used Java dataset for code comment generation collected from popular GitHub repositories by Hu et al. [38]. The dataset consists of two parts: the method code and its comment, the first sentence extracted from the Javadoc. In total, there are 87,136 code and comment pairs. The training, validation, and testing sets are split distinctly in an 8:1:1 ratio. Following previous work [38], we set the maximum size for code tokens and comment tokens to 300 and 30, respectively. Lengthy code and comments are truncated to the maximum size.
The statistics for the dataset are provided in Table 1, under the '# Records in Dataset' column.
Table 1: Dataset details for code comment generation
Split # Records in Dataset
Train 69,708
Validation 8,714
Test 8,714
3.3 Model Training
3.3.1 Language Models. In our approach, we train character-based language models. The underlying architecture for the models is the same as described in Section 2.1.
The language model learns the semantic representation of code tokens, and it is trained on the code corpus of the training dataset. Here, we are not interested in the code comments, as we only want to learn the code representation. Thus, we exclude all comments (e.g., inline, block, and JavaDoc comments). We perform comment removal using the Javalang library, as it is reported to have good performance in a previous study [83]. The language model has a dropout probability of 0.1 applied for regularization purposes.
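One simple way to perform this removal, exploiting the fact that javalang's tokenizer does not emit comment tokens, is sketched below; this is our illustration rather than necessarily the authors' exact pre-processing.

import javalang

def strip_comments(java_source):
    # javalang's tokenizer skips comments, so joining the token values yields
    # the code with inline, block, and JavaDoc comments removed (original
    # spacing is lost, which does not matter for language model training).
    tokens = javalang.tokenizer.tokenize(java_source)
    return " ".join(tok.value for tok in tokens)

print(strip_comments("int x = 1; // counter"))  # -> "int x = 1 ;"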
3.3.2 Named Entity Recognition Model. We require a labeled dataset to train the NER model, where each code token is linked to its corresponding syntactic type. For example, a code token may be associated with an access modifier, an operator, or other types. We use the Javalang parser to obtain a labeled dataset from the training dataset. The Javalang parser is used in previous studies [16, 51] and is reported to have good accuracy in labeling code tokens with their associated types. The Javalang parser labels some code tokens in a granular fashion. For example, it groups all types of identifiers (e.g., class name, function name, etc.) into a common label identifier, and all types of separators (e.g., end of a method, end of the line, etc.) into another common label separator.
In order to learn the finer nuances of the token types, we modify some of the labels generated by the Javalang parser. Specifically, we break down the identifier and separator types, where the former is divided into five subtypes: class, function, object, modifier, and return-type, and the latter is divided into three subtypes: body-start-delimiter, body-end-delimiter, and end-of-line (eol). The breakdown of the identifiers is performed as follows. To identify a class, the identifier must conform to the Java coding convention, i.e., it starts with an upper-case character and must not be suffixed by braces. Function and object identifiers must be suffixed by round braces. To distinguish an object from a function, the token must have a corresponding class identifier with exactly the same name (case insensitive); if there is such a naming match, we categorise it as an object, else we categorise it as a function. The return-types are defined at the start of the function definition, and we identify them directly. Java has a specific set of access modifiers, i.e., static, public, and private, and identifiers that contain them are labeled as modifier. For the separators, the token "{" is used to indicate the start of a new body, and we label it as body-start-delimiter. Similarly, we label the token "}" as body-end-delimiter to represent the end of a body section. The ";" token is used to indicate the end of a code line, and we label it as end-of-line (eol).
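These relabeling rules lend themselves to a small rule-based pass over the Javalang output. The sketch below is our hypothetical reconstruction: the helper name, arguments, and the simplified view of the token stream are assumptions for illustration.

def refine_label(token, coarse, known_classes, next_token="", is_return_slot=False):
    # Map Javalang's coarse labels to the finer LAMNER label set.
    if coarse == "separator":
        return {"{": "body-start-delimiter",
                "}": "body-end-delimiter",
                ";": "eol"}.get(token, "separator")
    if coarse == "identifier":
        if token in ("static", "public", "private"):
            return "modifier"
        if is_return_slot:                  # declared at the start of a definition
            return "return-type"
        if next_token == "(":               # suffixed by round braces
            # object if a class with the same name exists (case-insensitive)
            return "object" if token.lower() in known_classes else "function"
        if token[0].isupper():              # Java class naming convention
            return "class"
    return coarse

print(refine_label("getBoolean2", "identifier", {"bootpanel"}, next_token="("))
# -> "function"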
We train the NER model on the training dataset and evaluate its performance on the test dataset. The code entity types inferred by the Javalang parser are considered the ground-truth labels, whereas the code entities generated by the NER model are the predicted labels. We applied dropout with a probability of 0.1 for regularization purposes. For evaluating the NER model, we use Precision, Recall, and F1 scores, which are the commonly used metrics to assess NER models in previous studies [2]. On average, the NER model achieves 99.41%, 93.66%, and 93.89% for Precision, Recall, and F1 scores, respectively.
As mentioned previously, the goal of this work is not to improve the existing language parser but to generate the syntactical information of each code token. Specifically, the embeddings learned in the NER model are used to represent the syntactical information of each code token. We note here that the Precision, Recall, and F1 scores are used only to evaluate the NER model, the intermediate model used in LAMNER. The metrics used to evaluate code comment generation are described in Section 3.5.
Note the difference between the Javalang parser and the NER model. The Javalang tool tags each token as a modifier, data type, etc., whereas the NER model is trained to predict the type of the code tokens (i.e., the code constructs, e.g., identifier). If we use the Javalang tool, it can only provide us with the syntactic types of the tokens. These syntactic types are discrete values, and we would not be able to incorporate them into our proposed deep learning networks. Therefore, we are unable to use the Javalang parser directly. By using the NER model, we extract meaningful vector representations of the code token types, which is not possible using the Javalang parser. These embeddings are used in LAMNER.
3.3.3 Semantic-Syntax Model. To train the NMT encoder-decoder model, we use the dataset described in Section 3.2. Similar to previous work [49, 83], the numerical and string values are replaced with the NUM and STR tokens, respectively. The hidden size of the Semantic-Syntax model is set to 512, and it is trained for 100 epochs or until the learning rate decays to 1e-7. The initial learning rate is set to 0.1, and the batch size is 16. Both the encoder and decoder apply a dropout probability of 0.1. If the validation loss does not improve after 7 consecutive epochs, the learning rate is decayed by a factor of 0.1. All experiments are conducted on an NVIDIA Tesla V100 GPU with 32 GB memory.
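This decay-on-plateau schedule maps directly onto a standard scheduler. A minimal sketch follows, with a placeholder model and the (unstated) assumption that plain SGD is the optimizer:

import torch

model = torch.nn.Linear(512, 512)  # placeholder for the Semantic-Syntax model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=7)  # decay by 0.1 after 7 flat epochs

for epoch in range(100):
    val_loss = 1.0  # stand-in; compute on the validation split in practice
    scheduler.step(val_loss)  # decays the lr when the validation loss plateaus
    if optimizer.param_groups[0]["lr"] <= 1e-7:
        break  # stop once the learning rate has decayed to 1e-7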
3.4 Baselines and Model Variations
3.4.1 Other Approaches. We compare our approach with the following baseline models, which are commonly used in many comment generation studies [32, 52, 79, 83]. The availability of the models was also an important factor in choosing the baselines.
CODE-NN initializes the code input with one-hot vector encoding and uses an LSTM-based encoder-decoder model to generate code comments [40].
Hybrid-DRL employs an actor-critic reinforcement learning approach to generate natural language summaries [72]. It generates a hybrid code representation using LSTM and AST, and performs hybrid attention following an actor-critic reinforcement learning architecture.
AST-Attend-GRU uses the code and the Structure-Based Traversal (SBT) representation of the AST as input to two separate encoders. The input of the two encoders is processed and combined to generate the final output [47].
TL-CodeSum uses a double encoder architecture to generate code comments [38]. The API knowledge is encoded into embeddings, which are transferred to the model to produce comments.
Re2-Com combines both information retrieval and deep learning-based techniques. A combination of the input code, similar code snippets, AST, and exemplar comments generates the code comments.
RENCOS combines information retrieval techniques with NMT-based models [83]. It uses two syntactically and semantically similar code snippets as input. The conditional probabilities from the two inputs are then fused to generate the final output.
CodeBERT trains a code-level embedding model (pre-trained language model) and uses it to perform code summarization through a fine-tuning process [27].
LAMNERCodeBERT-Embeds leverages the same architecture used in LAMNER. However, instead of the LAM and NER embeddings, it employs the embeddings extracted from CodeBERT to initialize its vocabulary [27].
We use the official implementation available on the authors' GitHub repositories for all of these works. For the Seq2Seq model, the official implementation is from OpenNMT^2 [44].
3.4.2 Variations of Our Model. We consider four variations of our model, as described below.
LAMNERLAM: This model uses only the semantic code embeddings learned from the character-level language model.
LAMNERNER: This model uses only the syntactic code embeddings learned from the NER model.
LAMNERStatic: In this model, the Semantic-Syntax code embeddings are kept static and are not further fine-tuned during training. This model shows the performance of the pre-trained embeddings.
LAMNER: This model uses the concatenated semantic and syntactic code embeddings as the input to the Semantic-Syntax encoder, as shown in Figure 1. The purpose of this model is to evaluate the effectiveness of combining the two embeddings for code comment generation. Note that in LAMNER, the model is initialized with the Semantic-Syntax code embeddings, and the code embeddings are further fine-tuned during training. The fine-tuning of code embeddings means that once the embeddings are extracted and concatenated from the language model and the NER model, they are used to initialize the vocabulary. The vocabulary matrix parameters that are initialized with the LAMNER embeddings are then further learnt with the other model parameters.
3.5 Evaluation Metrics
Similar to previous works [72, 83], we evaluate the performance of our model and the baseline models using the following metrics: BLEU [61], ROUGE-L [50], METEOR [12], and CIDEr [71].
BLEU measures the n-gram ($n \in [1, 4]$) geometric precision $p_n$ between the generated comment (C) and the ground truth (G) [61]. ROUGE is a recall-based metric that computes the number of tokens from G that appear in C [50]. ROUGE-L finds the F-score of the longest common subsequence (LCS) between two sentences X and Y of length $m$ and $n$, respectively [50]. METEOR calculates a semantic score using an alignment approach, in which a unigram in one sentence is mapped to zero or one unigram in another sentence in a phased manner [12]. CIDEr rates the correctness of the comments [76, 83]. It performs Term Frequency-Inverse Document Frequency (TF-IDF) weighting for each token and uses cosine similarity between the reference and candidate sentences [71].
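For reference, BLEU combines these n-gram precisions with a brevity penalty; the following is the standard formulation from Papineni et al. [61], reproduced for context:

$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right), \qquad \mathrm{BP} = \begin{cases} 1, & c > r \\ e^{\,1 - r/c}, & c \le r \end{cases}$

where $c$ and $r$ are the lengths of the candidate and reference sentences and the weights are typically uniform, $w_n = 1/N$ (with $N = 4$ for BLEU-4).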
For all the metrics, a higher value is considered a better score.
2https://opennmt.net/
4 RESULTS
In this section, we present the results for our research questions
(Section 3.1).
4.1 RQ1: (Performance of LAMNER)
Table 2 shows the results of the baseline models and all the variations of our proposed approach. The first column shows the different models used in the evaluation, and columns two to five show the BLEU scores. ROUGE-L, METEOR, and CIDEr scores are shown in columns six, seven, and eight, respectively. For all the baselines, we used the best hyperparameters and settings mentioned by the authors of the models to ensure a fair comparison.
Figure 4: Attention behavior of LAMNERCodeBERT-Embeds.
Only a few tokens have focused attention.
Among the baselines (rows one to eight), LAMNERCodeBERT-Embeds has the best scores for BLEU{1-4}, ROUGE-L, and CIDEr, while RENCOS has the best score for METEOR. Our proposed model, LAMNER (last row), achieves the highest scores in BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, METEOR, and CIDEr. It improves over the best baseline model in BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, METEOR, and CIDEr by 14.34%, 18.98%, 21.55%, 23.00%, 10.52%, 1.44%, and 25.86%, respectively. Interestingly, the results of CodeBERT with the recommended setting of training for 3 epochs are very low.
Summary of RQ1 results: Our proposed approach, LAMNER, is effective and achieves the best scores among all the baseline models.
4.2 RQ2: (Contribution of the components in
LAMNER)
The results for the variations of our proposed model, LAMNER, are shown in rows nine to eleven of Table 2. LAMNERNER has higher scores than LAMNERLAM for all the evaluation metrics, and both LAMNERNER and LAMNERLAM have better scores than the baseline models on all the metrics. LAMNER, which combines both the NER and LAM embeddings, improves over each individual variant on all the metrics. These results show that our proposed approach is effective, and the learned syntactic and semantic information can improve the results.
Even when we use the embeddings as static vectors without further training, i.e., LAMNERStatic, the results show improvement over all the baseline models, except against RENCOS, where there is a slight drop in the METEOR score (1.84%). Even though there is a small decrease in the METEOR score, we note that all the other metrics improve over the baseline models. When comparing LAMNERStatic with both LAMNERNER and LAMNERLAM, the latter two have better performance on the majority of the metrics, except for BLEU-3 and BLEU-4 in LAMNERLAM.
Summary of RQ2 results: The syntactic embeddings (i.e., NER) contribute more to the model's performance. Both the LAM embeddings and the NER embeddings learn meaningful information about the code, and their combination improves the results further.
Figure 5: Attention behavior of LAMNER on code samples.
The attention is distributed among more tokens.
4.3 RQ3: (Eect of Fusing LAMNER with other
models)
To analyze the adaptability of LAMNER, we combine it with other models. We integrate LAMNER with existing works in two different ways: i) combining it with existing embeddings, and ii) using the pre-trained embeddings of LAMNER with existing models. For the first approach, we take the embeddings generated by CodeBERT, concatenate them with the LAMNER embeddings (i.e., the LAM and NER embeddings), and then use them in the NMT model to generate comments. The model is shown as LAMNERCodeBERT-Embeds+LAMNER-Embeds in Table 3. Combining the LAMNER embeddings with the CodeBERT embeddings improves the LAMNERCodeBERT-Embeds results by 7.26% - 10.02% on the BLEU scores, and by 3.71% and 7.5% for ROUGE-L and METEOR, respectively (CIDEr remains the same).
Additionally, we choose RENCOS, the best model among the baselines, and incorporate the LAM and NER embeddings (i.e., the LAMNER embedding) into the architecture of RENCOS. In this setting, we initialize each token in the vocabulary of RENCOS with the LAMNER embedding. This is denoted as RENCOSLAMNER-Embeds in Table 3. Initializing the pre-trained embeddings of LAMNER in RENCOS improves its scores by 6% to 11% on all the evaluation metrics. Here, the LAMNER embeddings provide an initial warm-up state for RENCOS, which is then further fine-tuned during training.
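Mechanically, this warm start amounts to copying the pre-trained vectors into the model's embedding matrix before training begins. A minimal sketch in PyTorch, with the variable names and the saved-tensor path being our assumptions:

import torch
import torch.nn as nn

vocab_size, dim = 30_000, 512
embedding = nn.Embedding(vocab_size, dim)  # e.g., the input layer of RENCOS

# lamner_vectors: a (vocab_size, dim) matrix of concatenated LAM + NER
# embeddings, aligned with the model's vocabulary (assumed precomputed).
lamner_vectors = torch.load("lamner_embeddings.pt")
with torch.no_grad():
    embedding.weight.copy_(lamner_vectors)  # warm-start initialization

embedding.weight.requires_grad_(True)  # keep fine-tuning during training
# (setting this to False instead reproduces the static variant, LAMNERStatic)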
Table 2: Evaluation of various baseline models and our proposed model, LAMNER. LAMNER has the best performance in BLEU{1-4}, ROUGE-L, METEOR, and CIDEr. Our model variants also consistently achieve better performance than the baseline models.
Model BLEU-1(%) BLEU-2(%) BLEU-3(%) BLEU-4(%) ROUGE-L(%) METEOR(%) CIDEr
CODE-NN 23.90 12.80 8.60 6.30 28.90 9.10 0.98
AST-Attend-GRU 22.00 10.05 5.06 2.79 24.92 8.82 0.30
CodeBERT 24.73 18.35 15.06 13.16 34.46 17.82 2.01
TL-CodeSum 29.90 21.30 18.10 16.10 33.20 13.70 1.66
Hybrid-DRL 32.40 22.60 16.30 13.30 36.50 13.50 1.66
Re2-Com 33.19 24.17 19.63 17.07 41.03 15.70 1.56
RENCOS 38.32 33.33 30.23 27.91 41.07 24.98 2.50
LAMNERCodeBERT-Embeds 44.35 35.19 30.91 28.48 47.64 21.59 2.63
LAMNERStatic 49.57 40.86 36.68 34.21 51.42 24.52 3.20
LAMNERLAM 50.11 41.09 36.63 33.98 52.41 25.00 3.23
LAMNERNER 50.54 41.68 37.36 34.83 52.64 25.30 3.27
LAMNER 50.71 41.87 37.57 35.03 52.65 25.34 3.31
These results confirm that the embeddings learned by the LAMNER model using the character-based language model and the NER model can be adapted to other models or combined with different embeddings, and are not specific to LAMNER only.
Summary of RQ3 results: The LAMNER embeddings can be combined with existing works for code comment generation. Our proposed character-level language model and NER model are effective in creating comments. They further improve the results of other models when used as initialization embeddings or combined with other embeddings.
5 EXAMPLE ATTENTION MAPS
Attention plays an important role in the quality of the generated comments [46]. The higher the attention value, the more important a particular token is during the predictions. Previous works [38, 47] used attention behavior to understand the prediction behavior of their models. Figures 4 and 5 show the attention behavior of the LAMNERCodeBERT-Embeds and LAMNER models on two code samples.
For Sample 1, public long max() { return deltaMax.get(); }, the comments generated by LAMNER and LAMNERCodeBERT-Embeds are "gets the maximum of the elements." and "return the maximums value.", respectively. Similarly, for Sample 2, public BootPanel() { initComponents(); }, the comments generated by LAMNER and LAMNERCodeBERT-Embeds are "creates new instance of customizerui." and "creates new form of.", respectively.
We further compare the generated comments with the attention mechanism in LAMNER and LAMNERCodeBERT-Embeds in Figures 4 and 5. We observed that LAMNER has its attention distributed over more tokens compared to LAMNERCodeBERT-Embeds, which focuses its attention on only a few tokens. This distributive attention allows LAMNER to capture more context of the source code. This can help the model generate a more cohesive latent representation, useful for generating informative code comments. Moreover, the attention mechanism in LAMNER correctly focuses on the more prominent tokens such as 'max' and 'panel,' which can help determine the exact behavior of the given code samples. We observe that this behavior is missing in the LAMNERCodeBERT-Embeds model; it mostly focuses on other tokens near the start and end of the code sample. We believe that this could be the reason why the code comments generated by the LAMNERCodeBERT-Embeds model are not as coherent as those of LAMNER.
6 HUMAN EVALUATION
Automatic metrics are extensively used in machine translation to draw a quantitative comparison among code summarization models. Models with more overlapping tokens between the references and predictions receive higher scores. However, there can be issues with this evaluation: for example, two texts can have the same meaning without using common tokens; thus, despite being a correct prediction, a semantically similar code summary without any overlapping keyword would receive a zero score. This makes it difficult to assess the effectiveness of different models [65, 70].
Therefore, we further conducted a qualitative analysis. We randomly selected 100 generated summaries (for each model) along with their original code, following an approach similar to prior research [30, 37, 40, 53, 83]. Amazon Mechanical Turk (MTurk) workers were hired to rate the quality of the generated summaries, using a rating system where 1 is the worst and 5 is the best score. The MTurkers rated the summaries voluntarily, and for each rated comment, the MTurkers were given a compensation of one cent. We used three common criteria to evaluate the summarization quality [53]:
Informativeness (I): How well does the summary capture the key points of the code?
Relevance (R): Are the details provided in the summary consistent with the details in the code?
Fluency (F): Are the summaries well-written and grammatically correct?
Two different workers were required to rate each summary between one and five, where one is the worst and five is the best
Table 3: Effect of fusing LAMNER with other models. For simplicity of comparison, the LAMNERCodeBERT-Embeds and RENCOS results from Table 2 are repeated here.
Model BLEU-1(%) BLEU-2(%) BLEU-3(%) BLEU-4(%) ROUGE-L(%) METEOR(%) CIDEr
LAMNERCodeBERT-Embeds 44.35 35.19 30.91 28.48 47.64 21.59 2.63
LAMNERCodeBERT-Embeds+LAMNER-Embeds 47.57 38.37 34.01 31.46 49.41 23.21 2.63
RENCOS 38.32 33.33 30.23 27.91 41.07 24.98 2.50
RENCOSLAMNER-Embeds 42.60 37.16 33.66 30.96 43.59 26.89 2.74
[40, 52, 75]. The MTurkers were shown an example with explanations for all the criteria before starting to rate. We also asked the MTurkers about their programming experience and whether they understood the generated summaries and code. To reduce bias, the names of the models and the reference comments were not shown to the evaluators. Table 4 shows the average scores of the models as rated by the MTurkers on the three criteria. The predictions from LAMNER are consistently rated better than RENCOS on all three criteria.
Table 4: Results of human evaluation
Model Informative Relevance Fluency
LAMNER 4.13 4.18 4.13
RENCOS 4.07 4.07 4.06
7 THREATS TO VALIDITY
Internal Validity: Internal threats in our work relate to errors in building our models and in the replication of the baseline models, as these are the internal factors that might have impacted the results. We have cross-checked the implementation of our model for its correctness and will open-source the implementations for easier replication of the results. For all the baseline models, we used the official code provided by the authors. While training the baseline models, if we encountered any errors or had any doubts, we consulted the authors directly by raising GitHub issues. An important difficulty in training the models is preparing the datasets in the required format. Various baseline models required separate pre-processing, such as generating ASTs, using third-party libraries such as py-Lucene^3, and extracting the API knowledge. For these steps, we followed the instructions provided by the authors closely to ensure the correctness of the required input.
Another threat to validity could be the dataset that we built for the NER model. We used the Javalang parser to label the dataset. This parser reliably tags Java methods and has been used in several software engineering studies [51, 83]. We have broken down the identifiers into five sub-categories using a rule-based approach. We took careful steps in breaking down the identifiers: we randomly checked the sub-categories and did not find any misclassification.
External Validity: In our study, threats to external validity relate to the generalizability of the results [37]. We used an external and extensively used dataset in our work. We only applied our model to the Java programming language. Although we hypothesize that the embeddings used in our work are beneficial for other programming languages, more studies are required.
3 https://lucene.apache.org/pylucene/
Construct Validity: In our work, the automatic evaluation metrics might affect the validity of the results. We used four different automatic machine translation metrics (BLEU{1-4}, ROUGE-L, METEOR, and CIDEr) to reduce the bias of any particular metric. These metrics are frequently used in natural language processing and in other related and similar studies in the software engineering domain [8, 27, 37, 72, 79, 83].
Conclusion Validity: Conclusion validity refers to the researchers' bias or the bias in the statistical analysis that can lead to weak results [10]. The nature of this study does not depend on the researchers' bias. The conclusion threat can be related to the reliability of the measurements obtained by the automatic metrics. To mitigate this threat, we used different evaluation metrics (BLEU{1-4}, ROUGE-L, METEOR, and CIDEr) and applied the same calculation to evaluate all the models. We note that the automatic metrics cannot fully quantify the quality of the generated comments. In mitigation, we conducted human studies to compare the results from different models based on the developers' perspectives. A potential threat could be related to the conclusions obtained from the human study. To reduce this threat, we anonymized the models, and the reference comments were not shown to the evaluators. Additionally, each generated comment is rated by three evaluators for consistency purposes.
8 RELATED WORKS
This section summarizes the related works on code embedding and
code comment generation.
8.1 Code Embedding
The research on source code embedding is wide and has many applications [20]. Hindle et al. [34] used n-grams to build a statistical language model and showed that programming languages are like natural languages: they are repetitive and predictable. Bielik et al. proposed a statistical model that applies character-level language modeling to program synthesis [13]. Recent embeddings for code tokens are based on deep neural networks and are mostly generated with the word2vec model [57] for C/C++, JavaScript, and Java tokens. These embeddings are used for program repair [80], software vulnerability detection [33], type prediction [56], and bug detection [63]. Other embedding techniques from natural language processing, such as FastText [15], are used in other work to provide pre-trained embeddings for six programming languages [25]. Wang et al. [73] proposed a technique for program representations from a mixture of symbolic and concrete execution traces. A modified Graph Neural Network called the Graph Interval Neural Network is used to provide a semantic representation of programs [77]. Kanade et al. [41] trained BERT [22] to generate contextual embeddings and showed their effectiveness on variable misuse classification and variable misuse localization and repair. Karampatsis and Sutton use ELMo [43] for bug prediction. Alon et al. generate the embeddings of Java methods using the AST [9].
Lu et al. propose a new embedding in a hyperbolic space that uses function call graphs [55]. Other works propose a neural probabilistic language model for code embedding [4], function embeddings for repairing variable misuse in Python based on the AST [23], embeddings of methods for code clone detection using AST node types and node contents [17], embeddings based on tree- or graph-based approaches [5, 6], embeddings to learn representations of edits [82], and embeddings for program repairs [74]. Chen et al. [20] provide a comprehensive review of the embeddings for source code. The most similar works are [36, 78], and they use word-level encoding for code comments. Although code embeddings are widely used in many applications, existing code embeddings have the limitation of not being able to succinctly detect code constructs such as those in camel case and snake case, and the code token embeddings are not able to capture the structural properties of programming languages. To tackle this limitation, our work proposes a novel Semantic-Syntax encoder-decoder model, LAMNER. Our experiments showed that LAMNER is effective and improves performance over the baseline models.
8.2 Code Comment Generation
Software engineering researchers have proposed multiple techniques to improve automatic code comment generation. Initial efforts used information retrieval, template-based, and topic modeling approaches. Haiduc et al. [31] used text retrieval techniques such as the Vector Space Model and Latent Semantic Indexing to generate code comments. A topic modeling approach was followed by Eddy et al. [24] to draw a comparison between their work and the approach used in [31]. Moreno et al. [58] used a template-based approach to automatically generate comments for methods and classes. Sridhara et al. [69] introduced the Software Word Usage Model (SWUM) to capture code token occurrences to generate comments. Later, Iyer et al. [40] presented a neural network for code comment generation. They were the first to use an attention-based LSTM neural translation model for comment generation. Hu et al. [36, 37] introduced a model that uses the AST. They proposed a modified depth-first search-based traversal algorithm, namely Structure-Based Traversal, to flatten the AST. Shahbazi et al. [66] and Hu et al. [38] leveraged the APIs available in the source code to generate summaries. The former leveraged the text content of the APIs, whereas [38] used the API names in their respective approaches. Alon et al. [8] consider all pairwise paths between leaf nodes of the AST and concatenate the representation of each path with each leaf node's token representation. LeClair et al. [47] presented a dual encoder model that combines the code sequence and the AST representation of code. Liang et al. [49] made changes to the GRU architecture to enable encapsulating the source code's structural information within itself. Wan et al. [72] employed actor-critic reinforcement learning and Tree-RNN to generate comments. Yao et al. [81] modeled the relationship between the annotated code and the retrieved code using a reinforcement learning framework and used it to generate natural language code annotations. Wei et al. [78] proposed a dual framework that leverages the dual training of comment generation and code generation as individual models. LeClair et al. [46] improved the quality of generated comments by employing a Graph Neural Network model with the AST and source code. A recent approach combines techniques from information retrieval to train an NMT model [83]. Two similar code snippets are retrieved and used as input along with the test sequence during testing. Similarly, Wei et al. [79] input a similar code snippet, the AST, the code sequence, and an exemplar comment to generate better comments. In another work, Li et al. [48] leveraged a retrieval-based technique to generate the correct keywords within the code comments. It first creates a summary template, a similar summary retrieved from the training corpus and modified to keep only the important keywords related to the code. This template summary provides the repetitive structure of the code comment, which can be edited to replace the important keywords from the code.
Liu et al. [54] and Panthaplackel et al. [60] proposed comment update techniques that learn from code-comment changes and generate new comments. Wang et al. [76] use the code tokens, the AST, intra-class context from the class name, and Unified Modeling Language diagrams to generate comments. Haque et al. [32] use the full code context file to generate comments for methods. More recently, researchers have also become interested in employing pre-trained language models [41]. Feng et al. [27] trained a multilingual transformer-based language model on six languages and tested the model on code comment generation. Previous research used different techniques to represent code; this work introduces a novel technique to capture the semantic-syntax information that is inherently important in programming languages.
9 CONCLUSION
This paper presents a novel code comment generation model, LAMNER, which uses semantic-syntax embeddings that encode both the semantics and the syntactic structure of a code token. LAMNER can be combined with other code comment generation models to improve their performance. The evaluation on the BLEU-1 to BLEU-4, ROUGE-L, METEOR, and CIDEr metrics confirms that LAMNER achieves state-of-the-art performance. We relate this result to the introduced pre-trained embeddings and their ability to extract the semantic and syntactic representations of unseen code sequences. This result is also supported by the human evaluation, which suggests that the comments from LAMNER are fluent and grammatically correct. The studies conducted show the importance of both embeddings for comment generation. In the future, we plan to apply LAMNER to other programming languages and to different tasks such as bug prediction.
ACKNOWLEDGMENTS
This research is supported by a grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-05175).
REFERENCES
[1] Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A Transformer-based Approach for Source Code Summarization. In ACL. 4998–5007.
[2] Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics, Minneapolis, Minnesota, 54–59.
[3] Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual String Embeddings for Sequence Labeling. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, 1638–1649.
[4] Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2015. Suggesting Accurate Method and Class Names (ESEC/FSE 2015). Association for Computing Machinery, New York, NY, USA, 38–49.
[5] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In International Conference on Learning Representations.
[6] Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli, and Charles Sutton. 2017. Learning Continuous Semantic Representations of Symbolic Expressions. In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 70), Doina Precup and Yee Whye Teh (Eds.). PMLR, International Convention Centre, Sydney, Australia, 80–88.
[7] Miltiadis Allamanis, Henry Jackson-Flux, and Marc Brockschmidt. 2021. Self-Supervised Bug Detection and Repair. In NeurIPS 2021.
[8] Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating Sequences from Structured Representations of Code. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
[9] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. Code2vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages.
[10] Apostolos Ampatzoglou, Stamatia Bibi, Paris Avgeriou, Marijn Verbeek, and Alexander Chatzigeorgiou. 2019. Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Information and Software Technology 106 (2019), 201–230.
[11] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.).
[12] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, Ann Arbor, Michigan, 65–72.
[13] Pavol Bielik, Veselin Raychev, and Martin T. Vechev. 2017. Program Synthesis for Character Level Language Modeling. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
[14] Dave Binkley, Marcia Davis, Dawn Lawrie, and Christopher Morrell. 2009. To camelcase or under_score. In 2009 IEEE 17th International Conference on Program Comprehension. 158–167.
[15] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
[16] Luca Buratti, Saurabh Pujar, Mihaela Bornea, Scott McCarley, Yunhui Zheng, Gaetano Rossiello, Alessandro Morari, Jim Laredo, Veronika Thost, Yufan Zhuang, and Giacomo Domeniconi. 2020. Exploring Software Naturalness through Neural Language Models. arXiv preprint arXiv:2006.12641 (2020).
[17] Lutz Büch and Artur Andrzejak. 2019. Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). 95–104.
[18] Casey Casalnuovo, Kenji Sagae, and Premkumar T. Devanbu. 2019. Studying the Difference Between Natural and Programming Language Corpora. Empirical Software Engineering (2019).
[19] Roee Cates, Nadav Yunik, and Dror G Feitelson. 2021. Does Code Structure Affect Comprehension? On Using and Naming Intermediate Variables. In 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC).
[20] Zimin Chen and Martin Monperrus. 2019. A Literature Study of Embeddings on Source Code. arXiv preprint arXiv:1904.03061 (2019).
[21] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, December 2014.
[22] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.
[23] Jacob Devlin, Jonathan Uesato, Rishabh Singh, and Pushmeet Kohli. 2017. Semantic Code Repair using Neuro-Symbolic Transformation Networks. CoRR abs/1710.11054 (2017). arXiv:1710.11054
[24] Brian P. Eddy, Jeffrey A. Robinson, Nicholas A. Kraft, and Jeffrey C. Carver. 2013. Evaluating source code summarization techniques: Replication and expansion. In 2013 21st International Conference on Program Comprehension (ICPC). 13–22.
[25] Vasiliki Efstathiou and Diomidis Spinellis. 2019. Semantic Source Code Models Using Identifier Embeddings. In Proceedings of the 16th International Conference on Mining Software Repositories (Montreal, Quebec, Canada) (MSR ’19). IEEE Press, 29–33.
[26] Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka. 2016. Tree-to-Sequence Attentional Neural Machine Translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 823–833.
[27] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 1536–1547.
[28] Beat Fluri, Michael Würsch, Emanuel Giger, and Harald C. Gall. 2009. Analyzing the co-evolution of comments and source code. Software Quality Journal, 367–394.
[29] David Gros, Hariharan Sezhiyan, Prem Devanbu, and Zhou Yu. 2020. Code to Comment “Translation”: Data, Metrics, Baselining & Evaluation. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 746–757.
[30] Max Grusky, Mor Naaman, and Yoav Artzi. 2018. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (June 2018), 708–719.
[31] Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the Use of Automated Text Summarization Techniques for Summarizing Source Code (WCRE ’10). IEEE Computer Society, USA, 35–44.
[32] Sakib Haque, Alexander LeClair, Lingfei Wu, and Collin McMillan. 2020. Improved Automatic Summarization of Subroutines via Attention to File Context. In Proceedings of the 17th International Conference on Mining Software Repositories (Seoul, Republic of Korea) (MSR ’20). Association for Computing Machinery, New York, NY, USA, 300–310.
[33] Jacob A. Harer, Louis Y. Kim, Rebecca L. Russell, Onur Ozdemir, Leonard R. Kosta, Akshay Rangamani, Lei H. Hamilton, Gabriel I. Centeno, Jonathan R. Key, Paul M. Ellingwood, Erik Antelman, Alan Mackay, Marc W. McConley, Jeffrey M. Opper, Peter Chin, and Tomo Lazovich. 2018. Automated software vulnerability detection with machine learning. arXiv preprint arXiv:1803.04497 (2018).
[34] Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012. On the Naturalness of Software (ICSE ’12). IEEE Press, 837–847.
[35] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (Nov. 1997), 1735–1780.
[36] Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep Code Comment Generation. In Proceedings of the 26th Conference on Program Comprehension (Gothenburg, Sweden) (ICPC ’18). Association for Computing Machinery, New York, NY, USA, 200–210.
[37] Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2020. Deep code comment generation with hybrid lexical and syntactical information. Empirical Software Engineering 25 (05 2020), 2179–2217.
[38] Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing Source Code with Transferred API Knowledge. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. International Joint Conferences on Artificial Intelligence Organization, 2269–2275.
[39] Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. CoRR abs/1508.01991 (2015). arXiv:1508.01991
[40] Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing Source Code using a Neural Attention Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 2073–2083.
[41] Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi. 2020. Learning and Evaluating Contextual Embedding of Source Code. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, Virtual, 5110–5121.
[42] Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, and Andrea Janes. 2020. Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1073–1085.
[43] Rafael-Michael Karampatsis and Charles Sutton. 2020. SCELMo: Source Code Embeddings from Language Models. arXiv preprint arXiv:2004.13214 (2020).
[44] Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-Source Toolkit for Neural Machine Translation. In Proceedings of ACL 2017, System Demonstrations.
[45] Triet H. M. Le, Hao Chen, and Muhammad Ali Babar. 2020. Deep Learning for Source Code Modeling and Generation: Models, Applications, and Challenges. ACM Comput. Surv. 53, 3, Article 62 (June 2020), 38 pages.
[46] Alexander LeClair, Sakib Haque, Lingfei Wu, and Collin McMillan. 2020. Improved Code Summarization via a Graph Neural Network. In Proceedings of the 28th International Conference on Program Comprehension.
[47] Alexander LeClair, Siyuan Jiang, and Collin McMillan. 2019. A Neural Model for Generating Natural Language Summaries of Program Subroutines (ICSE ’19). IEEE Press, 795–806.
[48] Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, and Zhi Jin. 2021. EDITSUM: A Retrieve-and-Edit Framework for Source Code Summarization. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE).
[49] Yuding Liang and Kenny Qili Zhu. 2018. Automatic Generation of Text Descriptive Comments for Code Blocks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 5229–5236.
[50] Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain, 74–81.
[51] Fang Liu, Ge Li, Yunfei Zhao, and Zhi Jin. 2020. Multi-task Learning based Pre-trained Language Model for Code Completion. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). 473–485.
[52] Shangqing Liu, Yu Chen, Xiaofei Xie, Jing Kai Siow, and Yang Liu. 2021. Retrieval-Augmented Generation for Code Summarization via Hybrid GNN. In International Conference on Learning Representations.
[53] Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
[54] Zhongxin Liu, Xin Xia, Meng Yan, and Shanping Li. 2020. Automating Just-In-Time Comment Updating. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 585–597.
[55] Mingming Lu, Yan Liu, Haifeng Li, Dingwu Tan, Xiaoxian He, Wenjie Bi, and Wendbo Li. 2019. Hyperbolic Function Embedding: Learning Hierarchical Representation for Functions of Source Code in Hyperbolic Space. Symmetry 11, 2 (2019), 254.
[56] Rabee S. Malik, Jibesh Patra, and Michael Pradel. 2019. NL2Type: Inferring JavaScript Function Types from Natural Language Information. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). 304–315.
[57] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26 (2013), 3111–3119.
[58] Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K. Vijay-Shanker. 2013. Automatic generation of natural language summaries for Java classes. In 2013 21st International Conference on Program Comprehension (ICPC). 23–32.
[59] Christian D Newman, Reem S AlSuhaibani, Michael J Decker, Anthony Peruma, Dishant Kaushik, Mohamed Wiem Mkaouer, and Emily Hill. 2020. On the generation, structure, and semantics of grammar patterns in source code identifiers. Journal of Systems and Software 170 (2020), 110740.
[60] Sheena Panthaplackel, Pengyu Nie, Milos Gligoric, Junyi J. Li, and Raymond J Mooney. 2020. Learning to Update Natural Language Comments Based on Code Changes. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
[61] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 311–318.
[62] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1532–1543.
[63] Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang. 2, OOPSLA, Article 147 (Oct. 2018), 25 pages.
[64] Paige Rodeghero, Collin McMillan, Paul W. McBurney, Nigel Bosch, and Sidney D’Mello. 2014. Improving Automated Source Code Summarization via an Eye-Tracking Study of Programmers (ICSE 2014). Association for Computing Machinery, New York, NY, USA, 390–401.
[65] Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning Robust Metrics for Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7881–7892.
[66] Ramin Shahbazi, Rishab Sharma, and Fatemeh H. Fard. 2021. API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations. In 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC).
[67] Alex Sherstinsky. 2020. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena 404 (2020), 132306.
[68] Janet Siegmund. 2016. Program Comprehension: Past, Present, and Future. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 5. 13–20.
[69] Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K. Vijay-Shanker. 2010. Towards Automatically Generating Summary Comments for Java Methods. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (Antwerp, Belgium) (ASE ’10). Association for Computing Machinery, New York, NY, USA, 43–52.
[70] Sean Stapleton, Yashmeet Gambhir, Alexander LeClair, Zachary Eberhart, Westley Weimer, Kevin Leach, and Yu Huang. 2020. A Human Study of Comprehension and Code Summarization. In Proceedings of the 28th International Conference on Program Comprehension (Seoul, Republic of Korea) (ICPC ’20). Association for Computing Machinery, New York, NY, USA, 2–13.
[71] Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. 2015. CIDEr: Consensus-based image description evaluation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4566–4575.
[72] Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and Philip S. Yu. 2018. Improving Automatic Source Code Summarization via Deep Reinforcement Learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (Montpellier, France) (ASE 2018). Association for Computing Machinery, New York, NY, USA, 397–407.
[73] Ke Wang and Zhendong Su. 2020. Blended, Precise Semantic Program Embeddings (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 121–134.
[74] Ke Wang, Zhendong Su, and Rishabh Singh. 2018. Dynamic Neural Program Embeddings for Program Repair. In International Conference on Learning Representations.
[75] Wenhua Wang, Yuqun Zhang, Yulei Sui, Yao Wan, Zhou Zhao, Jian Wu, Philip Yu, and Guandong Xu. 2020. Reinforcement-Learning-Guided Source Code Summarization via Hierarchical Attention. IEEE Transactions on Software Engineering (2020), 1–1.
[76] Yanlin Wang, Lun Du, Ensheng Shi, Yuxuan Hu, Shi Han, and Dongmei Zhang. 2020. CoCoGUM: Contextual Code Summarization with Multi-Relational GNN on UMLs. Technical Report MSR-TR-2020-16. Microsoft.
[77] Yu Wang, Ke Wang, Fengjuan Gao, and Linzhang Wang. 2020. Learning Semantic Program Embeddings with Graph Interval Neural Network. Proc. ACM Program. Lang. 4, OOPSLA, Article 137 (Nov. 2020), 27 pages.
[78] Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, and Zhi Jin. 2019. Code Generation as a Dual Task of Code Summarization. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 6563–6573.
[79] Bolin Wei, Yongmin Li, Ge Li, Xin Xia, and Zhi Jin. 2020. Retrieve and Refine: Exemplar-based Neural Comment Generation. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE).
[80] Martin White, Michele Tufano, Matías Martínez, Martin Monperrus, and Denys Poshyvanyk. 2019. Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). 479–490.
[81] Ziyu Yao, Jayavardhan Reddy Peddamail, and Huan Sun. 2019. CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, Ling Liu, Ryen W. White, Amin Mantrach, Fabrizio Silvestri, Julian J. McAuley, Ricardo Baeza-Yates, and Leila Zia (Eds.). ACM, 2203–2214.
[82] Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. 2019. Learning to Represent Edits. In International Conference on Learning Representations.
[83] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, and Xudong Liu. 2020. Retrieval-Based Neural Source Code Summarization. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1385–1397.