Content available from Empirical Software Engineering
(2023) 28:87
Empirical Software Engineering
https://doi.org/10.1007/s10664-023-10319-6
A graph-based code representation method to improve code
readability classification
Qing Mi¹ · Yi Zhan¹ · Han Weng¹ · Qinghang Bao¹ · Longjie Cui¹ · Wei Ma¹
Accepted: 13 March 2023 / Published online: 23 May 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Abstract
Context Code readability is crucial for developers because it is closely tied to code
maintenance and affects their work efficiency. Code readability classification assigns
source code to one of several predefined levels according to its readability. Many code
readability classification models have been proposed in existing studies, including
deep learning networks that achieve relatively high accuracy.
Objective However, in terms of representation, these methods do not effectively preserve
the syntactic and semantic structure of the source code. To capture this structure, we
propose a graph-based code representation method.
Method First, the source code is parsed into a graph that combines its abstract syntax tree
(AST) with control and data flow edges to preserve semantic structural information. We then
convert each graph node's source code and type information into vectors. Finally, we train
a graph neural network model composed of Graph Convolutional Network (GCN), DMoNPooling,
and k-dimensional Graph Neural Network (k-GNN) layers to extract these features from the
program graph.
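The graph construction step described under Method can be illustrated with a minimal sketch. It is not the authors' implementation: the paper works on Java, whereas this sketch swaps in Python's standard `ast` module so that it runs without extra dependencies. The function names `build_program_graph` and `one_hot_types`, the edge labels, the simplified control flow (consecutive statements only), the simplified data flow (last write to each later read, ordered by line), and the one-hot encoding of node types are all assumptions made for illustration. The GNN layers themselves are not shown.

```python
import ast

SAMPLE = "x = 1\ny = x + 2\nz = x * y\n"


def build_program_graph(source):
    """Return (nodes, edges): node type names and typed edge lists."""
    tree = ast.parse(source)
    nodes = []                                    # node id -> AST type name
    edges = {"ast": [], "control": [], "data": []}
    ids = {}                                      # id(obj) -> node id (stmts, names)

    def visit(node, parent_id):
        # AST edges: parent -> child. A fresh id is assigned per visit because
        # context/operator nodes may be shared objects inside the parsed tree.
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges["ast"].append((parent_id, node_id))
        if isinstance(node, (ast.stmt, ast.Name)):
            ids[id(node)] = node_id
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)

    # Control-flow edges (assumed simplification): link consecutive
    # statements within each statement list; branches are ignored.
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for prev, nxt in zip(body, body[1:]):
                edges["control"].append((ids[id(prev)], ids[id(nxt)]))

    # Data-flow edges (assumed simplification): last write -> later read.
    # Reads on a line are processed before writes on the same line.
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    names.sort(key=lambda n: (n.lineno, isinstance(n.ctx, ast.Store), n.col_offset))
    last_write = {}
    for name in names:
        if isinstance(name.ctx, ast.Store):
            last_write[name.id] = ids[id(name)]
        elif isinstance(name.ctx, ast.Load) and name.id in last_write:
            edges["data"].append((last_write[name.id], ids[id(name)]))
    return nodes, edges


def one_hot_types(nodes):
    """Encode each node's type as a one-hot vector (illustrative only)."""
    vocab = sorted(set(nodes))
    index = {t: i for i, t in enumerate(vocab)}
    return [[1.0 if index[t] == i else 0.0 for i in range(len(vocab))]
            for t in nodes]


nodes, edges = build_program_graph(SAMPLE)
features = one_hot_types(nodes)
print(len(nodes), {kind: len(pairs) for kind, pairs in edges.items()})
```

The node feature matrix and the three edge lists are the kind of input a graph neural network would consume. A real pipeline would also need proper control flow analysis and an embedding of each node's source text, both of which are omitted here.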
Result We evaluate our approach on the task of code readability classification using a Java
dataset provided by Scalabrino et al. (2016). The results show that our method achieves
accuracies of 72.5% and 88% in three-class and two-class classification, respectively.
Conclusion We are the first to introduce graph-based representation into code readability
classification. Our method outperforms state-of-the-art readability models, which suggests
that the graph-based code representation method is effective in extracting syntactic and
semantic information from source code and ultimately improves code readability classification.
Keywords Code readability classification · Graph neural network · Code representation ·
Abstract syntax tree · Program comprehension
Communicated by: Simone Scalabrino, Rocco Oliveto, Felipe Ebert, Fernanda Madeiral, Fernando Castor
This article belongs to the Topical Collection: Code Legibility, Readability and Understandability.
✉ Wei Ma
mawei@bjut.edu.cn
¹ Faculty of Information Technology, Beijing University of Technology, Beijing, China