Xiangping Chen’s research while affiliated with Sun Yat-sen University and other places

Publications (79)


Characterizing Smart Contract Evolution
Article · February 2025 · 16 Reads
ACM Transactions on Software Engineering and Methodology
Xiangping Chen · Ziang Qian · Peiyong Liao · [...]

Smart contracts are programs that are permanently stored and automatically executed on blockchain systems such as Ethereum. Because the underlying blockchain is tamper-proof, smart contracts are difficult to update once deployed: an update requires redeploying the contract and migrating its data. This makes observing how smart contracts evolve in the real world all the more meaningful. Hence, in this paper, we conducted the first large-scale empirical study to characterize the evolution of smart contracts in Ethereum. To identify evolution, we presented a contract similarity-based search algorithm, digEvolution, and evaluated its effectiveness with five different search strategies. We then applied this algorithm to 80,152 on-chain contracts collected from Ethereum to uncover the evolution among them. We explored three research questions: whether the evolution of smart contracts is common (RQ1), and how the Gas consumption (RQ2) and the vulnerability (RQ3) of smart contracts vary during evolution. Our results show that the evolution of smart contracts is not very common, and that some contract components remain vulnerable yet are still called by users. The Gas consumption of most smart contracts does not vary during evolution: these contracts are Gas-efficient both before and after evolution. Likewise, the vulnerability of most smart contracts does not vary during evolution: these contracts are secure both before and after evolution.
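The abstract does not specify digEvolution's similarity measure or its five search strategies. As a hypothetical illustration of the general idea of similarity-based evolution search, the Python sketch below reduces each contract to a set of tokens (for example, normalized function signatures), orders contracts by deployment block, and links each contract to its most similar earlier contract when the Jaccard similarity reaches a threshold. The `Contract` class, `find_predecessors`, the Jaccard measure, and the 0.8 default threshold are all assumptions, not the paper's design.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    address: str       # hypothetical identifier for the deployed contract
    block: int         # deployment block number, used to order versions
    tokens: frozenset  # assumed representation, e.g. normalized function signatures


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_predecessors(contracts, threshold=0.8):
    """Map each contract address to its most similar earlier-deployed contract.

    A link old -> new is kept only if the similarity reaches `threshold`;
    among equally similar candidates, the most recently deployed one wins.
    """
    ordered = sorted(contracts, key=lambda c: c.block)
    links = {}
    for i, new in enumerate(ordered):
        best, best_sim = None, threshold
        for old in ordered[:i]:  # only contracts deployed earlier
            sim = jaccard(old.tokens, new.tokens)
            if sim >= best_sim:
                best, best_sim = old, sim
        if best is not None:
            links[new.address] = best.address
    return links
```

For example, a contract with `{transfer, approve, mint}` at block 100 and one with `{transfer, approve, mint, burn}` at block 200 have Jaccard similarity 3/4 = 0.75, so `find_predecessors(contracts, threshold=0.7)` links the second to the first. The pairwise scan is quadratic, so a study over tens of thousands of contracts would likely need candidate filtering or indexing first.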


Towards Improving the Performance of Comment Generation Models by Using Bytecode Information

January 2025 · 6 Reads
IEEE Transactions on Software Engineering

Code comments play an important role in program understanding, and a large number of automatic comment generation methods have been proposed in recent years. To generate better comments, many studies extract a variety of information from source code (e.g., code tokens, AST traversal sequences, and API call sequences) as model input. In this study, we found that the bytecode compiled from source code can provide useful information for comment generation, so we propose using bytecode information to assist comment generation. Specifically, we extract the control flow graph (CFG) from the bytecode and propose a serialization method that produces a CFG sequence preserving the program structure. We then discuss three methods for introducing bytecode information into different models. To evaluate our method, we collected 390,000 Java methods from the Maven repository and created a dataset of 101,124 samples after deduplication and preprocessing. The results show that introducing information extracted from bytecode improves the BLEU-4 scores of 7 comment generation models.
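The abstract does not describe the serialization method itself. One hypothetical way to flatten a CFG while keeping its structure, sketched below in Python, is a depth-first traversal that wraps each basic block in delimiter tokens and records every outgoing edge as an explicit successor token, so branches and loops remain recoverable from the flat sequence. It assumes basic blocks and edges have already been extracted from the bytecode (not shown); the `<Bn>`, `</Bn>`, and `<TOn>` marker tokens and the function name are invented for illustration.

```python
def serialize_cfg(blocks, edges, entry=0):
    """Flatten a control flow graph into one token sequence.

    blocks: dict mapping block id -> list of instruction tokens
    edges:  dict mapping block id -> list of successor block ids
    Blocks are visited depth-first from `entry`; each is emitted once between
    <Bn> and </Bn>, and each outgoing edge is written as a <TOn> token inside
    the block, so branches and loop back-edges survive flattening.
    """
    seq, visited = [], set()

    def visit(b):
        visited.add(b)
        seq.append(f"<B{b}>")
        seq.extend(blocks[b])
        seq.extend(f"<TO{s}>" for s in edges.get(b, []))
        seq.append(f"</B{b}>")
        for s in edges.get(b, []):
            if s not in visited:  # already-emitted targets are kept only as <TOn>
                visit(s)

    visit(entry)
    return seq
```

For a counting loop with blocks 0 (initialization), 1 (loop condition), 2 (body ending in `goto`), and 3 (return), the back-edge from block 2 to block 1 appears as a second `<TO1>` token instead of a repeated copy of block 1. Blocks unreachable from the entry are skipped.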


TG-CUP: A Transformer and GNN Based Multi-Modal Comment Updating Method

December 2024 · 10 Reads
ACM Transactions on Software Engineering and Methodology

Comments play a crucial role in code comprehension and maintenance. This is particularly important when code changes, because comments should be updated promptly to keep the code and comments consistent. Existing comment update methods usually treat code as natural language text, ignore code structure, and often fail when code changes are not directly associated with the comment updates (so-called non-code-indicative updates, NCIUs). We therefore propose TG-CUP, a comment update method based on a Transformer and a graph neural network. The model integrates information from old comments, code edit sequences, and an AST-Difference Graph to update outdated comments. Experimental results show that TG-CUP improves Accuracy and Recall@5 by 5.16% and 2.23%, respectively, over the state-of-the-art methods, and that it also performs better on NCIUs.
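TG-CUP's exact input format is not given in the abstract. A common way to build a token-level code edit sequence is to align the old and new token lists and label each aligned position; the Python sketch below does this with the standard library's `difflib.SequenceMatcher`. The tag names come from `get_opcodes()`, while the `<pad>` placeholder and the function name are assumptions.

```python
import difflib


def edit_sequence(old_tokens, new_tokens):
    """Align two token lists into (tag, old_token, new_token) triples.

    tag is one of 'equal', 'delete', 'insert', 'replace'; the shorter side of
    each aligned chunk is filled with '<pad>' so both sides stay aligned.
    """
    seq = []
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part, new_part = old_tokens[i1:i2], new_tokens[j1:j2]  # slices are copies
        width = max(len(old_part), len(new_part))
        old_part += ["<pad>"] * (width - len(old_part))
        new_part += ["<pad>"] * (width - len(new_part))
        seq.extend((tag, o, n) for o, n in zip(old_part, new_part))
    return seq
```

For `return a + b ;` changed to `return a * b ;`, the result is two `equal` triples, one `('replace', '+', '*')`, and two more `equal` triples. Passing `autojunk=False` disables difflib's heuristic that treats very frequent elements of long sequences (200 or more items) as junk, which could otherwise misalign code with many repeated tokens.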


Ethereum Transaction Replay Platform Based on State-Wise Account Input Data

September 2024 · 26 Reads
IEEE Transactions on Services Computing

An increasing number of investors are active on Ethereum, producing numerous transactions. These historical transactions can be used for contract testing; for example, in gas optimization or contract repair, they can verify that the improved contracts meet expectations. Most existing methods deploy private chains and verify contracts with artificial transactions instead of actual historical transactions from the Ethereum mainnet. The challenge of using actual historical transactions for verification is that Ethereum records only the latest state of each account and cannot restore the execution of historical transactions. Because contract code changes in scenarios such as gas optimization and contract defect repair, we need to test the execution of the contract code before and after the change. However, existing tools cannot customize or modify historical transactions for testing purposes. Therefore, we propose EthReplayer, an efficient transaction replay platform that not only replays Ethereum's historical transactions quickly and faithfully but also supports modifying transactions, enabling testing with actual transactions. Experimental results show that our replay speed is 1.5 times that of the fastest available tool, and that replaying 1,200 million blocks takes only 29,594 seconds. In addition, we applied EthReplayer to contract repair verification, gas optimization verification, and gas estimation, and the results demonstrate its effectiveness.
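The abstract does not describe how EthReplayer stores historical state. To illustrate why replay needs more than Ethereum's latest-state view, the hypothetical Python sketch below keeps every storage write together with its position, a `(block, tx_index)` pair, so a replayer can look up the value a given transaction would have observed. The class and method names are assumptions; a real replayer would also have to track balances, nonces, and contract code.

```python
import bisect
from collections import defaultdict


class StorageHistory:
    """Records every write to an (account, slot) key with its chain position.

    A position is a (block_number, tx_index) tuple; writes for each key must be
    recorded in increasing position order, as they would be during a replay.
    """

    def __init__(self):
        self._positions = defaultdict(list)  # key -> [(block, tx_index), ...]
        self._values = defaultdict(list)     # key -> [value, ...], same order

    def record(self, account, slot, position, value):
        key = (account, slot)
        self._positions[key].append(position)
        self._values[key].append(value)

    def read_before(self, account, slot, position, default=0):
        """Value seen by the transaction at `position`: the latest write made
        strictly earlier, or `default` if the key was never written before."""
        key = (account, slot)
        positions = self._positions.get(key, [])  # .get avoids creating empty entries
        idx = bisect.bisect_left(positions, position) - 1
        return self._values[key][idx] if idx >= 0 else default
```

Because positions are tuples, `bisect` compares them by block number first and transaction index second, so a read at `(100, 5)` sees a write made at `(100, 3)` but not one made at `(250, 1)`.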


Are your comments outdated? Toward automatically detecting code‐comment consistency

August 2024 · 7 Reads · 3 Citations
Journal of Software: Evolution and Process

In software development and maintenance, code comments help developers understand source code and improve communication among developers. However, developers sometimes neglect to update the corresponding comment when changing the code, resulting in outdated comments (i.e., inconsistent code and comments). Outdated comments are harmful because they may mislead subsequent developers; more seriously, they may eventually lead to fatal flaws. To automatically identify outdated comments in source code, we propose a learning-based method, called CoCC, that detects the consistency between code and comments. To identify outdated comments efficiently, we extract multiple features from both the code and the comments before and after they change. We also model the relation between code and comments. Experimental results show that CoCC can effectively detect outdated comments with a precision above 90%. In addition, we identified the 15 most important factors that cause outdated comments and verified the applicability of CoCC to different programming languages. We also used CoCC to find outdated comments in the latest commits of open-source projects, which further demonstrates the effectiveness of the proposed method.


JIT-Smart: A Multi-task Learning Framework for Just-in-Time Defect Prediction and Localization

July 2024 · 41 Reads · 1 Citation

Just-in-time defect prediction (JIT-DP) predicts the defect-proneness of a commit, and just-in-time defect localization (JIT-DL) locates the exact buggy positions (defective lines) in a commit. Various JIT-DP and JIT-DL techniques have recently been proposed, but most of them achieve JIT-DL in a post-mortem way (e.g., code entropy, attention weights, LIME) based on the JIT-DP prediction results. These methods do not use the label information of defective code lines during model building. In this paper, we propose JIT-Smart, a unified model that turns the training of just-in-time defect prediction and localization into a mutually reinforcing multi-task learning process. Specifically, we design a novel defect localization network (DLN) that explicitly introduces the label information of defective code lines for supervised learning in JIT-DL while accounting for class imbalance. To investigate the accuracy and cost-effectiveness of JIT-Smart, we compare it with 7 state-of-the-art baselines under 5 commit-level and 5 line-level evaluation metrics in JIT-DP and JIT-DL. The results demonstrate that JIT-Smart is statistically better than all the state-of-the-art baselines in both JIT-DP and JIT-DL. In JIT-DP, at the median value, JIT-Smart achieves an F1-Score of 0.475, AUC of 0.886, Recall@20%Effort of 0.823, Effort@20%Recall of 0.01, and Popt of 0.942, improving on the baselines by 19.89%-702.74%, 1.23%-31.34%, 9.44%-33.16%, 21.6%-53.82%, and 1.94%-34.89%, respectively. In JIT-DL, at the median value, JIT-Smart achieves a Top-5 Accuracy of 0.539, Top-10 Accuracy of 0.396, Recall@20%Effort line of 0.726, Effort@20%Recall line of 0.087, and IFA line of 0.098, improving on the baselines by 101.83%-178.35%, 101.01%-277.31%, 257.88%-404.63%, 71.91%-74.31%, and 99.11%-99.41%, respectively. Statistical analysis shows that JIT-Smart performs more stably than the best-performing model. In addition, JIT-Smart achieves the best performance compared with the state-of-the-art baselines in cross-project evaluation.
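The effort-aware metrics above are defined in slightly different ways across the JIT defect prediction literature. The Python sketch below follows one common formulation, assuming items (commits or lines) are inspected in descending order of predicted score and that effort is measured in lines of code; some studies instead rank by score divided by size, and the paper's exact definitions, including any normalization of IFA, may differ. Popt is omitted, and all function names are hypothetical.

```python
def _ranked(scores):
    """Indices sorted by descending predicted score (the inspection order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_20_effort(scores, labels, effort):
    """Share of defective items found before spending 20% of total effort."""
    budget = 0.2 * sum(effort)
    spent = found = 0
    for i in _ranked(scores):
        if spent + effort[i] > budget:  # next item would exceed the budget
            break
        spent += effort[i]
        found += labels[i]
    total = sum(labels)
    return found / total if total else 0.0


def effort_at_20_recall(scores, labels, effort):
    """Share of total effort spent until 20% of defective items are found.

    Assumes at least one item is labeled defective.
    """
    target = 0.2 * sum(labels)
    spent = found = 0
    for i in _ranked(scores):
        spent += effort[i]
        found += labels[i]
        if found >= target:
            break
    return spent / sum(effort)


def initial_false_alarms(scores, labels):
    """IFA: number of clean items inspected before the first defective one."""
    for rank, i in enumerate(_ranked(scores)):
        if labels[i]:
            return rank
    return len(scores)
```

With five commits scored `[0.7, 0.9, 0.3, 0.2, 0.1]`, labels `[1, 0, 1, 0, 0]`, and efforts `[10, 10, 30, 40, 10]`, the top two commits fit in the 20-line budget and contain one of the two defective commits, so Recall@20%Effort is 0.5, Effort@20%Recall is 0.2, and IFA is 1. For line-level evaluation, an effort of 1 per line can be passed.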


Are the smart contracts on Q&A site reliable?

June 2024 · 1 Read
Software Practice and Experience

Ethereum, as a leading blockchain platform, has attracted a significant number of practitioners. These practitioners need a platform for communication and collaborative problem-solving, which led to the creation of Ethereum Stack Exchange (ESE), a Q&A site dedicated to Ethereum-related issues. While the Q&A site facilitates communication among practitioners, it also introduces new challenges. Practitioners adopt code snippets from Q&A sites to address the problems they encounter; however, the quality of code snippets on ESE remains largely unexplored. Vulnerabilities and gas-inefficient patterns in ESE may spread to code on Ethereum and threaten its regular operation. In this article, we conduct an empirical study investigating the distribution of vulnerabilities and gas-inefficient patterns in ESE, and we further analyze their potential impact on Ethereum. However, we encountered a problem when detecting vulnerabilities and gas-inefficient patterns: mainstream smart contract analysis tools require complete source code files, whereas code on ESE often consists of incomplete snippets. To address this, we introduce an AST-based code clone detection technique to construct detectable files corresponding to code snippets, which enables us to detect vulnerabilities and gas-inefficient patterns in them. Our findings show that 11.18% of contract-level code snippets and 4.06% of function-level code snippets in ESE have vulnerabilities, and that 27.21% of contract-level and 17.89% of function-level code snippets contain gas-inefficient patterns. The additional cost caused by gas-inefficient patterns in ESE is approximately $1,695,002. Based on these findings, we provide recommendations for both ESE and its users, aiming to foster collaborative efforts and create a more reliable Q&A site for practitioners.
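The paper uses AST-based clone detection, which requires a Solidity parser. As a simpler, token-based stand-in that illustrates the same goal, the Python sketch below normalizes identifiers and number literals (in the style of Type-2 clone detection) and returns the first complete source file whose normalized token stream contains the normalized snippet, so that file could be handed to an analyzer that needs complete sources. The keyword subset, the `ID`/`NUM` placeholders, and the function names are assumptions, not the paper's method.

```python
import re

# Small, illustrative subset of Solidity keywords and types kept verbatim.
KEYWORDS = {
    "contract", "function", "returns", "return", "if", "else", "for", "while",
    "require", "public", "external", "internal", "private", "view", "pure",
    "payable", "memory", "storage", "mapping", "address", "bool", "uint",
    "uint256", "emit", "event", "modifier",
}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code):
    """Tokenize, replacing identifiers with ID and number literals with NUM."""
    out = []
    for tok in TOKEN.findall(code):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)  # operators and punctuation, one character each
    return out


def find_host_file(snippet, files):
    """Return the name of the first file whose normalized tokens contain the
    snippet's normalized tokens as a contiguous run, or None."""
    pattern = normalize(snippet)
    n = len(pattern)
    if n == 0:
        return None
    for name, source in files.items():
        toks = normalize(source)
        if any(toks[i:i + n] == pattern for i in range(len(toks) - n + 1)):
            return name
    return None
```

Renaming identifiers such as `withdraw`, `balances`, or `amount` does not prevent a match, but any added, removed, or reordered token does.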


Citations (47)


... Interestingly, even in extreme cases, such as in the MBPP dataset, at 100% compression (i.e., DocString is completely removed), the models are still able to generate a certain percentage of correct code. This may indicate that in addition to DocString, other elements in the function signature, such as method name, carry important semantic information that is sufficient to guide the model in generating functionally correct code [9,51]. This phenomenon is explored further in Section 6, as it relates to the deeper mechanisms of how the model utilizes different types of cue information to generate code. ...

Reference:

Less is More: DocString Compression in Code Generation
Do Code Summarization Models Process Too Much Information? Function Signature May Be All What Is Needed
  • Citing Article
  • March 2024

ACM Transactions on Software Engineering and Methodology

... White et al. [55] proposed TCtracer, which automatically establishes traceability links between production and test code at the method and class levels. Huang et al. [22] introduced a new approach that gauges the likelihood of co-evolution through extracted code changes, code complexity, and certain semantic features. ...

Towards automatically identifying the co‐change of production and test code
  • Citing Article
  • January 2024

Software Testing Verification and Reliability

... Wattanakriengkrai et al. [11] used LIME technology to identify risk markers, enabling file level models to predict defect lines in code. Zhu et al. [12] combined the BiLSTM model with extended code syntax information to predict row level defects. The DeepLineDP model proposed by Pornprasit et al. [13] utilized a bidirectional gated recurrent unit (Bi-GRU) network and attention mechanism to estimate the defect probability of code lines from defective files. ...

SyntaxLineDP: a Line-level Software Defect Prediction Model based on Extended Syntax Information
  • Citing Conference Paper
  • October 2023

... Learning-based patch generation techniques [9,[44][45][46][47][48] typically view program repair as a Neural Machine Translation (NMT) task, training deep learning models to capture bug context and generate patches for defective programs, converting defective programs to stationary programs. CURE [45] pre-trains an NMT model on a large corpus of developer code and uses a static checking strategy to generate patches with valid identifiers which improves syntactic correctness. ...

Smart Contract Code Repair Recommendation based on Reinforcement Learning and Multi-metric Optimization
  • Citing Article
  • December 2023

ACM Transactions on Software Engineering and Methodology

... Empirical research on software documentation quality is an active field that focuses on various artifacts, like API reference documentation [40] or README files [68], and the perspectives of documentation writers [1]. Studies on the evaluation of AI-generated documentation usually focus on automated metrics like BLEU, ROUGE, and METEOR [29], [64], [69]- [71]. Hu et al. compared such automated metrics with human evaluations on six documentation quality dimensions [19] and found that automated metrics often misalign with human judgment [19]. ...

Snippet Comment Generation Based on Code Context Expansion
  • Citing Article
  • July 2023

ACM Transactions on Software Engineering and Methodology

... Prior work has created a dataset by tracking GitHub smart contract projects that have vulnerability fix commits [15]. In contrast, we aim to construct a vulnerability lifecycle dataset based on deployed smart contracts available on Etherscan. ...

An empirical study on real bug fixes from solidity smart contract projects
  • Citing Article
  • June 2023

Journal of Systems and Software

... Two APR tools for smart contracts use supervised learning: SmartRep [52] employs a double encoder to abstract the vulnerable method from the AST, and train a system from one-line fixes of GitHub commits. RLRep [17] applies a reinforcement learning approach with policy gradient, using a reward function that evaluates fixes based on compilation success, vulnerability detection, code entropy [33], and code similarity. ...

Security Code Recommendations for Smart Contract
  • Citing Conference Paper
  • March 2023

... Few-shot learning and generative code formats have made it conceivable for models to form and alter code formats from few cases. This app works best with template-based advancement devices since it lets labourers set up ventures rapidly and compose less tedious [20]. The related work in deep learning for automatic code creation shows how these technologies have the power to completely change software development in many ways, from making code safer and better written to making developers more productive and opening up new ways of programming. ...

A Comparative Study on Method Comment and Inline Comment
  • Citing Article
  • February 2023

ACM Transactions on Software Engineering and Methodology

... The resource preview, as a promising solution to this problem, has become an area of intense investigation in recent years. Existing methods have been able to preview the important content of traditional resources, such as the previews of images [2][3][4][5][6], documents [7][8][9][10][11][12], and video resources [13][14][15][16][17][18][19][20][21]. But for interactive educational resources (IERs), there is still a lack of previews of dynamic content and interactive processes in the resources. ...

Generating Summarized Preview for Education Resource based on Exploring and Comparing GUIs
  • Citing Conference Paper
  • July 2016

... Recently, researchers have investigated generating code comments from bytecode for popular programming languages, such as Java. For example, Huang et al. [32] first proposed a method named BCGen to generate comments for Java bytecode in 2022. Similarly, they converted the bytecode into CFGs and built the neural language model to learn from the CFGs and token sequences. ...

BCGen: a comment generation method for bytecode

Automated Software Engineering