June 2021
Information and Software Technology
Automatic software fault localization plays a significant role in helping developers find bugs efficiently. Existing approaches can be categorized into static and dynamic methods, which have greatly improved fault localization by analyzing static features of the source code or by tracking dynamic behaviors at runtime, respectively. However, localization accuracy is still far from satisfactory for users. To enhance the capability of detecting software faults at statement granularity, this paper proposes ALBFL, a novel neural ranking model that combines static and dynamic features to obtain high fault localization accuracy. First, ALBFL learns the semantic features of the source code with a transformer encoder. Then, it integrates them with other static and dynamic features, i.e., statistical features, Spectrum-Based Fault Localization (SBFL) features, and mutation features, through a self-attention layer. Next, to evaluate the fault probability of each statement, the output of the self-attention layer is fed into a LambdaRank model, which ranks the suspicious statements in descending order. Finally, we evaluate the model on the widely used Defects4J dataset, which consists of 5 open-source projects and a total of 357 faulty programs. The results show that ALBFL identifies three times as many faulty statements as 11 traditional SBFL methods and outperforms two state-of-the-art approaches by more than 14% on the Top-1 metric.

Context: Automatic software fault localization serves an important purpose in helping developers fix bugs efficiently. Existing approaches can be categorized into static and dynamic methods, which have greatly improved fault localization by analyzing static features of the source code or by tracking dynamic behaviors at runtime, respectively. However, the accuracy of fault localization is still unsatisfactory.

Objective: To enhance the capability of detecting software faults at statement granularity, this paper puts forward ALBFL, a novel neural ranking model that combines static and dynamic features to achieve high fault localization accuracy. First, ALBFL learns the semantic features of the source code with a transformer encoder. Then, it uses a self-attention layer to integrate these static and dynamic features. Finally, the integrated features are fed into a LambdaRank model, which lists the suspicious statements in descending order of their ranking scores.

Method: The experiments are conducted on the Defects4J dataset, which includes 5 open-source projects with 357 faulty programs in total. We evaluate the effectiveness of ALBFL, the effectiveness of combining features, the effectiveness of the model components, and aggregation at the method level.

Result: The results show that ALBFL identifies three times as many faulty statements as 11 traditional SBFL methods and outperforms 2 state-of-the-art approaches by 14% on average in ranking faults in the first position.

Conclusions: To improve the precision of automatic software fault localization, ALBFL combines a neural ranking model with a self-attention layer and a transformer encoder, making full use of various techniques to judge whether a code statement is fault-inducing. Moreover, the joint architecture of ALBFL can train the integration of these features under various strategies to further improve accuracy. In the future, we plan to exploit more features to improve our method's efficiency and accuracy.
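The ranking stage described above can be illustrated, very loosely, with a small self-contained Python sketch. It assumes a toy coverage matrix of 4 tests over 5 statements (invented for illustration), derives two classic SBFL features (Ochiai and Tarantula) from the program spectrum, and trains a linear scorer with LambdaRank-style updates, i.e., RankNet pairwise gradients weighted by the change in NDCG from swapping two statements. The linear scorer is only a stand-in: ALBFL's transformer encoder, self-attention fusion, and its semantic, statistical, and mutation features are not reproduced, and every function name here (`spectrum`, `ochiai`, `tarantula`, `lambdarank_epoch`, `top_n_hit`) is hypothetical.

```python
import math

# Hypothetical sketch: a linear scorer over SBFL features stands in for
# ALBFL's transformer encoder + self-attention network.


def spectrum(coverage, failed):
    """Return (ef, ep, nf, np) for every statement.

    coverage[t][s] is True when test t executes statement s;
    failed[t] is True when test t fails.
    """
    counts = []
    for s in range(len(coverage[0])):
        ef = ep = nf = np_ = 0
        for row, is_fail in zip(coverage, failed):
            if row[s] and is_fail:
                ef += 1
            elif row[s]:
                ep += 1
            elif is_fail:
                nf += 1
            else:
                np_ += 1
        counts.append((ef, ep, nf, np_))
    return counts


def ochiai(ef, ep, nf, np_):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def tarantula(ef, ep, nf, np_):
    fail_ratio = ef / (ef + nf) if ef + nf else 0.0
    pass_ratio = ep / (ep + np_) if ep + np_ else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def gain(label, rank):
    """DCG contribution of a statement with relevance `label` at 0-based `rank`."""
    return (2 ** label - 1) / math.log2(rank + 2)


def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


def lambdarank_epoch(features, labels, w, lr=0.1, sigma=1.0):
    """One full-batch LambdaRank update of the linear scorer's weights."""
    scores = [score(w, x) for x in features]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = {doc: r for r, doc in enumerate(order)}
    idcg = sum(gain(lab, r) for r, lab in enumerate(sorted(labels, reverse=True))) or 1.0
    grad = [0.0] * len(w)
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] <= labels[j]:
                continue  # only pairs where i should rank above j
            # |delta NDCG| if i and j swapped positions in the current ranking
            delta = abs(gain(labels[i], rank[j]) + gain(labels[j], rank[i])
                        - gain(labels[i], rank[i]) - gain(labels[j], rank[j])) / idcg
            # RankNet pairwise gradient magnitude, scaled by |delta NDCG|
            lam = sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j]))) * delta
            for k in range(len(w)):
                grad[k] += lam * (features[i][k] - features[j][k])
    return [wk + lr * gk for wk, gk in zip(w, grad)]


def top_n_hit(scores, faulty, n):
    """True if any faulty statement appears among the n highest-scored ones."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return any(i in faulty for i in order[:n])


# Toy spectrum: 4 tests x 5 statements; statement 2 is the (invented) fault.
coverage = [
    [True, True, True, False, False],   # t0 (fails)
    [True, True, False, True, False],   # t1 (passes)
    [True, False, True, False, True],   # t2 (fails)
    [True, True, False, False, True],   # t3 (passes)
]
failed = [True, False, True, False]
labels = [0, 0, 1, 0, 0]  # 1 marks the faulty statement

features = [[ochiai(*c), tarantula(*c)] for c in spectrum(coverage, failed)]
w = [0.0, 0.0]
for _ in range(20):
    w = lambdarank_epoch(features, labels, w)
scores = [score(w, x) for x in features]
print(top_n_hit(scores, faulty={2}, n=1))  # True
```

Ranking by a learned score rather than by a single hand-crafted formula is what lets a learning-to-rank model weigh several suspiciousness signals at once; the final `top_n_hit` check with `n=1` mirrors the "faults ranked in the first position" criterion used in the evaluation.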