January 2025
Neural Computing and Applications
Automated vulnerability detection is crucial for protecting software systems. However, state-of-the-art approaches mainly focus on a single view of the source code, which often leads to incomplete code representation and low detection accuracy. To address these problems, this paper proposes DMVL4AVD, a novel automated vulnerability detection model based on deep multi-view learning that represents source code from three distinct views: code sequences, code property graphs, and code metrics. A different deep model extracts features from each view. First, the [CLS] vectors derived from encoder layers 1 to 12 of GraphCodeBERT serve as code sequence features, which carry rich semantic information. Second, a gated graph neural network (GGNN) learns the features of nodes in the code property graph, which encompasses both syntactic and dependency information of the source code. During graph feature extraction, each node's representation is augmented with its degree centrality, along with its code and type attributes, giving a more comprehensive depiction of the graph's structure. Third, statistical metrics generated by the code analysis tool SourceMonitor are processed by a one-dimensional (1-D) CNN to produce metric features. The features from the three views are fused and passed to a multilayer perceptron (MLP), which yields the final classification. Experimental results show that DMVL4AVD significantly outperforms the studied baselines, with an average increase of 6.79% in accuracy and 6.94% in precision over the approaches in the literature.
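The abstract does not spell out how degree centrality is computed or how the three views are fused, so the following is only a minimal sketch under stated assumptions: degree centrality is taken as a node's degree divided by (n - 1) with edges treated as undirected, and fusion is plain concatenation of the per-view feature vectors. The function names `degree_centrality`, `augment_node_features`, and `fuse_views` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def degree_centrality(num_nodes, edges):
    """Return degree / (num_nodes - 1) for each node.

    Assumption: edges are treated as undirected and self-loops are ignored.
    The paper may define centrality differently for its code property graph.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    denom = max(num_nodes - 1, 1)  # avoid division by zero for a single node
    return [len(neighbours[n]) / denom for n in range(num_nodes)]


def augment_node_features(node_features, edges):
    """Append each node's degree centrality to its existing feature vector."""
    centrality = degree_centrality(len(node_features), edges)
    return [feats + [c] for feats, c in zip(node_features, centrality)]


def fuse_views(sequence_vec, graph_vec, metric_vec):
    """Fuse the three views by concatenation (an assumed fusion strategy).

    The fused vector is what a downstream MLP classifier would consume.
    """
    return list(sequence_vec) + list(graph_vec) + list(metric_vec)


if __name__ == "__main__":
    # Toy graph with 4 nodes; the 2-D vectors stand in for code/type attributes.
    node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
    print(augment_node_features(node_features, edges))

    # Toy per-view vectors standing in for [CLS], GGNN, and CNN outputs.
    print(fuse_views([0.1, 0.2], [0.3], [0.4, 0.5, 0.6]))
```

In a full model, the augmented node vectors would be the GGNN input, and the fused vector would have one segment per view; concatenation keeps every view's features intact and leaves it to the MLP to learn how they interact.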