Liyu Shen’s research while affiliated with Zhejiang University and other places


Publications (4)


Learning-based Models for Vulnerability Detection: An Extensive Study
  • Preprint
  • August 2024
  • 2 Reads

Chao Ni · Liyu Shen · Xiaodan Xu · [...] · Shaohua Wang

Although many deep learning-based models have made great progress in vulnerability detection, these models remain poorly understood, which limits further advances in model capability, understanding of the detection mechanism, and the efficiency and safety of practical application. In this paper, we extensively and comprehensively investigate two types of state-of-the-art learning-based approaches (sequence-based and graph-based) by conducting experiments on a recently built large-scale dataset. We investigate seven research questions along five dimensions: model capabilities, model interpretation, model stability, ease of use, and model economy. We experimentally demonstrate the superiority of sequence-based models and the limited abilities of both an LLM (ChatGPT) and graph-based models. We explore the types of vulnerability that learning-based models are skilled at detecting and reveal that the models are unstable even when the input is changed only subtly and in a semantically equivalent way. We empirically explain what the models have learned. We summarize the pre-processing steps and the requirements for using the models easily. Finally, we provide an initial summary of the information that is vital for using these models economically and safely in practice.



MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations

  • June 2024
  • 26 Reads

We constructed a new large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information about the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopted advanced tools to ensure the integrity of the extracted code and enriched the code with four different transformed representations. In total, MegaVul contains 17,380 vulnerabilities collected from 992 open-source repositories, spanning 169 different vulnerability types disclosed from January 2006 to October 2023. MegaVul can therefore be used for a variety of software security-related tasks, including detecting vulnerabilities and assessing vulnerability severity. All information is stored in JSON format for ease of use. MegaVul is publicly available on GitHub and will be continuously updated. It can be easily extended to other programming languages.
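
Because the abstract states that all information is stored in JSON, a record-level tally of vulnerability types might look roughly like the sketch below. This is only an illustration under stated assumptions: the field names (`cve_id`, `cwe_ids`, `is_vul`), the placeholder CVE identifiers, and the helper `count_vulnerability_types` are hypothetical and are not taken from the MegaVul schema or documentation.

```python
import json
from collections import Counter

# Hypothetical records: the field names ("cve_id", "cwe_ids", "is_vul") and the
# placeholder identifiers are illustrative assumptions, not MegaVul's real schema.
SAMPLE_JSON = """
[
  {"cve_id": "CVE-0000-0001", "cwe_ids": ["CWE-119"], "is_vul": true},
  {"cve_id": "CVE-0000-0002", "cwe_ids": ["CWE-119", "CWE-20"], "is_vul": true},
  {"cve_id": "CVE-0000-0003", "cwe_ids": ["CWE-416"], "is_vul": false}
]
"""


def count_vulnerability_types(records):
    """Count how often each vulnerability type appears among vulnerable records."""
    counts = Counter()
    for record in records:
        # Skip records that are not flagged as vulnerable.
        if record.get("is_vul"):
            counts.update(record.get("cwe_ids", []))
    return counts


records = json.loads(SAMPLE_JSON)
type_counts = count_vulnerability_types(records)
print(type_counts.most_common())
```

With a real dataset file, the inline string would be replaced by `json.load` on an open file handle, and the field names would need to be adjusted to whatever the published schema actually uses.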


Citations (1)


... Upon reevaluation, it was discovered to lead to remote code execution, thereby raising the CVSS score to 9.0 [10]. According to real vulnerability data published on the CVE collected by MegaVul [11] from 2006 to 2023, after statistical analysis, only 12.1% (820/6,769) are critical-risk vulnerabilities. Therefore, it is essential to distinguish high-severity, exploitable vulnerabilities from low-risk ones to ensure that remediation efforts are both efficient and cost-effective [12], [13]. ...

Reference:

VulStamp: Vulnerability Assessment using Large Language Model
MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations
  • Citing Conference Paper
  • July 2024