Li Li’s research while affiliated with Beihang University and other places


Publications (119)


Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions
  • Preprint · December 2024 · Li Li

Recent years have witnessed an emerging trend of extensions in modern Integrated Development Environments (IDEs) like Visual Studio Code (VSCode) that significantly enhance developer productivity. In particular, popular AI coding assistants like GitHub Copilot and Tabnine provide conveniences like automated code completion and debugging. While these extensions offer numerous benefits, they may introduce privacy and security concerns for software developers. However, no existing work systematically analyzes these security and privacy concerns, including the risks of data exposure in VSCode extensions. In this paper, we investigate the security issues of cross-extension interactions in VSCode and shed light on the vulnerabilities caused by data exposure among different extensions. Our study uncovers high-impact security flaws that could allow adversaries to stealthily acquire or manipulate credential-related data (e.g., passwords, API keys, access tokens) from other extensions if not properly handled by extension vendors. To measure their prevalence, we design a novel automated risk detection framework that leverages program analysis and natural language processing techniques to automatically identify potential risks in VSCode extensions. By applying our tool to 27,261 real-world VSCode extensions, we discover that 8.5% of them (i.e., 2,325 extensions) are exposed to credential-related data leakage through various vectors, such as commands, user input, and configurations. Our study sheds light on the security challenges and flaws of the extension-in-IDE paradigm and provides suggestions and recommendations for improving the security of VSCode extensions and mitigating the risks of data exposure.
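The paper's detection framework combines program analysis with NLP and is not reproduced here; as a minimal sketch of one ingredient, flagging credential-related identifiers in extension code can be illustrated with keyword matching (all patterns and helper names below are illustrative assumptions, not the authors' implementation):

```python
import re

# Illustrative patterns for credential-related names (a simplification;
# the actual framework also performs program analysis on data flows).
CREDENTIAL_PATTERNS = [
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"api[_-]?key", re.IGNORECASE),
    re.compile(r"(access|auth)[_-]?token", re.IGNORECASE),
    re.compile(r"secret", re.IGNORECASE),
]

def find_credential_identifiers(source: str) -> list[str]:
    """Return identifiers in `source` that look credential-related."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))
    return sorted(
        name for name in identifiers
        if any(p.search(name) for p in CREDENTIAL_PATTERNS)
    )

snippet = """
const apiKey = vscode.workspace.getConfiguration('myext').get('apiKey');
const username = 'alice';
context.globalState.update('accessToken', token);
"""
print(find_credential_identifiers(snippet))
# -> ['accessToken', 'apiKey']  (flags credential names, not 'username')
```

Such a lexical pass would over- and under-approximate on its own, which is presumably why the paper pairs name-based signals with program analysis of how the flagged values flow across extension boundaries.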






Figures: taxonomy of machine learning operations based on Wang et al. (2022); example of a Jupyter notebook containing a machine learning solution; high-level overview of HeaderGen; generated assignment graphs for the variable "model" in the motivating example (empty in PyCG, flow-sensitive in HeaderGen); workflow of imported-library function return-type resolution; +3 more.

Static analysis driven enhancements for comprehension in machine learning notebooks
  • Article · Full-text available · August 2024 · Empirical Software Engineering

Jupyter notebooks have emerged as the predominant tool for data scientists to develop and share machine learning solutions, primarily using Python as the programming language. Despite their widespread adoption, a significant fraction of these notebooks, when shared on public repositories, suffer from insufficient documentation and a lack of coherent narrative. Such shortcomings compromise the readability and understandability of the notebook. Addressing this shortcoming, this paper introduces HeaderGen, a tool-based approach that automatically augments code cells in these notebooks with descriptive markdown headers, derived from a predefined taxonomy of machine learning operations. Additionally, it systematically classifies and displays function calls in line with this taxonomy. The mechanism that powers HeaderGen is an enhanced call graph analysis technique, building upon the foundational analysis available in PyCG. To improve precision, HeaderGen extends PyCG’s analysis with return-type resolution of external function calls, type inference, and flow-sensitivity. Furthermore, leveraging type information, HeaderGen employs pattern matching techniques on the code syntax to annotate code cells. We conducted an empirical evaluation on 15 real-world Jupyter notebooks sourced from Kaggle. The results indicate a high accuracy in call graph analysis, with precision at 95.6% and recall at 95.3%. The header generation has a precision of 85.7% and a recall of 92.8% with regard to headers created manually by experts. A user study corroborated the practical utility of HeaderGen, revealing that users found HeaderGen useful in tasks related to comprehension and navigation. To further evaluate the type inference capability of static analysis tools, we introduce TypeEvalPy, a framework for evaluating type inference tools for Python with an in-built micro-benchmark containing 154 code snippets and 845 type annotations in the ground truth. Our comparative analysis of four tools revealed that HeaderGen outperforms other tools in exact matches with the ground truth.
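HeaderGen's actual pipeline rests on an enhanced PyCG call-graph analysis; as a rough illustration of just the final annotation step, a taxonomy lookup that maps resolved function calls to markdown headers might look like this (the taxonomy entries and helper name are invented for the example):

```python
# Toy taxonomy mapping fully qualified call names to ML-operation
# categories (invented entries; HeaderGen derives its taxonomy from
# Wang et al. 2022 and resolves calls via call-graph analysis,
# not string keys).
TAXONOMY = {
    "pandas.read_csv": "Data Loading",
    "pandas.DataFrame.fillna": "Data Cleaning",
    "sklearn.model_selection.train_test_split": "Data Preparation",
    "sklearn.linear_model.LogisticRegression.fit": "Model Training",
    "sklearn.metrics.accuracy_score": "Evaluation",
}

def header_for_cell(resolved_calls: list[str]) -> str:
    """Pick a markdown header for a code cell from its resolved calls."""
    categories = []
    for call in resolved_calls:
        category = TAXONOMY.get(call)
        if category and category not in categories:
            categories.append(category)
    return "## " + " & ".join(categories) if categories else "## Uncategorized"

cell_calls = ["pandas.read_csv", "pandas.DataFrame.fillna"]
print(header_for_cell(cell_calls))  # -> ## Data Loading & Data Cleaning
```

The hard part, which this sketch omits, is producing `resolved_calls` precisely in the first place; that is where the paper's return-type resolution, type inference, and flow-sensitivity come in.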



Neural Library Recommendation by Embedding Project-Library Knowledge Graph

June 2024 · IEEE Transactions on Software Engineering

The prosperity of software applications brings fierce market competition to developers. Employing third-party libraries (TPLs) to add new features to projects under development and to reduce time to market has become popular in the community. However, given the tremendous number of TPLs ready for use, it is challenging for developers to effectively and efficiently identify the most suitable ones. To tackle this obstacle, we propose an innovative approach named PyRec to recommend potentially useful TPLs to developers for their projects. Taking Python project development as a use case, PyRec embeds Python projects, TPLs, contextual information, and relations between those entities into a knowledge graph. Then, it employs a graph neural network to capture useful information from the graph to make TPL recommendations. Different from existing approaches, PyRec can make full use of not only project-library interaction information but also contextual information to make more accurate TPL recommendations. Comprehensive evaluations are conducted based on 12,421 Python projects involving 963 TPLs, 9,675 extra entities, 121,474 library usage records, and 73,277 contextual records. Compared with five representative approaches, PyRec improves the recommendation performance significantly in all cases.
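PyRec itself embeds the project-library knowledge graph with a graph neural network; as a much simpler stand-in, scoring candidate libraries by how strongly they co-occur with a project's current dependencies conveys the underlying graph intuition (all project names and usage records below are made up):

```python
from collections import Counter

# Toy project -> third-party-library usage records (invented data;
# PyRec additionally embeds contextual entities and trains a GNN).
PROJECT_LIBS = {
    "img-classifier": {"numpy", "torch", "pillow"},
    "text-miner": {"numpy", "scikit-learn", "nltk"},
    "data-dashboard": {"numpy", "pandas", "matplotlib"},
    "forecasting": {"numpy", "pandas", "scikit-learn"},
}

def recommend(current_libs: set[str], top_k: int = 3) -> list[str]:
    """Rank unseen libraries by co-occurrence with current dependencies."""
    scores = Counter()
    for libs in PROJECT_LIBS.values():
        overlap = len(libs & current_libs)
        if overlap:
            for lib in libs - current_libs:
                scores[lib] += overlap
    return [lib for lib, _ in scores.most_common(top_k)]

print(recommend({"numpy", "pandas"}))
# 'scikit-learn' ranks first: it co-occurs with both numpy and pandas
```

A co-occurrence count like this only captures direct project-library edges; the point of a knowledge-graph embedding is to also propagate signal through contextual entities and multi-hop relations, which this sketch deliberately leaves out.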




Citations (62)


... However, as LLMs continue to gain prominence, they also face critical threats to their reliability and robustness. These vulnerabilities, if exploited, can lead to misuse, including generating harmful content [1][2][3][4][5], leaking sensitive information [6][7][8][9], or providing biased or misleading outputs [10][11][12][13][14][15][16]. ...

Reference:

Model-Editing-Based Jailbreak against Safety-aligned Large Language Models
GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
  • Citing Conference Paper
  • October 2024

... Completeness and Soundness. In this study, we use the terms completeness and soundness as they have been pre-established in callgraph research (Salis et al., 2021;Venkatesh et al., 2024a). The terms completeness and soundness are closely related to the precision and recall metrics. ...

Static analysis driven enhancements for comprehension in machine learning notebooks

Empirical Software Engineering

... Additionally, Julia has virtually no support for more advanced testing methods such as property-based testing, symbolic execution, and contract-based testing, all of which are universally employed by Python's large-scale numerical methods and machine learning libraries such as Pandas [68], NumPy [46], SciPy [96], SymPy [69], Scikit-Learn [78], Jax [14], Tinygrad [19], and Cupy [73] through the Hypothesis [49], Crosshair [88], and Deal [64] libraries. Julia also lacks an actively maintained static type checker [38,70,92]. More rigorous testing methods are especially important in scientific computation, where precision and correctness of implementations of algorithms is paramount. ...

TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools
  • Citing Conference Paper
  • May 2024

... Other studies, like Refs. [7,15,17,18], simply consider that all users have the same preferences, and thus assume all users can require only instances of the same service. However, such assumptions usually are not true in practice, and thus inevitably undermine the effectiveness of the proposed service provision strategies. ...

Neural Library Recommendation by Embedding Project-Library Knowledge Graph
  • Citing Article
  • June 2024

IEEE Transactions on Software Engineering

... Existing research has shown that most DL models are not well protected, and attackers can trivially steal them from APK files [12]. Several approaches have also been proposed to find DL apps, extract DL models [8], explore DL frameworks [9], and further implement adversarial attacks on DL models [4], [5], [13], [43]. Since there is no public dataset on real-world DL apps and models, to obtain target models, we need to collect mobile apps (i.e., APK files) and use Apktool [44] to decompose each APK file into nearly its original form, including asset files, resource files, .dex ...

Investigating White-Box Attacks for On-Device Models
  • Citing Conference Paper
  • April 2024

... Practitioners often raised concerns about the quality of LLM-generated code. For example, Liu et al. [21] found that 47% of ChatGPT-generated code snippets suffer from maintainability issues. Majdinasab et al. [22] discovered that 27% of code suggested by GitHub Copilot contains code vulnerabilities. ...

Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

ACM Transactions on Software Engineering and Methodology

... App stores/Marketplaces are mechanisms to gather user feedback from a crowd that allow continuous interaction between developers and users of mobile apps [260]. They are online software stores where users may browse, purchase, download, and install software applications [261]. In SECO, app stores/marketplaces play a crucial role by providing an online curated marketplace that allows developers to sell and distribute their software products [3,262]. ...

What is an app store? The software engineering perspective

Empirical Software Engineering

... • Packaging. In this stage, the Android application is packaged into a certain distribution format, such as the Android Package (APK) or the Android App Bundle (AAB), which comprises executables with any required configurations and third-party libraries (see lines 20–25). ...

Understanding the quality and evolution of Android app build systems
  • Citing Article
  • August 2023

Journal of Software: Evolution and Process

... Even if some service providers do not actively disclose these details, there are still ways to identify or infer them. For example, the fact that Github Copilot has a query filter can be identified by observing the behavior of the system [62], browsing changelogs [73], and reverse engineering the client [64]. There have also been real-world cases [40,57] where user queries lure the model to disclose its given prompt, indicating the presence of corresponding components that construct such prompts. ...

Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems
  • Citing Conference Paper
  • May 2023

... Recent model obfuscation approaches propose to use static or dynamic methods to obfuscate the representation of on-device models [3,36,37]. Such DL model representations produced by model obfuscation methods cannot be understood by automatic tools or humans, but do not affect the model performance [37]. ...

ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems