Jim Laredo’s research while affiliated with IBM Research - Thomas J. Watson Research Center and other places


Publications (23)


Figures: Infer bug report example · Feature exploration (after normalization, the averages of these eight feature values for true positives and false positives differ significantly, suggesting a classifier may achieve good performance) · Overview of the D2A dataset generation pipeline · A simplified example from the D2A dataset · ROC curve for the classical ML model of Section 4.3 (the blue X marks the point minimizing distance from the top-left corner; the green dot marks a 95% False Positive Reduction Rate)

Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT
  • Article
  • Full-text available

February 2024 · 113 Reads · 5 Citations · Empirical Software Engineering

Saurabh Pujar · Yunhui Zheng · Luca Buratti · [...] · Zhong Su

Static analysis tools are widely used for vulnerability detection as they can analyze programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of machine learning models to learn from programming language data opens new possibilities for reducing false positives when applied to static analysis. However, existing datasets for training vulnerability identification models suffer from multiple limitations, such as limited bug context, limited size, and synthetic, unrealistic source code. We propose Differential Dataset Analysis, or D2A, a differential-analysis-based approach to label issues reported by static analysis tools. The resulting D2A dataset is built by analyzing version pairs from multiple open-source projects. From each project, we select bug-fixing commits and run static analysis on the versions before and after each commit. If issues detected in a before-commit version disappear in the corresponding after-commit version, they are very likely real bugs fixed by the commit. We use D2A to generate a large labeled dataset. We then train both classic machine learning models and deep learning models for vulnerability identification using the D2A dataset. We show that the dataset can be used to build a classifier that identifies likely false alarms among the issues reported by static analysis, helping developers prioritize and investigate potential true positives first. To facilitate future research and contribute to the community, we make the dataset generation pipeline and the dataset publicly available. We have also created a leaderboard based on the D2A dataset, which has already attracted attention and participation from the community.
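The differential labeling idea described above is mechanical enough to sketch. A minimal illustration, assuming a hypothetical run_static_analysis() that fingerprints issues per version; the real pipeline additionally accounts for code churn between versions:

```python
# Hedged sketch of D2A-style differential labeling (not the authors' code).
# run_static_analysis() is a hypothetical helper returning a set of issue
# fingerprints (e.g., bug type + function + normalized location) per version.

def label_issues(before_commit, after_commit, run_static_analysis):
    """Label issues found in the before-commit version of a bug-fixing commit."""
    before_issues = run_static_analysis(before_commit)
    after_issues = run_static_analysis(after_commit)

    labeled = []
    for issue in before_issues:
        # An issue that disappears after a bug-fixing commit was very likely
        # a real bug fixed by that commit -> label 1; issues that persist
        # are more likely analyzer false positives -> label 0.
        label = 1 if issue not in after_issues else 0
        labeled.append((issue, label))
    return labeled
```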



Incorporating Signal Awareness in Source Code Modeling: An Application to Vulnerability Detection

May 2023 · 8 Reads · 2 Citations · ACM Transactions on Software Engineering and Methodology

AI models of code have made significant progress over the last few years. However, many models are not actually learning task-relevant source code features. Instead, they often fit non-relevant but correlated data, leading to a lack of robustness and generalizability and limiting the subsequent practical use of such models. In this work, we focus on improving model quality through signal awareness, i.e., learning the relevant signals in the input for making predictions. We do so by leveraging the heterogeneity of code samples in terms of their signal-to-noise content. We perform an end-to-end exploration of model signal awareness, comprising: (i) uncovering the reliance of AI models of code on task-irrelevant signals, via prediction-preserving input minimization; (ii) improving models’ signal awareness by incorporating the notion of code complexity during model training, via curriculum learning; (iii) improving models’ signal awareness by generating simplified signal-preserving programs and augmenting the training dataset with them; and (iv) presenting a novel interpretation of model learning behavior from the perspective of the dataset, using its code complexity distribution. We propose a new metric to measure model signal awareness, Signal-Aware Recall, which captures how much of the model’s performance is attributable to task-relevant signal learning. Using a software vulnerability detection use case, our model probing approach uncovers a significant lack of signal awareness in the models, across three different neural network architectures and three datasets. Signal-Aware Recall is observed to be in the sub-50s for models with traditional Recall in the high 90s, suggesting that the models are presumably picking up a lot of noise or dataset nuances while learning their logic. With our code-complexity-aware model learning enhancement techniques, we are able to steer the models toward more task-relevant learning, recording up to a 4.8x improvement in model signal awareness. Finally, we employ our model learning introspection approach to uncover the aspects of source code where the model faces difficulty, and analyze how our learning enhancement techniques alleviate it.
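The Signal-Aware Recall idea can be made concrete with a small sketch. Assuming we already have, for each correctly predicted vulnerable sample, a hypothetical retains_signal() check that tests whether its prediction-preserving minimized version still contains the known vulnerable statements, the metric credits only predictions that survive this check:

```python
# Hedged sketch of a Signal-Aware Recall computation (illustrative only).
# true_positives: samples the model correctly flagged as vulnerable.
# retains_signal(sample): hypothetical check that the prediction-preserving
# minimized version of `sample` still contains the vulnerable statements.

def signal_aware_recall(true_positives, false_negatives, retains_signal):
    # Standard recall credits every true positive; Signal-Aware Recall only
    # credits those whose minimized input keeps the task-relevant signal.
    signal_aware_tp = sum(1 for s in true_positives if retains_signal(s))
    total_vulnerable = len(true_positives) + len(false_negatives)
    return signal_aware_tp / total_vulnerable if total_vulnerable else 0.0
```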




A Goal-Driven Natural Language Interface for Creating Application Integration Workflows

June 2022 · 20 Reads · 7 Citations · Proceedings of the AAAI Conference on Artificial Intelligence

Web applications and services are increasingly important in a distributed internet filled with diverse cloud services and applications, each of which enables the completion of narrowly defined tasks. Given the explosion in the scale and diversity of such services, their composition and integration toward complex user goals remains challenging for end-users and requires substantial development effort when specified by hand. We present a demonstration of the Goal Oriented Flow Assistant (GOFA) system, which provides a natural language solution for generating application integration workflows. Our tool is built on a three-step pipeline: it first uses Abstract Meaning Representation (AMR) to parse utterances; it then uses a knowledge graph to validate candidates; finally, it uses an AI planner to compose the candidate flow. We provide a video demonstration of the deployed system as part of our submission.
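The three-step pipeline reads naturally as a function composition. A schematic sketch with stubbed components (parse_amr, kg_validate, and plan_flow are hypothetical stand-ins for the AMR parser, knowledge-graph validation, and AI planner described above, not the deployed system's API):

```python
# Schematic sketch of a GOFA-style pipeline (stubs, not the deployed system).

def generate_workflow(utterance, parse_amr, kg_validate, plan_flow):
    """Turn a natural-language goal into an application-integration workflow."""
    # Step 1: parse the utterance into an Abstract Meaning Representation,
    # yielding candidate services/actions mentioned by the user.
    candidates = parse_amr(utterance)

    # Step 2: keep only candidates that the knowledge graph confirms exist
    # and can be connected (compatible inputs/outputs between services).
    valid = [c for c in candidates if kg_validate(c)]

    # Step 3: hand the validated candidates to an AI planner, which composes
    # them into an executable flow achieving the stated goal.
    return plan_flow(valid)
```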



VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

December 2021 · 22 Reads · 1 Citation

Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts. This becomes even more important in today's software ecosystem, where vulnerable code can flow easily and unwittingly within and across software repositories like GitHub. Across such millions of lines of code, traditional static and dynamic approaches struggle to scale. Although existing machine-learning-based approaches look promising in such a setting, most work detects vulnerable code at a higher granularity -- at the method or file level. Thus, developers still need to inspect a significant amount of code to locate the vulnerable statement(s) that need to be fixed. This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph and effectively understand code semantics and vulnerable patterns. To study VELVET's effectiveness, we use an off-the-shelf synthetic dataset and a recently published real-world dataset. In the static analysis setting, where vulnerable functions are not detected in advance, VELVET achieves 4.5x better performance than the baseline static analyzers on the real-world data. For the isolated vulnerability localization task, where we assume the vulnerability of a function is known while the specific vulnerable statement is unknown, we compare VELVET with several neural networks that also attend to local and global context of code. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively, outperforming the baseline deep-learning models by 5.3-29.0%.
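At prediction time, an ensemble like this reduces to combining per-statement scores from the two model families. A hedged sketch (the score shapes and the simple averaging rule are assumptions for illustration; the paper's actual combination strategy may differ):

```python
import numpy as np

# Hedged sketch of VELVET-style statement localization (illustrative).
# gnn_scores / transformer_scores: per-statement vulnerability scores from a
# graph-based model (local structure) and a sequence-based model (global
# context), both arrays of shape (num_statements,).

def locate_vulnerable_statement(gnn_scores, transformer_scores):
    # Score-averaging ensemble: each model votes on every statement, and the
    # highest combined score is reported as the top-1 prediction.
    combined = (np.asarray(gnn_scores) + np.asarray(transformer_scores)) / 2
    return int(np.argmax(combined))  # index of the most suspicious statement
```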


Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection

November 2021 · 13 Reads

AI modeling for source code understanding tasks has been making significant progress and is being adopted in production development pipelines. However, reliability concerns are being raised, especially over whether the models are actually learning task-related aspects of source code. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e., models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified signal-preserving programs and augmenting the training dataset with them. With our techniques, we achieve up to a 4.8x improvement in model signal awareness. Using the notion of code complexity, we further present a novel model learning introspection approach from the perspective of the dataset.
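The curriculum-learning half of the approach can be sketched in a few lines. Assuming a hypothetical complexity() scorer per training sample (e.g., cyclomatic complexity) and a standard train_one_epoch() update step, training proceeds from simple, high signal-to-noise programs toward complex ones:

```python
# Hedged sketch of complexity-ordered curriculum learning (illustrative).
# samples: list of (program, label) pairs; complexity: hypothetical scorer,
# e.g. cyclomatic complexity; train_one_epoch: the usual model update step.

def curriculum_train(model, samples, complexity, train_one_epoch, stages=3):
    # Order programs from simple to complex, then expose the model to
    # progressively larger (harder) prefixes of the ordered data.
    ordered = sorted(samples, key=lambda s: complexity(s[0]))
    for stage in range(1, stages + 1):
        subset = ordered[: len(ordered) * stage // stages]
        train_one_epoch(model, subset)
    return model
```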



Citations (16)


... While code analysis methods have proven effective for detecting predefined patterns of data leakage, they often struggle with scalability and adapting to complex or novel scenarios in evolving codebases (Pujar et al., 2024). In contrast, ML models, such as CodeBERT and GPT, can scale efficiently to large codebases and adapt to diverse coding practices that rule-based methods may miss (Pujar et al., 2024; Zhuang et al., 2020). ...

Reference:

Data leakage detection in machine learning code: transfer learning, active learning, or low-shot prompting?
Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT

Empirical Software Engineering

... Although these approaches have proven fundamental in software development, they are burdened with restrictions. Manual debugging requires a significant amount of effort, typically shifting resources from feature development to maintenance [11]. ...

Code Vulnerability Detection via Signal-Aware Learning
  • Citing Conference Paper
  • July 2023

... Existing work typically enhances model robustness through data augmentation and adversarial training (Madry et al., 2018). Bielik and Vechev (2020) refine model representations by feeding only pertinent program parts to the model; Suneja et al. (2023) use curriculum learning and data augmentation with simplified programs. ...

Incorporating Signal Awareness in Source Code Modeling: An Application to Vulnerability Detection
  • Citing Article
  • May 2023

ACM Transactions on Software Engineering and Methodology

... Our work therefore simulates the use of xAI for explanatory debugging [19,22] with concept-based explanations [21], also called the "glitch detector task" [41,43]. We investigate how xAI may improve people's mental models for AI [2,10], and how personalized xAI will affect people's ability to accurately identify when their assistant is correct or incorrect (i.e., if the agent adapts to the user, will the user make fewer mistakes?). Our contributions include: ...

Follow the Successful Herd: Towards Explanations for Improved Use and Mental Models of Natural Language Systems
  • Citing Conference Paper
  • March 2023

... Specifically, we conduct an empirical analysis utilizing 410 real-world bugs collected from four widely-used popular ML libraries to address the issue of how many bugs from these ML libraries can be detected by static bug detectors and why they miss detecting real-world security bugs of ML libraries. We select five popular and open-source static bug detectors as the research subjects, i.e., Flawfinder [3], [16], [28]-[34], RATS [3], [33], [35], Cppcheck [3], [29], [34], [36], Facebook Infer [8], [37], [38], and Clang Static Analyzer [3], [39]-[42]. The fundamental strategy entails employing each bug detector on a version of a program that is afflicted with a particular bug and determining whether or not the bug is discovered by the bug detector. ...

Varangian: a git bot for augmented static analysis
  • Citing Conference Paper
  • October 2022

... Large Language Models: In recent years, there has been a notable emergence of LLMs, which are increasingly recognized as promising solutions for the field of vulnerability identification (Thapa et al. 2022; Ding et al. 2022; Feng et al. 2020; Hanif and Maffeis 2022). BERT (Devlin et al. 2018) is a deep bidirectional encoder based on the transformer architecture, pre-trained by Google on a vast corpus comprising millions of text passages and billions of words. ...

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
  • Citing Conference Paper
  • March 2022

... Recently, there have been several applications in which multiple plans are first generated and then the users are involved in the selection process. Some of these applications are in the areas of patient monitoring, enterprise risk management, conversational systems (Chakraborti et al. 2022; Rizk et al. 2020; Sreedharan et al. 2020b), and web service composition (Brachman et al. 2022). However, the user interfaces for interacting with such systems have received little attention. ...

A Goal-Driven Natural Language Interface for Creating Application Integration Workflows
  • Citing Article
  • June 2022

Proceedings of the AAAI Conference on Artificial Intelligence

... While GraphQL offers significant advantages in terms of data query flexibility and efficiency, its adoption has also introduced new vectors for DoS attacks, primarily due to its flexible and introspective querying capabilities [12], [13]. The proposed empirical framework not only fills the gap in current security validation research but also contributes to the development of actionable benchmarks [14], thus aiding in the implementation of more secure GraphQL APIs. ...

Learning GraphQL Query Cost
  • Citing Conference Paper
  • November 2021

... Barbez et al. [37] used an ensemble of feature extraction tools to generate a comprehensive input for defect detection. Other than predicting whether a function is vulnerable, Ding et al. [38] used ensemble learning to detect vulnerabilities on a statement level. ...

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
  • Citing Preprint
  • December 2021

... -Open Source and Peer Review: Whenever possible, make the AGI's code and training data open source for peer review and community scrutiny. Encourage collaboration and transparency in the development process (Suneja et al., 2021). ...

Towards Reliable AI for Source Code Understanding
  • Citing Conference Paper
  • November 2021