Rui Shu’s research while affiliated with North Carolina State University and other places


Publications (16)


Dazzle: using optimized generative adversarial networks to address security data class imbalance issue
  • Conference Paper

October 2022 · 9 Reads · 9 Citations

Rui Shu · Tianpei Xia · Laurie Williams


Example HTTP message
Example HTTP Sequence
Applying Manual Test Techniques (based on ISO/IEC/IEEE 29119-1)
Example SMPT Test Case
Applying Tool-Based Techniques


Do I really need all this work to find vulnerabilities?: An empirical case study comparing vulnerability detection techniques on a Java application
  • Article

August 2022 · 129 Reads · 24 Citations

Empirical Software Engineering

Sarah Elder · Rui Shu · [...] · Laurie Williams

Context: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. Objective: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based web application. Method: We apply four different categories of vulnerability detection techniques – systematic manual penetration testing (SMPT), exploratory manual penetration testing (EMPT), dynamic application security testing (DAST), and static application security testing (SAST) – to an open-source medical records system. Results: We found the most vulnerabilities using SAST. However, EMPT found more severe vulnerabilities. With each technique, we found unique vulnerabilities not found using the other techniques. The efficiency of manual techniques (EMPT, SMPT) was comparable to or better than the efficiency of automated techniques (DAST, SAST) in terms of Vulnerabilities per Hour (VpH). Conclusions: The vulnerability detection technique practitioners should select may vary based on the goals and available resources of the project. If the goal of an organization is to find “all” vulnerabilities in a project, they need to use as many techniques as their resources allow.
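
The Vulnerabilities per Hour (VpH) measure named in the abstract is a ratio of findings to effort. A minimal Python sketch of that ratio follows, assuming hypothetical counts and hours: the `TechniqueResult` class and every number below are illustrative and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class TechniqueResult:
    """One vulnerability detection technique and the effort spent on it."""
    name: str
    vulnerabilities_found: int
    hours_spent: float

    @property
    def vph(self) -> float:
        """Vulnerabilities per Hour: findings divided by hours of effort."""
        if self.hours_spent <= 0:
            raise ValueError("hours_spent must be positive")
        return self.vulnerabilities_found / self.hours_spent


# Hypothetical numbers for illustration only; NOT results from the paper.
results = [
    TechniqueResult("SMPT", 30, 60.0),
    TechniqueResult("EMPT", 40, 20.0),
    TechniqueResult("DAST", 10, 40.0),
    TechniqueResult("SAST", 90, 120.0),
]

# Rank techniques from most to least efficient.
ranked = sorted(results, key=lambda r: r.vph, reverse=True)
for r in ranked:
    print(f"{r.name}: {r.vph:.2f} VpH")
```

With these made-up inputs, a technique that finds the most vulnerabilities overall is not necessarily the most efficient per hour, which is the kind of trade-off the abstract describes.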


Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application

August 2022 · 78 Reads

CONTEXT: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. OBJECTIVE: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based web application. METHOD: We apply four different categories of vulnerability detection techniques – systematic manual penetration testing (SMPT), exploratory manual penetration testing (EMPT), dynamic application security testing (DAST), and static application security testing (SAST) – to an open-source medical records system. RESULTS: We found the most vulnerabilities using SAST. However, EMPT found more severe vulnerabilities. With each technique, we found unique vulnerabilities not found using the other techniques. The efficiency of manual techniques (EMPT, SMPT) was comparable to or better than the efficiency of automated techniques (DAST, SAST) in terms of Vulnerabilities per Hour (VpH). CONCLUSIONS: The vulnerability detection technique practitioners should select may vary based on the goals and available resources of the project. If the goal of an organization is to find "all" vulnerabilities in a project, they need to use as many techniques as their resources allow.


Predicting health indicators for open source projects (using hyperparameter optimization)

June 2022 · 378 Reads · 17 Citations

Empirical Software Engineering

Software developed on public platform is a source of data that can be used to make predictions about those projects. While the individual developing activity may be random and hard to predict, the developing behavior on project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April 2020). We find that traditional estimation algorithms make many mistakes. Algorithms like k-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART) have high error rates. But that error rate can be greatly reduced using hyperparameter optimization. To the best of our knowledge, this is the largest study yet conducted, using recent data for predicting multiple health indicators of open-source projects. To facilitate open science (and replications and extensions of this work), all our materials are available online at https://github.com/arennax/Health_Indicator_Prediction.


Reducing the Cost of Training Security Classifier (via Optimized Semi-Supervised Learning)

May 2022 · 82 Reads

Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a large amount of labeled data to learn the useful relationships between selected features and the target class. However, such labeled data can be scarce and expensive to acquire. Goal: To help security practitioners train useful security classification models when few labeled training data and many unlabeled training data are available. Method: We propose an adaptive framework called Dapper, which optimizes 1) semi-supervised learning algorithms to assign pseudo-labels to unlabeled data in a propagation paradigm and 2) the machine learning classifier (i.e., random forest). When the dataset class is highly imbalanced, Dapper then adaptively integrates and optimizes a data oversampling method called SMOTE. We use the novel Bayesian Optimization to search a large hyperparameter space of these tuning targets. Result: We evaluate Dapper with three security datasets, i.e., the Twitter spam dataset, the malware URLs dataset, and the CIC-IDS-2017 dataset. Experimental results indicate that we can use as low as 10% of original labeled data but achieve close or even better classification performance than using 100% labeled data in a supervised way. Conclusion: Based on those results, we would recommend using hyperparameter optimization with semi-supervised learning when dealing with shortages of labeled security data.


Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue

March 2022 · 40 Reads

Background: Machine learning techniques have been widely used and demonstrate promising performance in many software security tasks such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced (since the percentage of observed vulnerability is usually very low). Goal: To help security practitioners address software security data class imbalanced issues and further help build better prediction models with resampled datasets. Method: We introduce an approach called Dazzle which is an optimized version of conditional Wasserstein Generative Adversarial Networks with gradient penalty (cWGAN-GP). Dazzle explores the architecture hyperparameters of cWGAN-GP with a novel optimizer called Bayesian Optimization. We use Dazzle to generate minority class samples to resample the original imbalanced training dataset. Results: We evaluate Dazzle with three software security datasets, i.e., Moodle vulnerable files, Ambari bug reports, and JavaScript function code. We show that Dazzle is practical to use and demonstrates promising improvement over existing state-of-the-art oversampling techniques such as SMOTE (e.g., with an average of about 60% improvement rate over SMOTE in recall among all datasets). Conclusion: Based on this study, we would suggest the use of optimized GANs as an alternative method for security vulnerability data class imbalanced issues.


Omni: automated ensemble with unexpected models against adversarial evasion attack

January 2022 · 240 Reads · 17 Citations

Empirical Software Engineering

Context: Machine learning-based security detection models have become prevalent in modern malware and intrusion detection systems. However, previous studies show that such models are susceptible to adversarial evasion attacks. In this type of attack, inputs (i.e., adversarial examples) are specially crafted by intelligent malicious adversaries, with the aim of being misclassified by existing state-of-the-art models (e.g., deep neural networks). Once the attackers can fool a classifier to think that a malicious input is actually benign, they can render a machine learning-based malware or intrusion detection system ineffective. Objective: To help security practitioners and researchers build a more robust model against non-adaptive, white-box and non-targeted adversarial evasion attacks through the idea of ensemble model. Method: We propose an approach called Omni, the main idea of which is to explore methods that create an ensemble of “unexpected models”; i.e., models whose control hyperparameters have a large distance to the hyperparameters of an adversary’s target model, with which we then make an optimized weighted ensemble prediction. Results: In studies with five types of adversarial evasion attacks (FGSM, BIM, JSMA, DeepFool and Carlini-Wagner) on five security datasets (NSL-KDD, CIC-IDS-2017, CSE-CIC-IDS2018, CICAndMal2017 and the Contagio PDF dataset), we show Omni is a promising approach as a defense strategy against adversarial attacks when compared with other baseline treatments. Conclusions: When employing ensemble defense against adversarial evasion attacks, we suggest to create ensemble with unexpected models that are distant from the attacker’s expected model (i.e., target model) through methods such as hyperparameter optimization.
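
The idea of "unexpected models" in the abstract can be illustrated with a small Python sketch: pick the candidate hyperparameter settings that lie farthest from the target model's settings, then combine their predictions with a weighted average. This is only a sketch under assumptions. The abstract does not say which distance measure Omni uses or how the ensemble weights are optimized, so the Euclidean distance, the function names, and the fixed weights below are all hypothetical.

```python
import math


def distance(h1, h2):
    """Euclidean distance between two hyperparameter vectors.

    Assumes both vectors are already scaled to [0, 1] (an assumption made
    for this sketch, not a detail taken from the abstract).
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def pick_unexpected(candidates, target, k):
    """Return the k candidate settings farthest from the target model's."""
    return sorted(candidates, key=lambda h: distance(h, target), reverse=True)[:k]


def weighted_vote(probabilities, weights):
    """Weighted average of each model's predicted probability of 'malicious'."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Hypothetical target model and candidate hyperparameter settings.
target = (0.5, 0.5)
candidates = [(0.5, 0.6), (0.0, 1.0), (1.0, 0.0), (0.4, 0.5)]
unexpected = pick_unexpected(candidates, target, k=2)

# Hypothetical per-model probabilities and hand-picked (not optimized) weights.
score = weighted_vote([0.9, 0.6], [3.0, 1.0])
print(unexpected, round(score, 3))
```

In a fuller implementation the weights would themselves be tuned, as the abstract's phrase "optimized weighted ensemble prediction" suggests; here they are fixed to keep the example short.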



An example of security bug report from the Apache Ambari project mislabelled as non-security bug report from Peters et al. (2018)
Comparison of different treatments in ranking bug report prediction results. This plot shows its results using the deciles of (3) (from Section 4.4) and higher y-axis is better. Different treatments are denoted with lines of different colors. Specifically, the baseline (shown in blue color) is the method that does not apply any ranking technique (i.e., with the original chronological order). The orange line denotes the best ranking results from FARSEC among all filters
How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

May 2021 · 110 Reads · 22 Citations

Empirical Software Engineering

Background: In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort. Goal: The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports. Method: Our proposed method, called SWIFT, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, SWIFT uses a technique called 𝜖-dominance that learns how to avoid operations that do not significantly improve performance. Result: When compared to recent state-of-the-art results (from FARSEC, which was published in TSE’18), we find that SWIFT’s dual optimization of both pre-processor and learner is more useful than optimizing each of them individually. For example, in a study of security bug reports from the Chromium dataset, the median recalls of FARSEC and SWIFT were 15.7% and 77.4%, respectively. As another example, in experiments with data from the Ambari project, the median recalls improved from 21.5% to 85.7% (FARSEC to SWIFT). Conclusion: Overall, our approach can quickly optimize models that achieve better recalls than the prior state-of-the-art. These increases in recall are associated with moderate increases in false positive rates (from 8% to 24%, median). For future work, these results suggest that dual optimization is both practical and useful.
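
The abstract describes 𝜖-dominance only as a way to skip options that do not significantly improve performance. A minimal Python sketch of one common formulation, additive 𝜖-dominance over objectives that are all maximized, is given below. Whether SWIFT uses exactly this form is an assumption; the function names and the example scores are hypothetical.

```python
from typing import Sequence


def epsilon_dominates(a: Sequence[float], b: Sequence[float], eps: float) -> bool:
    """Additive epsilon-dominance, assuming every objective is maximized.

    `a` epsilon-dominates `b` when a[i] + eps >= b[i] for every objective i,
    i.e. `b` is not better than `a` by more than eps on any objective.
    """
    if len(a) != len(b):
        raise ValueError("score vectors must have the same length")
    return all(x + eps >= y for x, y in zip(a, b))


def worth_keeping(candidate, archive, eps):
    """Keep a candidate only if no archived score epsilon-dominates it."""
    return not any(epsilon_dominates(kept, candidate, eps) for kept in archive)


# Hypothetical scores as (recall, 1 - false_positive_rate); not from the paper.
archive = [(0.77, 0.76)]
small_gain = (0.78, 0.77)   # better, but by less than eps on both objectives
large_gain = (0.86, 0.76)   # recall improves by more than eps

print(worth_keeping(small_gain, archive, eps=0.05))
print(worth_keeping(large_gain, archive, eps=0.05))
```

Under this definition a tuning result that beats the archive by less than `eps` is treated as no real improvement, so an optimizer can stop spending effort in that region of the option space.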


Fig. 4. Exploratory Penetration Testing Efficiency
Structuring a Comprehensive Software Security Course Around the OWASP Application Security Verification Standard

March 2021 · 310 Reads

Lack of security expertise among software practitioners is a problem with many implications. First, there is a deficit of security professionals to meet current needs. Additionally, even practitioners who do not plan to work in security may benefit from increased understanding of security. The goal of this paper is to aid software engineering educators in designing a comprehensive software security course by sharing an experience running a software security course for the eleventh time. Through all the eleven years of running the software security course, the course objectives have been comprehensive - ranging from security testing, to secure design and coding, to security requirements to security risk management. For the first time in this eleventh year, a theme of the course assignments was to map vulnerability discovery to the security controls of the Open Web Application Security Project (OWASP) Application Security Verification Standard (ASVS). Based upon student performance on a final exploratory penetration testing project, this mapping may have increased students' depth of understanding of a wider range of security topics. The students efficiently detected 191 unique and verified vulnerabilities of 28 different Common Weakness Enumeration (CWE) types during a three-hour period in the OpenMRS project, an electronic health record application in active use.


Citations (7)


... DE has been widely applied [67]. Within software engineering, DE has been used for optimization tasks such as Fu et al.'s [68] tuning study on defect prediction; Shu et al.'s study on tuning detectors for security issues [69]; and Xia et al.'s study that tuned project health predictors for open-source Java systems [34]. ...

Reference:

Less Noise, More Signal: DRR for Better Optimizations of SE Tasks
Dazzle: using optimized generative adversarial networks to address security data class imbalance issue
  • Citing Conference Paper
  • October 2022

... IAST and RASP have not often been compared to well-established counterparts, such as Dynamic Application Security Testing (DAST), Static Application Security Testing (SAST), and penetration testing, particularly in the context of a large system. Austin and Williams (2011), Austin et al. (2013), and Elder et al. (2022) compared the performance of DAST, SAST, and does not take into account the performance of IAST and RASP. Another previous study by which relied on the OWASP Benchmark project (Open Web Application Security Project (OWASP) Foundation 2022), a limited Java test suite focused on a fixed set of vulnerabilities and, lacked human review, thereby failing to accurately represent real-world scenarios. ...

Do I really need all this work to find vulnerabilities?: An empirical case study comparing vulnerability detection techniques on a Java application

Empirical Software Engineering

... Understanding when a software project is in a healthy state remains a critical yet unsolved challenge in software development. While repositories provide extensive data about project activities, from code changes to community interactions, current approaches struggle to convert this wealth of information into actionable insights about project health [1,2]. This gap affects both practitioners managing projects and researchers studying software development. ...

Predicting health indicators for open source projects (using hyperparameter optimization)

Empirical Software Engineering

... Notably, various categories of ransomware exist, each with unique characteristics. These categories encompass crypto worms in ref. [27], Human-operated Ransomware in ref. [28], Ransomware-as-a-Service (RaaS) in ref. [29], and Automated Active Adversary ransomware in ref. [30]. Table 2 encapsulates the essential features, propagation methods, exploitation strategies, and ransomware families associated with these diverse ransomware types. ...

Omni: automated ensemble with unexpected models against adversarial evasion attack

Empirical Software Engineering

... The security testing method comprises two parts: vulnerability assessment and penetration testing [36]. Both tests are performed on each product service during the research, with the front-end system and API system acting as the back-end. ...

Structuring a Comprehensive Software Security Course Around the OWASP Application Security Verification Standard
  • Citing Conference Paper
  • May 2021

... In addition, researchers also analyze the vulnerabilities from various project artifacts (e.g., IRs, bug reports, etc.). Some researchers utilized text-mining methods to explore the security bug reports to identify the vulnerabilities [29,[82][83][84], while other works analyze the negative impact of the vulnerabilities from the IRs [62,64,66,75]. The other researchers focus on the crowd-based security discussions, e.g., security posts in Stack Overflow, and discussion groups in Gitter/Slacks, to analyze the topics, attacks, and the corresponding mitigations [40,52,67,89,92]. ...

How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

Empirical Software Engineering

... The annual PROMISE meeting knows it needs to revisit its goals and methods. Gema Rodríguez-Pérez 9 7. E.g. see the 1100+ recent Github projects used by Xia et al. [50], or everything that can be extracted using CommitGuru [51]. 8 10 cautions that in the early years of PROMISE, data sets were often not really raw data, but rather directly collections of metrics. ...

Sequential Model Optimization for Software Effort Estimation
  • Citing Article
  • December 2020

IEEE Transactions on Software Engineering