Haipeng Cai

Washington State University | WSU · School of Electrical Engineering and Computer Science

Doctor of Philosophy

About

90
Publications
15,191
Reads
847
Citations
Introduction
Researcher in Software Engineering with a focus on program analysis for systems reliability and code security
Additional affiliations
August 2016 - present
Washington State University
Position
  • Professor
August 2015 - August 2016
Virginia Polytechnic Institute and State University
Position
  • PhD Student
August 2012 - August 2015
University of Notre Dame
Position
  • PhD Student
Education
August 2012 - July 2015
University of Notre Dame
Field of study
  • Computer Science

Publications (90)
Conference Paper
Full-text available
Software construction using multiple languages has long been a norm, yet it is still unclear if multilingual code construction has significant security implications and real security consequences. This paper aims to address this question with a large-scale study of popular multi-language projects on GitHub and their evolution histories, enabled by...
Conference Paper
Full-text available
The availability of large-scale, realistic vulnerability datasets is essential for both benchmarking existing techniques and developing effective new ones, especially those using data-driven (e.g., machine/deep-learning based) approaches, for software security. Yet such datasets are critically lacking. A promising solution is to generate such datas...
Preprint
Full-text available
Fragmentation is a serious problem in the Android ecosystem. This problem is mainly caused by the fast evolution of the system itself and the various customizations independently maintained by different smartphone manufacturers. Many efforts have attempted to mitigate its impact via approaches to automatically pinpoint compatibility issues in Andro...
Conference Paper
Full-text available
Despite the fact that most real-world software systems today are written in multiple programming languages, existing program analysis based security techniques are still limited to single-language code. In consequence, security flaws (e.g., code vulnerabilities) at and across language boundaries are largely left out as blind spots. We present POLYC...
Article
Data-oriented attacks manipulate non-control data to alter a program’s benign behavior without violating its control-flow integrity. It has been shown that such attacks can cause significant damage even in the presence of control-flow defense mechanisms. However, these threats have not been adequately addressed. In this survey article, we first map...
Article
Full-text available
As modern software systems are increasingly developed for running in distributed environments, it is crucial to provide fundamental techniques such as dependence analysis for checking, diagnosing, and evolving those systems. However, traditional dependence analysis is either inapplicable or of very limited utility for distributed programs due to th...
Preprint
Full-text available
As modern software systems are increasingly developed for running in distributed environments, it is crucial to provide fundamental techniques such as dependence analysis for checking, diagnosing, and evolving those systems. However, traditional dependence analysis is either inapplicable or of very limited utility for distributed programs due to th...
Conference Paper
Full-text available
Dynamic information flow analysis (DIFA) supports various security applications such as malware analysis and vulnerability discovery. Yet traditional DIFA approaches have limited utility for distributed software due to applicability, portability, and scalability barriers. We present FLOWDIST, a DIFA for common distributed software that overcomes th...
Article
Cryptographic protocols are often expected to be provably secure. However, this security guarantee often falls short in practice due to various implementation flaws. We propose a new paradigm called cryptographic program analysis (CPA) which prescribes the use of program analysis to detect these implementation flaws at compile time. The principal i...
Article
Full-text available
Context: Memory error vulnerabilities have been consequential, and several well-known, open-source memory error vulnerability detectors exist, built on static and/or dynamic code analysis. Yet there is a lack of assessment of such detectors based on rigorous, quantitative accuracy and efficiency measures while not being limited to specific applicatio...
Article
Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias recorded...
Preprint
Full-text available
Bug reports (BR) contain vital information that can help triaging teams prioritize and assign bugs to developers who will provide the fixes. However, studies have shown that BR fields often contain incorrect information that needs to be reassigned, which delays the bug fixing process. There exist approaches for predicting whether a BR field should b...
Conference Paper
Full-text available
Bug reports (BR) contain vital information that can help triaging teams prioritize and assign bugs to developers who will provide the fixes. However, studies have shown that BR fields often contain incorrect information that needs to be reassigned, which delays the bug fixing process. There exist approaches for predicting whether a BR field should b...
Preprint
Full-text available
A playtest is the process in which human testers are recruited to play video games and to reveal software bugs. Manual testing is expensive and time-consuming, especially when there are many mobile games to test and every software version requires extensive testing before being released. Existing testing frameworks (e.g., Android Monkey) are li...
Conference Paper
Full-text available
We envision visual semantics learning (VSL), a novel methodology that derives high-level functional description of given software from its visual (graphical) outputs. By visual semantics, we mean the semantic description about the software’s behaviors that are exhibited in its visual outputs. VSL works by composing this description based on visual...
Article
Full-text available
Distributed software systems are increasingly developed and deployed today. Many of these systems are supposed to run continuously. Given their critical roles in our society and daily lives, assuring the quality of distributed systems is crucial. Analyzing runtime program dependencies has long been a fundamental technique underlying numerous tool s...
Article
Full-text available
Machine learning–based classification dominates current malware detection approaches for Android. However, due to the evolution of both the Android platform and its user apps, existing such techniques are widely limited by their reliance on new malware samples, which may not be timely available, and constant retraining, which is often very costly....
Article
Full-text available
With the rise of the mobile computing market, Android has received tremendous attention from both academia and industry. Application programming in Android is known to have unique characteristics, and Android apps to be particularly vulnerable to various security attacks. In response, numerous solutions for particular security issues have been propose...
Article
Full-text available
Context: The constant evolution of the Android platform and its applications has imposed significant challenges both to understanding and securing the Android ecosystem. Yet, despite the growing body of relevant research, it remains unclear how Android apps evolve in terms of their run-time behaviors in ways that impede our gaining consistent empir...
Conference Paper
Full-text available
As in other software domains, information flow security is a fundamental aspect of code security in distributed systems. However, most existing solutions to information flow security are limited to centralized software. For distributed systems, such solutions face multiple challenges, including technique applicability, tool portability, and analysi...
Conference Paper
Full-text available
The rapid expansion of the Android ecosystem is accompanied by continuing diversification of platforms and devices, resulting in increasing incompatibility issues which damage user experiences and impede app development productivity. In this paper, we conducted a large-scale, longitudinal study of compatibility issues in 62,894 benign apps develope...
Article
Full-text available
Most existing Android malware detection and categorization techniques are static approaches, which suffer from evasion attacks such as obfuscation. By analyzing program behaviors, dynamic approaches are potentially more resilient against these attacks. Yet existing dynamic approaches mostly rely on characterizing system calls which are subject to s...
Article
Full-text available
Context: Requirement traceability (RT) is defined as the ability to describe and follow the life of a requirement. RT helps developers ensure that relevant requirements are implemented and that the source code is consistent with its requirement with respect to a set of traceability links called trace links. Previous work leverages Parts Of Speech (...
Preprint
Full-text available
Machine learning-based malware detection dominates current security defense approaches for Android apps. However, due to the evolution of Android platforms and malware, existing such techniques are widely limited by their need for constant retraining that is costly, and reliance on new malware samples that may not be timely available. As a result,...
Preprint
Machine learning-based malware detection dominates current security defense approaches for Android apps. However, due to the evolution of Android platforms and malware, existing such techniques are widely limited by their need for constant retraining that is costly, and reliance on new malware samples that may not be timely available. As a result,...
Conference Paper
Full-text available
Today, computing on various Android devices is pervasive. However, growing security vulnerabilities and attacks in the Android ecosystem constitute various threats through user apps. Taint analysis is a common technique for defending against these threats, yet it suffers from challenges in attaining practical simultaneous scalability and effectiven...
Conference Paper
Full-text available
Approaches to Android malware detection built on supervised learning are commonly subject to frequent retraining, or the trained classifier may fail to detect newly emerged or emerging kinds of malware. This work targets a sustainable Android malware detector that, once trained on a dataset, can continue to effectively detect new malware without re...
Conference Paper
Full-text available
The runtime permission model of Android enhances security yet also constitutes a source of incompatibility issues that impedes the productivity of mobile developers. This paper presents a novel analysis that detects the incompatible permission uses in a given app and repairs them when found, hence automatically adapting the app to the runtime permi...
Conference Paper
Full-text available
We present ICC-INSPECT, a tool for understanding Android app behaviors exhibited at runtime via inter-component communication (ICC). Through lightweight Intent profiling, ICC-INSPECT streams run-time ICC information to a dynamic visualization framework which depicts interactive ICC call graphs along with informative ICC statistics. This framework a...
Article
To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of ma...
Conference Paper
Full-text available
Most existing research for Android focuses on particular security issues, yet there is little broad understanding of Android application run-time characteristics and their implications. To mitigate this gap, we present the first systematic dynamic characterization study of Android apps that targets a broad understanding of application behaviors in...
Conference Paper
Full-text available
As the Android app market keeps growing, there is a pressing need for automated tool support to empower Android developers to produce quality apps with higher productivity. Yet existing tools for Android mostly aim at security and privacy protection, primarily targeting end users and security analysts. Towards filling this gap, we present DROIDFAX...
Conference Paper
Full-text available
Inter-component communication (ICC) serves as a key element of any Android app's implementation. Specifically, an Android app uses Intents as the main mechanism for ICC to complete tasks such as switching between different user interfaces, starting background services, communicating to other apps on the Android device, and saving or retrieving data...
Article
Full-text available
Impact analysis determines the effects that program entities of interest, or changes to them, may have on the rest of the program for software measurement, maintenance, and evolution tasks. Dynamic impact analysis could be one major approach to impact analysis that computes smaller impact sets than static alternatives for concrete sets of execution...
Conference Paper
Full-text available
Inter-Component Communication (ICC) enables useful interactions between mobile apps. However, misuse of ICC exposes users to serious threats such as intent hijacking/spoofing and app collusions, allowing malicious apps to access privileged user data via another app. Unfortunately, existing ICC analyses are largely incompetent in both accuracy and s...
Article
Full-text available
The traditional software dependence (TSD) model based on the system dependence graph enables precise fine-grained program dependence analysis that supports a range of software analysis and testing tasks. However, this model often faces scalability challenges that hinder its applications as it can be unnecessarily expensive, especially for client an...
Conference Paper
Full-text available
Dynamic impact analysis is a fundamental technique for understanding the impact of specific program entities, or changes to them, on the rest of the program for concrete executions. However, existing techniques are either inapplicable or of very limited utility for distributed programs running in multiple concurrent processes. This paper presents D...
Article
Full-text available
Dynamic impact analysis is a fundamental technique for understanding the impact of specific program entities, or changes to them, on the rest of the program for concrete executions. However, existing techniques are either inapplicable or of very limited utility for distributed programs running in multiple concurrent processes. This paper presents D...
Article
Full-text available
Impact analysis not only assists developers with change planning and management, but also facilitates a range of other client analyses, such as testing and debugging. In particular, for developers working in the context of specific program executions, dynamic impact analysis is usually more desirable than static approaches, as it produces more mana...
Article
Full-text available
This paper presents a parallel visualization technique for illustrative rendering of dense three-dimensional (3D) geometry data sets. Our approach maps the depth information in each geometry onto various visual dimensions of graphical representations, including shape, color, brightness, transparency, and size, to achieve legible display in dense ge...
Article
Full-text available
Software is constantly changing. To ensure the quality of this process, when preparing to change a program, developers must first identify the main consequences and risks of modifying the program locations they intend to change. This activity is called change-impact analysis. However, existing impact analysis suffers from two major problems: coarse...
Conference Paper
Full-text available
In the past decades, integrated development environments (IDEs) have been largely advanced to facilitate software engineering tasks and improve developer productivity. Yet, with growing information needs driven by increasing complexity in developing modern software with demands for high quality and reliability, developers often need to switch among...
Conference Paper
Full-text available
We present the design and implementation of TRACERJD, a toolkit devoted to dynamic dependence analysis via fine-grained whole-program dependence tracing. TRACERJD features a generic framework for efficient offline analysis of dynamic dependencies, including those due to exception-driven control flows. Underlying the framework is a hierarchical trac...
Conference Paper
Full-text available
Dynamic impact analysis can greatly assist developers with managing software changes by focusing their attention on the effects of potential changes relative to concrete program executions. While dependence-based dynamic impact analysis (DDIA) provides finer-grained results than traceability-based approaches, traditional DDIA techniques often produ...
Article
Full-text available
In the past decades, integrated development environments (IDEs) have been largely advanced to facilitate common software engineering tasks. Yet, with growing information needs driven by increasing complexity in developing modern high-quality software, developers often need to switch among multiple user interfaces, even across different applications...
Article
Full-text available
The correctness of software is affected by its constant changes. For that reason, developers use change-impact analysis to identify early the potential consequences of changing their software. Dynamic impact analysis is a practical technique that identifies potential impacts of changes for representative executions. However, it is unknown how relia...
Article
Full-text available
Software constantly changes during its life cycle. This phenomenon is particularly prominent in modern software, whose complexity keeps growing and changes rapidly in response to market pressures and user demands. At the same time, developers must assure the quality of this software in a timely manner. Therefore, it is of critical importance to pro...
Conference Paper
Full-text available
Sensitivity analysis determines how a system responds to stimuli variations, which can benefit important software-engineering tasks such as change-impact analysis. We present SENSA, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in pr...
Conference Paper
Full-text available
Impact analysis determines the effects that the behavior of program entities, or changes to them, can have on the rest of the system. Dynamic impact analysis is one practical form that computes smaller impact sets than static alternatives for concrete sets of executions. However, existing dynamic approaches can still produce impact sets that are to...
Conference Paper
Full-text available
Dynamic slicing is a practical and popular analysis technique used in various software-engineering tasks. Dynamic slicing is known to be incomplete because it analyzes only a subset of all possible executions of a program. However, it is less known that its results may inaccurately represent the dependencies that occur in those executions. Some res...
Conference Paper
Dynamic program slicing attempts to find runtime dependencies among statements to support security, reliability, and quality tasks such as information-flow analysis, testing, and debugging. However, it is not known how accurately dynamic slices identify statements that really affect each other. We propose a new approach to estimate the accuracy of...
Conference Paper
Full-text available
The reliability and security of software are affected by its constant changes. For that reason, developers use change-impact analysis early to identify the potential consequences of changing a program location. Dynamic impact analysis, in particular, identifies potential impacts on concrete, typical executions. However, the accuracy (precision and...