Danfeng Yao

Rutgers, The State University of New Jersey, New Brunswick, New Jersey, United States

Are you Danfeng Yao?

Claim your profile

Publications (75)20.13 Total impact

  • Source
    Hao Zhang · Danfeng (Daphne) Yao · Naren Ramakrishnanvt · Zhibin Zhangcas
    [Show abstract] [Hide abstract]
    ABSTRACT: Malicious software activities have become more and more clandestine, making them challenging to detect. Existing security solutions rely heavily on the recognition of known code or behavior signatures, which are incapable of detecting new malware patterns. We propose to discover the triggering relations on network requests and leverage the structural information to identify stealthy malware activities that cannot be attributed to a legitimate cause. The triggering relation is defined as the temporal and causal relationship between two events. We design and compare rule- and learning-based methods to infer the triggering relations on network data. We further introduce a user-intention based security policy for pinpointing stealthy malware activities based on a triggering relation graph. We extensively evaluate our solution on a DARPA dataset and 7 GB real-world network traffic. Results indicate that our dependence analysis successfully detects various malware activities including spyware, data exfiltrating malware, and DNS bots on hosts. With good scalability for large datasets, the learning-based method achieves better classification accuracy than the rule-based one. The significance of our traffic reasoning approach is its ability to detect new and stealthy malware activities.
    Preview · Article · Jan 2016 · Computers & Security
  • X. Shu · J. Zhang · D. Yao · W.-C. Feng
    [Show abstract] [Hide abstract]
    ABSTRACT: The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion, deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks because of its formidable complexity when modeling the required regular expressions. We design two new algorithms for detecting long and inexact data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared with the state-of-the-art inspection methods. We parallelize our prototype on graphics processing unit and demonstrate the strong scalability of our data leak detection solution analyzing big data.
    No preview · Article · Aug 2015 · Proceedings - IEEE INFOCOM
  • Xiaokui Shu · Danfeng Yao · Elisa Bertino
    [Show abstract] [Hide abstract]
    ABSTRACT: Statistics from security firms, research institutions and government organizations show that the number of data-leak instances have grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. There exist solutions detecting inadvertent sensitive data leaks caused by human mistakes and to provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced. In this paper, we present a privacy-preserving data-leak detection (DLD) solution to solve the issue where a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semihonest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with very small number of false alarms under various data-leak scenarios.
    No preview · Article · May 2015 · IEEE Transactions on Information Forensics and Security
  • Source
    Fang Liu · Xiaokui Shu · Danfeng Yao · Ali R. Butt
    [Show abstract] [Hide abstract]
    ABSTRACT: The exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal secu- rity. Data leak detection aims at scanning content (in stor- age or transmission) for exposed sensitive data. Because of the large content and data volume, such a screening algo- rithm needs to be scalable for a timely detection. Our solu- tion uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task, such as Ama- zon EC2. We design new MapReduce algorithms for com- puting collection intersection for data leak detection. Our prototype implemented with the Hadoop system achieves 225 Mbps analysis throughput with 24 nodes. Our algo- rithms support a useful privacy-preserving data transfor- mation. This transformation enables the privacy-preserving technique to minimize the exposure of sensitive data during the detection. This transformation supports the secure out- sourcing of the data leak detection to untrusted MapReduce and cloud providers.
    Preview · Conference Paper · Mar 2015
  • B. Wolfe · K. Elish · D. Yao
    [Show abstract] [Hide abstract]
    ABSTRACT: We present a new method of classifying previously unseen Android applications as malware or benign. The algorithm starts with a large set of features: the frequencies of all possible n-byte sequences in the application's byte code. Principal components analysis is applied to that frequency matrix in order to reduce it to a low-dimensional representation, which is then fed into any of several classification algorithms. We utilize the implicitly restarted Lanczos bidiagonalization algorithm and exploit the sparsity of the n-gram frequency matrix in order to efficiently compute the low-dimensional representation. When trained upon that low-dimensional representation, several classification algorithms achieve higher accuracy than previous work.
    No preview · Article · Feb 2015
  • Xiaokui Shu · Jing Zhang · Danfeng Daphne Yao · Wu-chun Feng
    [Show abstract] [Hide abstract]
    ABSTRACT: The leak of sensitive data on computer systems poses a serious threat to organizational security. Statistics show that the lack of proper encryption on files and communications due to human errors is one of the leading causes of data loss. Organizations need tools to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion and deletion) result in highly unpredictable leak patterns. In this paper, we utilize sequence alignment techniques for detecting complex data-leak patterns. Our algorithm is designed for detecting long and inexact sensitive data patterns. This detection is paired with a comparable sampling algorithm, which allows one to compare the similarity of two separately sampled sequences. Our system achieves good detection accuracy in recognizing transformed leaks. We implement a parallelized version of our algorithms in graphics processing unit that achieves high analysis throughput. We demonstrate the high multithreading scalability of our data leak detection method required by a sizable organization.
    No preview · Article · Jan 2015 · IEEE Transactions on Information Forensics and Security
  • Source
    Hao Zhang · Maoyuan Sun · Danfeng (Daphne) Yao · Chris North

    Full-text · Conference Paper · Jan 2015
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Securing the networks of large organizations is technically challenging due to the complex configurations and constraints. Managing these networks requires rigorous and comprehensive analysis tools. A network administrator needs to identify vulnerable configurations, as well as tools for hardening the networks. Such networks usually have dynamic and fluidic structures, thus one may have incomplete information about the connectivity and availability of hosts. In this paper, we address the problem of statically performing a rigorous assessment of a set of network security defense strategies with the goal of reducing the probability of a successful large-scale attack in a dynamically changing and complex network architecture. We describe a probabilistic graph model and algorithms for analyzing the security of complex networks with the ultimate goal of reducing the probability of successful attacks. Our model naturally utilizes a scalable state-of-the-art optimization technique called sequential linear programming that is extensively applied and studied in various engineering problems. %We demonstrate their use in solving several types of useful network security management problems. Among them is the optimal placement problem, where the network administrator needs to compute on which machine(s) to install new security products in order to maximize the security benefit for the organizational network. In comparison to related solutions on attack graphs, our probabilistic model provides mechanisms for expressing uncertainties in network configurations, which is not reported elsewhere. We have performed comprehensive experimental validation with real-world network configuration data of a sizable organization.
    Full-text · Article · Jan 2015 · IEEE Transactions on Dependable and Secure Computing
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: As mobile computing becomes an integral part of the modern user experience, malicious applications have infiltrated open marketplaces for mobile platforms. Malware apps stealthily launch operations to retrieve sensitive user or device data or abuse system resources. We describe a highly accurate classification approach for detecting malicious Android apps. Our method statically extracts a data-flow feature on how user inputs trigger sensitive API invocations, a property referred to as the user-trigger dependence. Our evaluation with 1433 malware apps and 2684 free popular apps gives a classification accuracy (2.1% false negative rate and 2.0% false positive rate) that is better than, or at least competitive against, the state-of-the-art. Our method also discovers new malicious apps in the Google Play market that cannot be detected by virus scanning tools. Our thesis in this mobile app classification work is to advocate the approach of benign property enforcement, i.e., extracting unique behavioral properties from benign programs and designing corresponding classification policies.
    Preview · Article · Nov 2014 · Computers & Security
  • Hao Zhang · Danfeng Daphne Yao · Naren Ramakrishnan
    [Show abstract] [Hide abstract]
    ABSTRACT: Studies show that a significant portion of networked computers are infected with stealthy malware. Infection allows remote attackers to control, utilize, or spy on victim machines. Conventional signature-scan or counting-based techniques are limited, as they are unable to stop new zero-day exploits. We describe a traffic analysis method that can effectively detect malware activities on a host. Our new approach efficiently discovers the underlying triggering relations of a massive amount of network events. We use these triggering relations to reason the occurrences of network events and to pinpoint stealthy malware activities. We define a new problem of triggering relation discovery of network events. Our solution is based on domain-knowledge guided advanced learning algorithms. Our extensive experimental evaluation involving 6+ GB traffic of various types shows promising results on the accuracy of our triggering relation discovery.
    No preview · Article · Jun 2014
  • Source
    Hussain Almohri · Danfeng Yao · Dennis Kafura
    [Show abstract] [Hide abstract]
    ABSTRACT: Many Android vulnerabilities share a root cause of malicious unauthorized applications executing without user's consent. In this paper, we propose the use of a technique called process authentication for Android applications to overcome the shortcomings of current Android security practices. We demonstrate the process authentication model for Android by designing and implementing our runtime authentication and detection system referred to as DroidBarrier. Our malware analysis shows that DroidBarrier is capable of detecting real Android malware at the time of creating independent processes.
    Full-text · Conference Paper · Mar 2014
  • Source
    Hussain M. J. Almohri · Danfeng (Daphne) Yao · Dennis Kafura
    [Show abstract] [Hide abstract]
    ABSTRACT: This paper points out the need in modern operating system kernels for a process authentication mechanism, where a process of a user-level application proves its identity to the kernel. Process authentication is different from process identification. Identification is a way to describe a principal; PIDs or process names are identifiers for processes in an OS environment. However, the information such as process names or executable paths that is conventionally used by OS to identify a process is not reliable. As a result, malware may impersonate other processes, thus violating system assurance. We propose a lightweight secure application authentication framework in which user-level applications are required to present proofs at runtime to be authenticated to the kernel. To demonstrate the application of process authentication, we develop a system call monitoring framework for preventing unauthorized use or access of system resources. It verifies the identity of processes before completing the requested system calls. We implement and evaluate a prototype of our monitoring architecture in Linux. The results from our extensive performance evaluation show that our prototype incurs reasonably low overhead, indicating the feasibility of our approach for cryptographically authenticating applications and their processes in the operating system.
    Full-text · Article · Mar 2014 · IEEE Transactions on Dependable and Secure Computing
  • Source
    Karim O. Elish · Yipan Deng · Danfeng (Daphne) Yao · Dennis Kafura
    [Show abstract] [Hide abstract]
    ABSTRACT: We describe an effective device-based isolation approach for achieving data security. We show its use in protecting the secrecy of highly sensitive data that is crucial to security operations, such as cryptographic keys used for decrypting ciphertext or signing digital signatures. Private key is usually encrypted in its storage when not used; however, when being used, the plaintext key is loaded into the memory of the host for access. We present a novel and practical solution and its prototype called DataGuard to protect the secrecy of the highly sensitive data through the storage isolation and secure tunneling enabled by a mobile handheld device. DataGuard can be deployed for the key protection of individuals or organizations. We implement three prototypes and conduct extensive experiments to evaluate the feasibility and performance of DataGuard. The results show that our approach performs well without significant overhead.
    Preview · Article · Dec 2013 · Procedia Computer Science
  • Xiaokui Shu · John Smiy · Danfeng Daphne Yao · Heshan Lin
    [Show abstract] [Hide abstract]
    ABSTRACT: Security log analysis is extremely useful for uncovering intrusions and anomalies. However, the sheer volume of log data demands new frameworks and techniques of computing and security. We present a lightweight distributed and parallel security log analysis framework that allows organizations to analyze a massive number of system, network, and transaction logs efficiently and scalably. Different from the general distributed frameworks, e.g., MapReduce, our framework is specifically designed for security log analysis. It features a minimum set of necessary properties, such as dynamic task scheduling for streaming logs. For prototyping, we implement our framework in Amazon cloud environments (EC2 and S3) with a basic analysis application. Our evaluation demonstrates the effectiveness of our design and shows the potential of our cloud-based distributed framework in large-scale log analysis scenarios.
    No preview · Conference Paper · Dec 2013
  • Hao Zhang · Danfeng Daphne Yao · Naren Ramakrishnan
    [Show abstract] [Hide abstract]
    ABSTRACT: This paper addresses the problem of reasoning about relations between network packets on a host or in a network. Our analysis approach is to discover the causal relations among network packets, and use the relational structure of network events to identify anomalous activities that cannot be attributed to a legitimate cause. The key insight that motivates our traffic-analysis approach is that higher-order information such as the underlying relations of events is useful for human experts' cognition and decision making. We design a new pairing method that produces special pairwise features, so that the discovery problem can be efficiently solved with existing binary classification methods. Preliminary experiments involving real world HTTP and DNS traffic show promising evidence of the accuracy of inferring the network traffic relations using our semantic-aware approach.
    No preview · Conference Paper · Nov 2013
  • Huijun Xiong · Qingji Zheng · Xinwen Zhang · Danfeng Yao
    [Show abstract] [Hide abstract]
    ABSTRACT: Data protection in public cloud remains a challenging problem. Outsourced data processing on vulnerable cloud platforms may suffer from cross-VM attacks, e.g. side-channel attacks that leak secrecy keys. We design and develop CloudSafe, a general and practical data-protection solution by integrating cryptographic techniques and systematic mechanisms seamlessly to address this issue. CloudSafe first allows a data owner to outsource encrypted data in the cloud. It then employs a cloud-based proxy to re-encrypt stored encrypted data and delivers it to authorized cloud applications upon access requests. To combat cross-VM side-channel attacks, the final data decryption key is one-time use and can be retrieved from the data owner ondemand. Any key leakage after an authorized access cannot compromise data confidentiality. For data sharing, CloudSafe allows authorized applications to efficiently access the protected data. The prototype evaluation demonstrates the efficiency of the scheme towards large-scale cloud applications.
    No preview · Conference Paper · Oct 2013
  • Kui Xu · Patrick Butler · Sudip Saha · Danfeng (Daphne) Yao
    [Show abstract] [Hide abstract]
    ABSTRACT: Attackers, in particular botnet controllers, use stealthy messaging systems to set up large-scale command and control. To systematically understand the potential capability of attackers, we investigate the feasibility of using domain name service (DNS) as a stealthy botnet command-and-control channel. We describe and quantitatively analyze several techniques that can be used to effectively hide malicious DNS activities at the network level. Our experimental evaluation makes use of a two-month-long 4.6-GB campus network data set and 1 million domain names obtained from alexa.com. We conclude that the DNS-based stealthy command-and-control channel (in particular, the codeword mode) can be very powerful for attackers, showing the need for further research by defenders in this direction. The statistical analysis of DNS payload as a countermeasure has practical limitations inhibiting its large-scale deployment.
    No preview · Article · May 2013 · IEEE Transactions on Dependable and Secure Computing
  • Xiaokui Shu · Danfeng (Daphne) Yao
    [Show abstract] [Hide abstract]
    ABSTRACT: We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not reveal the content of the sensitive data. Instead, only a small amount of specialized digests are needed. Our technique – referred to as the fuzzy fingerprint detection – can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others (e.g., network or cloud providers). We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on our techniques with large datasets. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with very small number of false alarms, even when the presentation of the data has been transformed.
    No preview · Article · Sep 2012
  • Kui Xu · Huijun Xiong · Chehai Wu · Deian Stefan · Danfeng Yao
    [Show abstract] [Hide abstract]
    ABSTRACT: Malicious software typically resides stealthily on a user's computer and interacts with the user's computing resources. Our goal in this work is to improve the trustworthiness of a host and its system data. Specifically, we provide a new mechanism that ensures the correct origin or provenance of critical system information and prevents adversaries from utilizing host resources. We define data-provenance integrity as the security property stating that the source where a piece of data is generated cannot be spoofed or tampered with. We describe a cryptographic provenance verification approach for ensuring system properties and system-data integrity at kernel-level. Its two concrete applications are demonstrated in the keystroke integrity verification and malicious traffic detection. Specifically, we first design and implement an efficient cryptographic protocol that enforces keystroke integrity by utilizing on-chip Trusted Computing Platform (TPM). The protocol prevents the forgery of fake key events by malware under reasonable assumptions. Then, we demonstrate our provenance verification approach by realizing a lightweight framework for restricting outbound malware traffic. This traffic-monitoring framework helps identify network activities of stealthy malware, and lends itself to a powerful personal firewall for examining all outbound traffic of a host that cannot be bypassed.
    No preview · Article · May 2012 · IEEE Transactions on Dependable and Secure Computing
  • Source
    Deian Stefan · Xiaokui Shu · Danfeng (Daphne) Yao
    [Show abstract] [Hide abstract]
    ABSTRACT: Biometric systems including keystroke-dynamics based authentication have been well studied in the literature. The attack model in biometrics typically considers impersonation attempts launched by human imposters. However, this attack model is not adequate, as advanced attackers may utilize programs to forge data. In this paper, we consider the effects of synthetic forgery attacks in the context of biometric authentication systems. Our study is performed in a concrete keystroke-dynamic authentication system.The main focus of our work is evaluating the security of keystroke-dynamics authentication against synthetic forgery attacks. Our analysis is performed in a remote authentication framework called TUBA that we design and implement for monitoring a user’s typing patterns. We evaluate the robustness of TUBA through experimental evaluation including two series of simulated bots. The keystroke sequences forged by the two bots are modeled using first-order Markov chains. Support vector machine is used for classification. Our results, based on 20 users’ keystroke data, are reported. Our work shows that keystroke dynamics is robust against the two specific types of synthetic forgery attacks studied, where attacker draws statistical samples from a pool of available keystroke dataset other than the target.We also describe TUBA’s use for detecting anomalous activities on remote hosts, and present its use in a specific cognition-based anomaly detection system. The use of TUBA provides high assurance on the information collected from the hosts and enables remote security diagnosis and monitoring.
    Preview · Article · Feb 2012 · Computers & Security

Publication Stats

564 Citations
20.13 Total Impact Points

Institutions

  • 2008-2012
    • Rutgers, The State University of New Jersey
      • Department of Computer Science
      New Brunswick, New Jersey, United States
  • 2010
    • Virginia Polytechnic Institute and State University
      • Department of Computer Science
      Блэксбург, Virginia, United States
  • 2007
    • FX Palo Alto Laboratory
      Palo Alto, California, United States
  • 2004-2007
    • Brown University
      • Department of Computer Science
      Providence, Rhode Island, United States