Danfeng Yao

Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States

Publications (59) · 6.1 Total impact

  • Xiaokui Shu, John Smiy, Danfeng Daphne Yao, Heshan Lin
    ABSTRACT: Security log analysis is extremely useful for uncovering intrusions and anomalies. However, the sheer volume of log data demands new frameworks and techniques of computing and security. We present a lightweight distributed and parallel security log analysis framework that allows organizations to analyze a massive number of system, network, and transaction logs efficiently and scalably. Different from the general distributed frameworks, e.g., MapReduce, our framework is specifically designed for security log analysis. It features a minimum set of necessary properties, such as dynamic task scheduling for streaming logs. For prototyping, we implement our framework in Amazon cloud environments (EC2 and S3) with a basic analysis application. Our evaluation demonstrates the effectiveness of our design and shows the potential of our cloud-based distributed framework in large-scale log analysis scenarios.
    2013 IEEE Globecom Workshops (GC Wkshps); 12/2013
  • Kui Xu, P. Butler, S. Saha, Danfeng Yao
    ABSTRACT: Attackers, in particular botnet controllers, use stealthy messaging systems to set up large-scale command and control. To systematically understand the potential capability of attackers, we investigate the feasibility of using domain name service (DNS) as a stealthy botnet command-and-control channel. We describe and quantitatively analyze several techniques that can be used to effectively hide malicious DNS activities at the network level. Our experimental evaluation makes use of a two-month-long 4.6-GB campus network data set and 1 million domain names obtained from alexa.com. We conclude that the DNS-based stealthy command-and-control channel (in particular, the codeword mode) can be very powerful for attackers, showing the need for further research by defenders in this direction. The statistical analysis of DNS payload as a countermeasure has practical limitations inhibiting its large-scale deployment.
    IEEE Transactions on Dependable and Secure Computing 01/2013; 10(3):143-153. · 1.06 Impact Factor
  • ABSTRACT: Data protection in the public cloud remains a challenging problem. Outsourced data processing on vulnerable cloud platforms may suffer from cross-VM attacks, e.g., side-channel attacks that leak secret keys. We design and develop CloudSafe, a general and practical data-protection solution by integrating cryptographic techniques and systematic mechanisms seamlessly to address this issue. CloudSafe first allows a data owner to outsource encrypted data in the cloud. It then employs a cloud-based proxy to re-encrypt stored encrypted data and delivers it to authorized cloud applications upon access requests. To combat cross-VM side-channel attacks, the final data decryption key is one-time use and can be retrieved from the data owner on demand. Any key leakage after an authorized access cannot compromise data confidentiality. For data sharing, CloudSafe allows authorized applications to efficiently access the protected data. The prototype evaluation demonstrates the efficiency of the scheme towards large-scale cloud applications.
    Communications and Network Security (CNS), 2013 IEEE Conference on; 01/2013
  • ABSTRACT: Malicious software typically resides stealthily on a user's computer and interacts with the user's computing resources. Our goal in this work is to improve the trustworthiness of a host and its system data. Specifically, we provide a new mechanism that ensures the correct origin or provenance of critical system information and prevents adversaries from utilizing host resources. We define data-provenance integrity as the security property stating that the source where a piece of data is generated cannot be spoofed or tampered with. We describe a cryptographic provenance verification approach for ensuring system properties and system-data integrity at the kernel level. Its two concrete applications are demonstrated in keystroke integrity verification and malicious traffic detection. Specifically, we first design and implement an efficient cryptographic protocol that enforces keystroke integrity by utilizing an on-chip Trusted Platform Module (TPM). The protocol prevents the forgery of fake key events by malware under reasonable assumptions. Then, we demonstrate our provenance verification approach by realizing a lightweight framework for restricting outbound malware traffic. This traffic-monitoring framework helps identify network activities of stealthy malware, and lends itself to a powerful personal firewall for examining all outbound traffic of a host that cannot be bypassed.
    IEEE Transactions on Dependable and Secure Computing 05/2012; · 1.06 Impact Factor
  • Deian Stefan, Xiaokui Shu, Danfeng (Daphne) Yao
    ABSTRACT: Biometric systems including keystroke-dynamics based authentication have been well studied in the literature. The attack model in biometrics typically considers impersonation attempts launched by human imposters. However, this attack model is not adequate, as advanced attackers may utilize programs to forge data. In this paper, we consider the effects of synthetic forgery attacks in the context of biometric authentication systems. Our study is performed in a concrete keystroke-dynamic authentication system. The main focus of our work is evaluating the security of keystroke-dynamics authentication against synthetic forgery attacks. Our analysis is performed in a remote authentication framework called TUBA that we design and implement for monitoring a user’s typing patterns. We evaluate the robustness of TUBA through experimental evaluation including two series of simulated bots. The keystroke sequences forged by the two bots are modeled using first-order Markov chains. Support vector machine is used for classification. Our results, based on 20 users’ keystroke data, are reported. Our work shows that keystroke dynamics is robust against the two specific types of synthetic forgery attacks studied, where the attacker draws statistical samples from a pool of available keystroke datasets other than the target. We also describe TUBA’s use for detecting anomalous activities on remote hosts, and present its use in a specific cognition-based anomaly detection system. The use of TUBA provides high assurance on the information collected from the hosts and enables remote security diagnosis and monitoring.
    Computers & Security. 01/2012; 31:109-121.
  • IEEE Trans. Dependable Sec. Comput. 01/2012; 9:173-183.
  • ABSTRACT: Recent years have witnessed the trend of leveraging cloud-based services for large scale content storage, processing, and distribution. Security and privacy are among top concerns for the public cloud environments. Towards end-to-end content security, we propose and implement CloudSeal, a scheme for securely sharing and distributing content via the public cloud. CloudSeal ensures the confidentiality of content in the public cloud environments with flexible access control policies for subscribers and efficient content distribution via content delivery network. CloudSeal seamlessly integrates symmetric encryption, proxy-based re-encryption, k-out-of-n secret sharing, and broadcast revocation mechanisms. These algorithms allow CloudSeal to cache the major part of a stored cipher content object in the delivery network for content distribution, while keeping the minor part in the cloud storage for key management. The separation of subscription-based key management and confidentiality-oriented proxy-based re-encryption policies uniquely enables flexible and scalable deployment of the solution as well as strong security for cached content in the network. We have implemented CloudSeal on Amazon Web Services, including EC2, S3, and CloudFront. Through experimental evaluation, we demonstrate the end-to-end efficiency and scalability of CloudSeal.
    01/2012;
  • ABSTRACT: Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we first discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
    Network Protocols (ICNP), 2012 20th IEEE International Conference on; 01/2012
  • ABSTRACT: A major vector of computer infection is through exploiting vulnerable software or design flaws in networked applications such as the browser. Malicious code can be fetched and executed on a victim's machine without the user's permission, as in drive-by download (DBD) attacks. In this paper, we describe a new tool called DeWare (standing for Detection of Malware) for detecting the onset of infection delivered through vulnerable applications. DeWare enforces the dependencies between user actions and system events, such as file-system access and process execution. Our tool can be used to provide real time protection of a personal computer, as well as for diagnosing and evaluating untrusted websites for forensic purposes. Our solution demonstrates a usable host-based framework for controlling and enforcing the access of system resources. We perform extensive experimental evaluation, including a user study with 21 participants, thousands of legitimate websites (for testing false alarms), 84 malicious websites in the wild, as well as lab reproduced exploits. Our results show that DeWare is able to correctly distinguish legitimate download events from unauthorized system events with a low false positive rate (
    5th International Conference on Network and System Security, NSS 2011, Milan, Italy, September 6-8, 2011; 01/2011
  • Yipeng Wang, Zhibin Zhang, Danfeng (Daphne) Yao, Buyun Qu, Li Guo
    ABSTRACT: Application-level protocol specifications (i.e., how a protocol should behave) are helpful for network security management, including intrusion detection and intrusion prevention. The knowledge of protocol specifications is also an effective way of detecting malicious code. However, current methods for obtaining unknown protocol specifications rely heavily on manual operations, such as reverse engineering, which is a major instrument for extracting application-level specifications but is time-consuming and laborious. Several works have focused their attention on extracting protocol messages from real-world traces automatically, but leave the protocol state machine unsolved. In this paper, we propose Veritas, a system that can automatically infer protocol state machine from real-world network traces. The main feature of Veritas is that it has no prior knowledge of protocol specifications, and our technique is based on the statistical analysis on the protocol formats. We also formally define a new model, the probabilistic protocol state machine (P-PSM), which is a probabilistic generalization of protocol state machine. In our experiments, we evaluate a text-based protocol and two binary-based protocols to test the performance of Veritas. Our results show that the protocol state machines that Veritas infers can accurately represent 92% of the protocol flows on average. Our system is general and suitable for both text-based and binary-based protocols. Veritas can also be employed as an auxiliary tool for analyzing unknown behaviors in real-world applications.
    Applied Cryptography and Network Security - 9th International Conference, ACNS 2011, Nerja, Spain, June 7-10, 2011. Proceedings; 01/2011
  • Patrick Butler, Kui Xu, Danfeng (Daphne) Yao
    ABSTRACT: Attackers, in particular botnet controllers, use stealthy messaging systems to set up large-scale command and control. Understanding the capacity of such communication channels is important in detecting organized cyber crimes. We analyze the use of domain name service (DNS) as a stealthy botnet command-and-control channel, which allows multiple entities to pass messages stored in DNS records to each other. We describe and quantitatively analyze new techniques that can be used to hide malicious DNS activities both at the host and network levels. We also present and experimentally evaluate statistical content-analysis techniques as a countermeasure, which require deep packet inspection. Our techniques extend beyond the specific DNS security problem studied. We give a formal definition for the perfect stealth of a communication channel and point out the fundamental limits in achieving it, as well as the practical issues in the detection. We perform comprehensive statistical analysis that makes use of a two-month-long 4.6GB campus network dataset and 1 million domain names obtained from alexa.com.
    Applied Cryptography and Network Security - 9th International Conference, ACNS 2011, Nerja, Spain, June 7-10, 2011. Proceedings; 01/2011
  • ABSTRACT: In open systems such as cloud computing platforms, delegation transfers privileges among users across different administrative domains and facilitates information sharing. We present an independently verifiable delegation mechanism, where a delegation credential can be verified without the participation of domain administrators. Our protocol, called role-based cascaded delegation (RBCD), supports simple and efficient cross-domain delegation of authority. RBCD enables a role member to create delegations based on the dynamic needs of collaboration; in the meantime, a delegation chain can be verified by anyone without the participation of role administrators. We also describe an efficient realization of RBCD by using aggregate signatures, where the authentication information for an arbitrarily long role-based delegation chain is captured by one short signature of constant size.
    IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans 12/2010; · 2.18 Impact Factor
  • ABSTRACT: The fluid, urgent nature of crises requires flexible, responsive information sharing. Recent studies show, however, that in business catastrophes and other kinds of crises conventional access control mechanisms favor security over flexibility. Our work addresses these seemingly contradictory needs for security and flexibility and designs a trust inference model based on fuzzy logic, a model that can be used with pervasive computing technologies using sensors and mobile devices. Drawing upon research on trust, we design a trust inference model using attributes of affiliation, task performance, and urgency; apply the model to a known crisis; discuss implementation issues; and explore issues for further research. This article is dedicated to Alan Jarman, a founding influence in the Journal of Contingencies and Crisis Management who died in Canberra 15 July 2010. Alan's quantitative, engineering background and his long standing commitment to improving crisis decision making prompted him to encourage our applying fuzzy logic to crisis information sharing. We are grateful for Alan's encouragement and advice.
    Journal of Contingencies and Crisis Management 11/2010; 18(4):231 - 241.
  • ABSTRACT: Conventional network security solutions are performed on network-layer packets using statistical measures. These types of traffic analysis may not catch stealthy attacks carried out by today’s malware. We aim to develop a host-based security tool that identifies suspicious outbound network connections through analyzing the user’s surfing activities. Specifically, our solution for Web applications predicts a user’s network connections by analyzing Web content; unpredicted traffic is further investigated with the user’s help. We describe our method and implementation as well as the experimental results in evaluating its efficiency and effectiveness. We describe how our studies can be applied to detecting bot infection. In order to assess the workload of our host-based traffic-analysis tool, we also perform a large-scale characterization study on 500 university users’ wireless network traces over a 4-month period. We study both the statistical and temporal patterns of individuals’ web usage behaviors from collected wireless network traces. Users are classified into different profiles based on their web usage patterns. Our results show that users have regularities in their Web activities and the expected workload of our traffic-analysis solution is low.
    01/2010: pages 293-307;
  • ABSTRACT: Recommender systems are used to predict user preferences for products or services. In order to seek better prediction techniques, data owners of recommender systems such as Netflix sometimes make their customers' reviews available to the public, which raises serious privacy concerns. With only a small amount of knowledge about individuals and their ratings of some items in a recommender system, an adversary may easily identify the users and breach their privacy. Unfortunately, most of the existing privacy models (e.g., k-anonymity) cannot be directly applied to recommender systems. In this paper, we study the problem of privacy-preserving publishing of recommendation datasets. We represent recommendation data as a bipartite graph, and identify several attacks that can re-identify users and determine their item ratings. To deal with these attacks, we first give formal privacy definitions for recommendation data, and then develop a robust and efficient anonymization algorithm, Predictive Anonymization, to achieve our privacy goals. Our experimental results show that Predictive Anonymization can prevent the attacks with very little impact on prediction accuracy.
    Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2010, Beijing, China, April 13-16, 2010; 01/2010
  • Deian Stefan, Danfeng Yao
    ABSTRACT: We describe the use of keystroke-dynamics patterns for authentication and detecting infected hosts, and evaluate its robustness against forgery attacks. Specifically, we present a remote authentication framework called TUBA for monitoring a user's typing patterns. We evaluate the robustness of TUBA through comprehensive experimental evaluation including two series of simulated bots. Support vector machine is used for classification. Our results based on 20 users' keystroke data are reported. Our work shows that keystroke dynamics is robust against the synthetic forgery attacks studied, where the attacker draws statistical samples from a pool of available keystroke datasets other than the target. TUBA is particularly suitable for detecting extrusion in organizations and protecting the integrity of hosts in collaborative environments, as well as authentication.
    The 6th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2010, Chicago, IL, USA, 9-12 October 2010; 01/2010
  • Kui Xu, Qiang Ma, Danfeng (Daphne) Yao
    ABSTRACT: Software flaws in applications such as a browser may be exploited by attackers to launch drive-by-download (DBD) attacks, which have become the major vector of malware infection. We describe a host-based detection approach against DBDs by correlating the behaviors of human users related to file systems. Our approach involves capturing keyboard and mouse inputs of a user, and correlating these input events to file-downloading events. We describe a real-time monitoring system called DeWare that is capable of accurately detecting the onset of malware infection by identifying the illegal download-and-execute patterns.
    Recent Advances in Intrusion Detection, 13th International Symposium, RAID 2010, Ottawa, Ontario, Canada, September 15-17, 2010. Proceedings; 01/2010
  • ABSTRACT: Sharing personal information and documents is pervasive in Web 2.0 environments, which creates the need for properly controlling shared data. Most existing authorization and policy management systems are for organizational use by IT professionals. Average Web users, however, do not have the sophistication to specify and maintain privacy policies for their shared content. In this paper, we aim to utilize personal and social annotations to develop automatic tools for managing content sharing, and demonstrate a new application of social annotations in access control. We use annotation data to predict privacy preferences of users and automatically derive policies for shared content. We carry out a series of user studies to evaluate the accuracy of our prediction techniques. We also perform extensive analysis on static and dynamic approaches of analyzing semantic similarities of tags, which is of independent interest. Our analysis gives encouraging results on the feasibility of using annotations for privacy management in Web 2.0.
    Collaborative Computing: Networking, Applications and Worksharing, 2009. CollaborateCom 2009. 5th International Conference on; 12/2009
  • ABSTRACT: This paper concerns the problem of identity management in modern Web-2.0-based mashup applications. Identity management supports convenient access to information when mashups are used in sensitive environments, such as banking, investment and online shopping, by providing services such as single sign-on. We present Web2ID, a new identity management protocol tailored for mashup applications. Web2ID leverages a secure mashup framework and enables transfer of credentials between a service provider and a consumer. We also describe a new relay framework in which communication between two service providers is mediated by a relay agent within the mashup. We show that Web2ID is privacy-preserving and prevents service providers from learning a user's surfing habits. We present an implementation of Web2ID and the relay framework using a JavaScript-based library that executes within the browser. Our implementation does not require client-side changes and is therefore fully compatible even with legacy browsers. We also highlight the key challenges faced in creating a portable, in-browser library to support identity management in mashups.
    Proceedings of the 5th Workshop on Digital Identity Management, Chicago, Illinois, USA, November 13, 2009; 01/2009
  • ABSTRACT: A recent study found that the widely-used secret questions for Web authentication can easily be guessed. The study focused on making secret questions easier to remember for the user and harder to break by others. Our approach is authentication through the use of an individual's personal and dynamic Internet activities. We hypothesize that frequently-changing secret questions will be hard for attackers to guess. We propose three major categories of questions that are based on user activities: network activities (e.g., browsing history, emails); physical events (e.g., planned meetings, calendar items); conceptual opinions (e.g., opinions as derived from browsing, emails). Our preliminary results are encouraging and show that this new direction is promising. To improve the usability, in particular nonintrusiveness, of such a dynamic secret-question system, we also describe a concrete client-server architecture and security model for automating our authentication systems through utilizing existing artificial intelligence techniques.
    01/2009;

Publication Stats

318 Citations
6.10 Total Impact Points

Institutions

  • 2010
    • Virginia Polytechnic Institute and State University
      • Department of Computer Science
      Blacksburg, Virginia, United States
  • 2009–2010
    • Rutgers, The State University of New Jersey
      • Department of Computer Science
      New Brunswick, New Jersey, United States
  • 2007
    • FX Palo Alto Laboratory
      Palo Alto, California, United States
  • 2004–2007
    • Brown University
      • Department of Computer Science
      Providence, RI, United States