Salvatore J. Stolfo
Columbia University | CU · Department of Computer Science
PhD Computer Science
About
325 Publications
137,797 Reads
25,212 Citations
Introduction
See www.cs.columbia.edu/~sal and www.cs.columbia.edu/ids and Wikipedia entry for Salvatore J Stolfo.
Additional affiliations
July 1979 - December 2012
July 1974 - June 1979
Courant Institute, NYU
Position
- NYU Courant Institute
Description
- PhD Student
Education
July 1974 - August 1979
Publications (325)
This paper proposes the CorpRank algorithm to extract social hierarchies from electronic communication data. The algorithm computes a ranking score for each user as a weighted combination of the number of emails, the number of responses, average response time, clique scores, and several degree and centrality measures. The algorithm uses principal c...
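The weighted-combination idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the CorpRank algorithm itself: the function names (`min_max`, `rank_scores`), the feature names, the equal weights, the min-max normalization, and the choice to treat a shorter response time as better are all hypothetical and do not come from the paper.

```python
from typing import Dict, FrozenSet, List


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_scores(
    features: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    lower_is_better: FrozenSet[str] = frozenset(),
) -> Dict[str, float]:
    """Return a weighted sum of normalized features for each user.

    Assumption: weights are hand-picked here, and features listed in
    `lower_is_better` are inverted after normalization.
    """
    users = sorted(features)
    total = sum(weights.values())
    scores = {u: 0.0 for u in users}
    for name, w in weights.items():
        column = min_max([features[u][name] for u in users])
        for u, x in zip(users, column):
            if name in lower_is_better:
                x = 1.0 - x
            scores[u] += w * x / total
    return scores


# Hypothetical toy data: three users, three of the features named in the abstract.
features = {
    "alice": {"emails": 120, "responses": 80, "avg_response_hours": 2.0},
    "bob": {"emails": 40, "responses": 10, "avg_response_hours": 12.0},
    "carol": {"emails": 80, "responses": 45, "avg_response_hours": 7.0},
}
weights = {"emails": 1.0, "responses": 1.0, "avg_response_hours": 1.0}

scores = rank_scores(features, weights, frozenset({"avg_response_hours"}))
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Sorting users by score gives a flat ranking; grouping the ranked users into hierarchy levels would be a separate step that this sketch does not attempt.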
Data theft is a growing threat to consumers and organizations which existing security safeguards do not sufficiently address. In particular, existing authentication mechanisms are frequently bypassed or circumvented altogether in situations where attacks are launched by malicious insiders who already possess valid credentials. We propose methods to e...
This article presents a security review of energy management systems and makes the first case for security-aware energy management. The authors demonstrate how contemporary energy management systems can be misused to induce faults that break security on modern Android phones.
Network traffic, including the headers, payloads, and traffic patterns, is an extremely rich source for anomaly detection. Network anomaly detection complements the system-level traces described in previous chapters. Vigna’s overview of network intrusion detection research in 2010 pointed out the rise of anomaly-based detection on network traffic u...
In this chapter, we discuss the unique security challenges in cyber-physical systems (CPS) and highlight the event awareness enhancement for CPS anomaly detection [53]. We take the finite-state automaton (FSA) model as an instance, and present the event-aware FSA model, named eFSA, to detect stealthy anomalous CPS program behaviors particularly caus...
In the previous chapters, we have described the technical challenges and state-of-the-art solutions for anomaly detection. We have covered various key aspects, including the evolution of anomaly detection, its fundamental limitations, new data-oriented threats, the challenge of diverse program behaviors, the role of program analysis in data science...
Defining the threat model is a key step in anomaly detection. It conveys the desired security guarantees of an anomaly detection solution. A threat model describes the adversary’s capabilities, e.g., attempting to bypass the password authentication and access a sensitive file. Threat model also clarifies the necessary security assumptions and attac...
Leveraging the insights obtained from program analysis can greatly improve the quality of modeling. In this chapter, we illustrate this point by describing a couple of examples that use static program analysis in machine learning algorithms. The related description on using static dependency analysis for securing control programs in cyber-physical...
A key to deploying anomaly detection as a service is automation. Automation is necessary for virtually every step of the anomaly detection, including selecting detection algorithms, tracing and data collection, training, classification, and model adjustment. In comparison to anti-virus scanning, automating anomaly detection is much more challenging...
Throughout the book, unless specified otherwise, we will focus on two detection scenarios: (i) one-class labeled semisupervised learning, where the training data is assumed to be free of adversarial contaminations (e.g., [249]); or (ii) unsupervised learning, where the training data may contain detectable noise, e.g., training samples may include o...
Anomaly detection is attractive in that the technology is not limited to existing knowledge of intrusions. However, practicality issues of anomaly detection algorithms and systems hinder the adoption in industry. Anomaly detection components in commercial security products are relatively new, which is in clear contrast with the much more mature sig...
Modern applications and Operating Systems vary greatly with respect to how they register and identify different types of content. These discrepancies lead to exploits and inconsistencies in user experience. In this paper, we highlight the issues arising in the modern content handling ecosystem, and examine how the operating system can be used to ac...
The need for power- and energy-efficient computing has resulted in aggressive cooperative hardware-software energy management mechanisms on modern commodity devices. Most systems today, for example, allow software to control the frequency and voltage of the underlying hardware at a very fine granularity to extend battery life. Despite their benefits...
Vulnerabilities that disclose executable memory pages enable a new class of powerful code reuse attacks that build the attack payload at runtime. In this work, we present Heisenbyte, a system to protect against memory disclosure attacks. Central to Heisenbyte is the concept of destructive code reads -- code is garbled right after it is read. Garbli...
Machine to Machine (M2M) systems are actively spreading, with mobile networks rapidly evolving to provide connectivity beyond smartphones and tablets. With billions of embedded devices expected to join cellular networks over the next few years, novel applications are emerging and contributing to the Internet of Things (IoT) paradigm. The new genera...
Organizations face a persistent challenge detecting malicious insiders as well as outside attackers who compromise legitimate credentials and then masquerade as insiders. No matter how good an organization's perimeter defenses are, eventually they will be compromised or betrayed from the inside. Monitored decoy documents (honey files with enticing...
Methods, systems, and media for providing trap-based defenses are provided. In accordance with some embodiments, a method for providing trap-based defenses is provided, the method comprising: generating decoy information based at least in part on actual information in a computing environment, wherein the decoy information is generated to comply wit...
Systems, methods, and media for outputting data based on anomaly detection are provided. In some embodiments, a method for outputting data based on anomaly detection is provided, the method comprising: receiving, using a hardware processor, an input dataset; identifying grams in the input dataset that substantially include distinct byte values; cre...
A system and methods of detecting an occurrence of a violation of an email security policy of a computer system. A model relating to the transmission of prior emails through the computer system is defined which is derived from statistics relating to the prior emails. For selected emails to be analyzed, statistics concerning the selected email are g...
Enterprises are increasingly moving their IT infrastructures to the Cloud, driven by the promise of low-cost access to ready-to-use, elastic resources. Given the heterogeneous and dynamic nature of enterprise IT environments, a rapid and accurate discovery of complex infrastructure dependencies at the application, middleware, and network level is k...
A system and methods for detecting intrusions in the operation of a computer system comprises a sensor configured to gather information regarding the operation of the computer system, to format the information in a data record having a predetermined format, and to transmit the data in the predetermined data format. A data warehouse is configured to...
Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic...
Defense in depth is vital as no single security product detects all of today’s attacks. To design defense in depth organizations rely on best practices and isolated product reviews with no way to determine the marginal benefit of additional security products. We propose empirically testing security products’ detection rates by linking multiple piec...
Measuring security controls across multiple layers of defense requires realistic data sets and repeatable experiments. However, data sets that are collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or at best been one time co...
Methods, media, and systems for detecting attack are provided. In some embodiments, the methods include: comparing at least part of a document to a static detection model; determining whether attacking code is included in the document based on the comparison of the document to the static detection model; executing at least part of the document; det...
Methods, systems, and media for masquerade attack detection by monitoring computer user behavior are provided. In accordance with some embodiments, a method for detecting masquerade attacks is provided, the method comprising: monitoring a first plurality of user actions and access of decoy information in a computing environment; generating a user i...
Recent works have shown promise in using microarchitectural execution patterns to detect malware programs. These detectors belong to a class of detectors known as signature-based detectors as they catch malware by comparing a program's execution pattern (signature) to execution patterns of known malware programs. In this work, we propose a new clas...
Systems and methods provide an alert correlator and an alert distributor that enable early signs of an attack to be detected and rapidly disseminated to collaborating systems. The alert correlator utilizes data structures to correlate alert detections and provide a mechanism through which threat information can be revealed to other collaborating sy...
A method for unsupervised anomaly detection, a class of algorithms designed to process unlabeled data. Data elements are mapped to a feature space, which is typically a vector space. Anomalies are detected by determining which points lie in sparse regions of the feature space. Two feature maps are used for mapping data elements to a featur...
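One simple way to find points in sparse regions of a feature space is to score each point by its mean distance to its k nearest neighbors. The sketch below takes that approach as an assumption; it is not necessarily the method or the feature maps the abstract describes. The function name `knn_scores`, the value k = 2, and the toy points are hypothetical.

```python
import math
from typing import List, Sequence


def knn_scores(points: List[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the mean Euclidean distance to its k nearest neighbors.

    A larger score means the point sits in a sparser region.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical unlabeled data: a tight cluster plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_scores(points, k=2)

# The point with the highest score is the strongest anomaly candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```

No labels are needed: the score depends only on the geometry of the data, which is what makes this family of detectors unsupervised. This brute-force version compares every pair of points, so it suits small datasets only.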
The proliferation of computers in any domain is followed by the proliferation of malware in that domain. Systems, including the latest mobile platforms, are laden with viruses, rootkits, spyware, adware and other classes of malware. Despite the existence of anti-virus software, malware threats persist and are growing as there exist a myriad of ways...
A content extraction process may parse markup language text into a hierarchical data model and then apply one or more filters. Output filters may be used to make the process more versatile. The operation of the content extraction process and the one or more filters may be controlled by one or more settings set by a user, or automatically by a class...
We propose a machine learning-based method for biometric identification of user behavior, for the purpose of masquerade and insider threat detection. We designed a sensor that captures system-level events such as process creation, registry key changes, and file system actions. These measurements are used to represent a user's unique behavior profil...
Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and generating anomaly detection models are provided. In some embodiments, methods for generating sanitized data are provided. The methods including: dividing a first training dataset comprised of a plurality of training data items into a plurality of da...
The ability to update firmware is a feature that is found in nearly all modern embedded systems. We demonstrate how this feature can be exploited to allow attackers to inject malicious firmware modifications into vulnerable embedded devices. We discuss techniques for exploiting such vulnerable functionality and the implementation of a proof of con...
This book constitutes the proceedings of the 16th International Symposium on Research in Attacks, Intrusions and Defenses, former Recent Advances in Intrusion Detection, RAID 2013, held in Rodney Bay, St. Lucia in October 2013. The volume contains 22 full papers that were carefully reviewed and selected from 95 submissions, as well as 10 poster pap...
Real-world applications commonly require untrusting parties to share sensitive information securely. This article describes a secure anonymous database search (SADS) system that provides exact keyword match capability. Using a new reroutable encryption and the ideas of Bloom filters and deterministic encryption, SADS lets multiple parties efficient...
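The combination of Bloom filters and deterministic keyword transformation mentioned above can be illustrated with a small sketch. It is not the SADS protocol: a keyed HMAC-SHA256 stands in for deterministic encryption, reroutable encryption is not modeled at all, and the class name `KeywordBloomFilter`, the key, and the parameters are hypothetical.

```python
import hashlib
import hmac
from typing import Iterator


class KeywordBloomFilter:
    """Bloom filter over keyed hashes of keywords (illustrative only)."""

    def __init__(self, key: bytes, num_bits: int = 1024, num_hashes: int = 4):
        self.key = key
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, keyword: str) -> Iterator[int]:
        # Assumption: HMAC with a shared key replaces deterministic encryption.
        # The same keyword always maps to the same bit positions.
        for i in range(self.num_hashes):
            message = f"{i}:{keyword}".encode("utf-8")
            digest = hmac.new(self.key, message, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, keyword: str) -> None:
        for pos in self._positions(keyword):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, keyword: str) -> bool:
        # False means definitely absent; True means present or a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(keyword)
        )


# Hypothetical usage: index a document's keywords, then test exact matches.
index = KeywordBloomFilter(key=b"example-shared-key")
for word in ["alpha", "beta"]:
    index.add(word)

print(index.might_contain("alpha"), index.might_contain("beta"))
```

Because the filter stores only bit positions derived from keyed hashes, a party holding the filter without the key cannot test arbitrary keywords directly. The price is a small false-positive rate that depends on `num_bits` and `num_hashes`.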
A masquerade attack is a consequence of identity theft. In such attacks, the impostor impersonates a legitimate insider while performing illegitimate activities. These attacks are very hard to detect and can cause considerable damage to an organization. Prior work has focused on user command modeling to identify abnormal behavior indicative...
Profiling means making predictions about likely user behavior based on collected characteristics and activities. Shari Lawrence Pfleeger and Marc Rogers brought together a group of researchers from a variety of disciplines to discuss whether profiling and prediction actually make us secure.
MEERKATS is a novel architecture for cloud environments that elevates continuous system evolution and change as first-rate design principles. Our goal is to enable an environment for cloud services that constantly changes along several dimensions, toward creating an unpredictable target for an adversary. This unpredictability will both impede the a...
Cloud computing promises to significantly change the way we use computers and access and store our personal and business information. With these new computing and communications paradigms arise new data security challenges. Existing data protection mechanisms such as encryption have failed in preventing data theft attacks, especially those perpet...
Detecting insider attacks continues to prove to be one of the most difficult challenges in securing sensitive data. Decoy information and documents represent a promising approach to detecting malicious masqueraders; however, false positives can interfere with legitimate work and take up user time. We propose generating foreign language decoy docume...
Just as errors in sequential programs can lead to security exploits, errors in concurrent programs can lead to concurrency attacks. Questions such as whether these attacks are real and what characteristics they have remain largely unknown. In this paper, we present a preliminary study of concurrency attacks and the security implications of real...
Decoy technology and the use of deception are useful in securing critical computing systems by confounding and confusing adversaries with fake information. Deception leverages uncertainty forcing adversaries to expend considerable effort to differentiate realistic useful information from purposely planted false information. In this paper, we pro...
Cloud computing promises to significantly change the way we use computers and access and store our personal and business information. With these new computing and communications paradigms arise new data security challenges. Existing data protection mechanisms such as encryption have failed in preventing data theft attacks, especially those perpetra...
We propose a novel trap-based architecture for detecting passive, “silent”, attackers who are eavesdropping on enterprise networks. Motivated by the increasing number of incidents where attackers sniff the local network for interesting information, such as credit card numbers, account credentials, and passwords, we introduce a methodology for build...
Masquerade attacks are characterized by an adversary stealing a legitimate user's credentials and using them to impersonate the victim and perform malicious activities, such as stealing information. Prior work on masquerade attack detection has focused on profiling legitimate user behavior and detecting abnormal behavior indicative of a masquerade...
Attackers continually innovate and craft attacks that penetrate existing defenses. New security product purchasing decisions are key in order to keep organizations as secure as possible. Current information available to inform these decisions is often limited to individual security product detection/blocking rates for some test set of attacks. Act...
Our global communication infrastructures are powered by large numbers of legacy embedded devices. Recent advances in offensive technologies targeting embedded systems have shown that the stealthy exploitation of high-value embedded devices such as router and firewalls is indeed feasible. However, little to no host-based defensive technology is avai...
This paper investigates new methods to measure, quantify and evaluate the security posture of human organizations especially within large corporations and government agencies. Computer security is not just about technology and systems. It is also about the people that use those systems and how their vulnerable behaviors can lead to exploitation. W...
Modern network security research has demonstrated a clear necessity for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenges of removing sensitive content from the data beforehand. Network Data Anonymization is an emerging field dedicated to solving this problem, with a main focus on remo...
A large number of embedded devices on the internet, such as routers and VOIP phones, are typically ripe for exploitation. Little to no defensive technology, such as AV scanners or IDS’s, is available to protect these devices. We propose a host-based defense mechanism, which we call Symbiotic Embedded Machines (SEM), that is specifically designed to...
Masquerade attacks are a common security problem that is a consequence of identity theft. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted and unique fashion in order to...
Web applications have emerged as the primary means of access to vital and sensitive services such as online payment systems and databases storing personally identifiable information. Unfortunately, the need for ubiquitous and often anonymous access exposes web servers to adversaries. Indeed, network-borne zero-day attacks pose a critical and widesp...
If we wish to break the continual cycle of patching and replacing our core monoculture systems to defend against attacker evasion tactics, we must redesign the way systems are deployed so that the attacker can no longer glean the information about one system that allows attacking any other like system. Hence, a new poly-culture architecture that pr...
Purpose: IOS firmware diversity, the unintended consequence of a complex firmware compilation process, has historically made reliable exploitation of Cisco routers difficult. With approximately 300,000 unique IOS images in existence, a new class of version‐agnostic shellcode is needed in order to make the large‐scale exploitation of Cisco IOS possib...
Real-world data collection poses an important challenge in the security field. Insider and masquerader attack data collection poses an even greater challenge. Very few organizations acknowledge such breaches because of liability concerns and potential implications on their market value. This caused the scarcity of real-world data sets that could be...
Masquerade attacks pose a grave security problem that is a consequence of identity theft. Detecting masqueraders is very hard. Prior work has focused on profiling legitimate user behavior and detecting deviations from that normal behavior that could potentially signal an ongoing masquerade attack. Such approaches suffer from high false positive rat...
The field of computer and communications security begs for a foundational science to guide system design and to reveal the safety, security, and possible fragility of the complex systems we depend on today. To achieve this goal, we must devise suitable metrics for objectively comparing and evaluating the security of system designs and organizations...
We present MINESTRONE, a novel architecture that integrates static analysis, dynamic confinement, and code diversification techniques to enable the identification, mitigation and containment of a large class of software vulnerabilities in third-party software. Our initial focus is on software written in C and C++; however, many of our techniques ar...
This paper describes the SPARCHS project at Columbia and Princeton Universities. Drawing inspiration from biological defenses, this project aims to enhance security with clean-slate design of hardware. The ideas to be explored in the project and current status are described.
We're a long way from establishing a science of security comparable to the traditional physical sciences, and even from knowing whether such a goal is even achievable. Nevertheless, the articles in this special issue hint at the possibility and promise of foundational approaches to security.
We present important lessons learned from the engineering and operation of a large-scale embedded device vulnerability scanner infrastructure. Developed and refined over the period of one year, our vulnerability scanner monitored large portions of the Internet and was able to identify over 1.1 million publicly accessible trivially vulnerable embedd...
We introduce BotSwindler, a bait injection system designed to delude and detect crimeware by forcing it to reveal during the exploitation of monitored information. The implementation of BotSwindler relies upon an out-of-host software agent that drives user-like interactions in a virtual machine, seeking to convince malware residing within the g...
We present our initial experimental findings from the collaborative deployment of network Anomaly Detection (AD) sensors. Our system examines the ingress http traffic and correlates AD alerts from two administratively disjoint domains: Columbia University and George Mason University. We show that, by exchanging packet content alerts between the two...
Semi-supervised clustering methods guide the data partitioning and grouping process by exploiting background knowledge, among other things in the form of constraints. In this study, we propose a semi-supervised density-based clustering method. Density-based ...
Current trends demonstrate an increasing use of polymorphism by attackers to disguise their exploits. The ability for malicious code to be easily, and automatically, transformed into semantically equivalent variants frustrates attempts to construct simple, easily verifiable representations for use in security sensors. In this paper, we present a qu...
In this chapter, we propose a design for an insider threat detection system that combines an array of complementary techniques that aims to detect evasive adversaries. We are motivated by real world incidents and our experience with building isolated detectors: such standalone mechanisms are often easily identified and avoided by malefactors. Our w...
Privacy-preserving sharing of sensitive information (PPSSI) is motivated by the increasing need for entities (organizations or individuals) that don't fully trust each other to share sensitive information. Many types of entities need to collect, analyze, and disseminate data rapidly and accurately, without exposing sensitive information to unauthor...
A masquerade attack is a consequence of identity theft. In such attacks, the impostor impersonates a legitimate insider while performing illegitimate activities. These attacks are very hard to detect and can cause considerable damage to an organization. Prior work has focused on user command modeling to identify abnormal behavior indicative of impe...