Conference Paper

CANARY FILES: GENERATING FAKE FILES TO DETECT CRITICAL DATA LOSS FROM COMPLEX COMPUTER NETWORKS


Abstract

This paper introduces two concepts: Canary Files and a Canary File management system. A Canary File is a fake computer document that is placed amongst real documents in order to aid in the early detection of unauthorised data access, copying or modification. The name originates from canaries, which were used within coalmines as an early warning to miners. This paper also introduces the Serinus System, a Canary File management system designed to address some of the key challenges associated with operating a cyber deception capability. The Serinus System automates Canary File generation using content and file statistics drawn from three sources: (1) Internet-harvested documents, (2) documents collected from across the entire enterprise environment, and (3) documents within the specific target directory. Each data source is allocated a weighting based on the strength of its relationship to the target directory. The weighting is seeded with a random value to avoid discovery by simple statistics-based fake file detection systems. Research is continuing to assess the performance of both Canary Files and the Serinus System.
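The source weighting described in the abstract can be illustrated with a short sketch. This is not the Serinus implementation: the base weights, the jitter range, and the names `BASE_WEIGHTS`, `jittered_weights` and `pick_source` are assumptions made for illustration only.

```python
import random

# Hypothetical base weights: sources more closely related to the
# target directory receive a larger weight (values are assumptions).
BASE_WEIGHTS = {
    "internet": 1.0,
    "enterprise": 2.0,
    "target_directory": 4.0,
}


def jittered_weights(base, rng, jitter=0.25):
    """Scale each base weight by a random factor, then normalise to sum to 1."""
    raw = {name: w * rng.uniform(1 - jitter, 1 + jitter) for name, w in base.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def pick_source(weights, rng):
    """Choose which source supplies the statistics for the next generated file."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(2024)
weights = jittered_weights(BASE_WEIGHTS, rng)
source = pick_source(weights, rng)
```

With these assumed values the jitter is small enough that the ranking of the three sources never changes, while the exact proportions differ from run to run, which is the property the abstract relies on to frustrate simple statistical detectors.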


... Honeyfiles [35], also referred to as honeytokens [26], digital decoys [15], decoy files [9], and canary files [31], are a cyber deception approach that has the potential to assist in the detection of data exfiltration and unauthorised access. Honeyfiles perform this role by emulating 'real' documents in order to lure and bait data thieves. ...
... Unlike honeypots, honeyfiles do not need dedicated hardware, nor expose additional software vulnerabilities to exploitation [31]. Honeyfiles can also be placed directly within document repositories, amongst the files that require protection, rather than isolated on different network hosts, collision domains, and/or network segments [28]. ...
... if the signature matches then
        call the hidden_interface function
        return
    if flag r is set to this file then
        send a report
    call the original read system call
procedure WRITE
    ...
Article
It has been demonstrated that deception technologies are effective in detecting advanced persistent threats and zero-day attacks, which cannot be detected by traditional signature-based intrusion detection techniques. File-based deception technology is especially promising because it is very difficult (if not impossible) to commit an attack without reading or modifying any file. It can act as an additional security barrier because malicious file access can be detected even if an adversary succeeds in gaining access to a host. However, PhantomFS still has a problem that is common to deception technologies: once a deception technology is known to adversaries, it is unlikely to succeed in luring them. In this paper, we classify adversaries who are aware of PhantomFS according to their level of knowledge of PhantomFS and their permissions on it. Then we analyze the attack surface and develop a defense strategy to limit the attack vectors. We extend PhantomFS to realize the strategy. Specifically, we introduce multiple hidden interfaces and detection of file execution. We evaluate the security and performance overhead of the proposed technique. We demonstrate by penetration testing that the extended PhantomFS is secure against intelligent adversaries. The extended PhantomFS offers higher detection accuracy with a lower false alarm rate than existing techniques. It is also demonstrated that the overhead is negligible in terms of response time and CPU time.
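The read-path logic quoted in the snippet before this abstract can be sketched in a few lines. This is a user-space illustration only, not the PhantomFS kernel code: the function `make_read_hook`, the string used as a "signature", and the in-memory `files` dictionary are all hypothetical stand-ins.

```python
def make_read_hook(trusted_signature, flagged_paths, original_read, hidden_read, report):
    """Return a read function that mirrors the three quoted steps."""

    def read(path, signature=None):
        # Step 1: a caller presenting the expected signature is routed
        # to the hidden interface and is not monitored.
        if signature == trusted_signature:
            return hidden_read(path)
        # Step 2: any other access to a flagged (decoy) file is reported.
        if path in flagged_paths:
            report(path)
        # Step 3: the original read proceeds so the caller sees nothing unusual.
        return original_read(path)

    return read


alerts = []
files = {"/srv/real.txt": b"real", "/srv/decoy.txt": b"bait"}
read = make_read_hook(
    trusted_signature="example-signature",  # hypothetical shared secret
    flagged_paths={"/srv/decoy.txt"},
    original_read=files.__getitem__,
    hidden_read=files.__getitem__,
    report=alerts.append,
)
```

Calling `read("/srv/decoy.txt")` without the signature appends the path to `alerts` and still returns the file contents, so the alert stays invisible to the caller.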
... Such techniques should be transparent to users and concealed from adversaries, concurrently limiting attacker gains and increasing the costs of their actions. Promising avenues of research include shielding cyberspace sensors from attackers [27,70] and exploring inherent asymmetric advantages of using deception in information warfare [45,71,83,89,90]. ...
Conference Paper
Vulnerability patch management remains one of the most complex issues facing modern enterprises; companies struggle to test and deploy new patches across their networks, often leaving myriad attack vectors vulnerable to exploits. This problem is exacerbated by enterprise server applications, which expose tremendous amounts of information about their security postures, greatly expediting attackers' reconnaissance incursions (e.g., knowledge gathering attacks). Unfortunately, current patching processes offer no insights into attacker activities, and prompt attack remediation is hindered by patch compatibility considerations and deployment cycles. To reverse this asymmetry, a patch management model is proposed to facilitate the rapid injection of software patches into live, commodity applications without disruption of production workflows, and the transparent sandboxing of suspicious processes for counterreconnaissance and threat information gathering. Our techniques improve workload visibility and vulnerability management, and overcome perennial shortcomings of traditional patching methodologies, such as proneness to attacker fingerprinting, and the high cost of deployment. The approach enables a large variety of novel defense scenarios, including rapid security patch testing with prompt recovery from defective patches and the placement of exploit sensors inlined into production workloads. An implementation for six enterprise-grade server programs demonstrates that our approach is practical and incurs minimal runtime overheads. Moreover, four use cases are discussed, including a practical deployment on two public cloud environments.
... When an access to a honeyfile takes place, the honey-file system raises an alert and also adds an entry to a log file to record the access. Honeyfiles were originally intended to be stored on a honeypot, but were later adapted for deployment on computers in production [32]. Another form of decoy data are decoy database records [33]. ...
Article
We present an anti-malware solution that is able to reliably detect Object Linking and Embedding for Process Control (OPC) malware on machines in production. Detection is attained on the very first encounter with OPC malware and hence without any prior knowledge of their code and data. We architected the integration of a decoy network interface controller (DNIC) with a layer of kernel code that emulates a target OPC machine. A DNIC displays a (nonexistent) network, which the compromised machine appears to be connected to. OPC emulation displays a valid (but nonexistent) target OPC machine, which appears to be reachable from the compromised machine over the (nonexistent) network. Our code intercepts OPC malware during their search for target machines over the network. Its overall architecture is crafted to validate infection by leveraging OPC protocol mechanics. The same principles of operation are used to recognize goodware that access a DNIC by accident. Safe co-existence with production functions and real I/O devices is ensured by a monitor filter driver, which removes all decoy data bound for the monitor. We tested our DNIC architectural developments against numerous OPC malware samples involved in the Dragonfly cyber espionage campaign, and discuss the findings in the paper.
... Software can then analyze traffic to and from the files with the assumption that any activity is malicious in nature, as those files are not in use by legitimate applications. This concept of a Canary File, or a honeyfile placed amongst real files on a system, was proposed by Whitham [19], who also discusses the automatic generation and management of Canary Files. However, the content of automatically generated files is difficult to present as authentic upon inspection, and malware that resides at the operating system level is able to examine file access patterns to ignore files that are not being used. ...
Conference Paper
Commercial anti-malware systems currently rely on signatures or patterns learned from samples of known malware, and are unable to detect zero-day malware, leaving computers unprotected. In this paper we present a novel kernel-level technique for detecting keyloggers. Our approach operates through the use of a decoy keyboard. It uses a low-level driver to emulate and expose keystrokes modeled after actual users. We developed a statistical model of the typing profiles of real users, which regulates the delivery times of emulated keystrokes. A kernel filter driver enables the decoy keyboard to shadow the physical keyboard, such that a single keyboard appears on the device tree at all times. That keyboard is the physical keyboard when the actual user types on it, and the decoy keyboard during time windows of user inactivity. Malware is detected in a second-order fashion when data leaked by the decoy keyboard are used to access resources on the compromised machine. We tested our approach against live malware samples that we obtained from public repositories, and report the findings in the paper. The decoy keyboard is able to detect 0-day malware, and can co-exist with a real keyboard on a computer in production without causing any disruptions to the user’s work.
... When an access to a honeyfile takes place, the honey-file system raises an alert and also adds an entry to a log file to record the access. Honeyfiles were originally intended to be stored on a honeypot, but were later adapted for deployment on computers in production [36]. Another form of decoy data are decoy database records [35], [6]. ...
Conference Paper
Webcams are commonly used by advanced malware to spy on computer users. Because of several effective techniques that can suppress the turning on of a webcam LED, the activation of a webcam by malware is not visually detectable. Victims are silently filmed without their knowledge for extended periods of time. Recent attack trends show that webcam video covertly recorded by malware is used beyond the boundaries of the cyber domain, and thus is combined with human factors. An example is the Delilah malware. Delilah lurks on the compromised machine of a target user, using the webcam to capture details about family, work, social connections, and any other element involved in the life of that target user. The attackers then blackmail the target user with the goal of turning that person into an insider threat to his/her employer. The attackers ask the victim to give them industrial secrets in return for not disclosing video that is highly sensitive to him/her. In this paper we discuss an approach that enables the defender to sustain prolonged interaction with attackers for defensive and forensics purposes. The approach uses a decoy webcam on machines in production. It relies on a decoy video traffic injector module, as well as on the learning of the operational dynamics of real webcams. A webcam shadowing mechanism alternates between the real webcam and the decoy webcam. That mechanism causes malware to target the decoy webcam, but still enables the user to only see and hence use the real webcam. The approach can feed decoy webcam traffic into the data stream that malware intercepts and sends to attackers. The decoy webcam is robust to probes, and is able to coexist with production functions.
... When a honeyfile is accessed, the honeyfile system raises an alert and also logs the file access. An idea, which was explored by Whitham, is to place a honeyfile amongst real files in a computer system in production rather than on a honeypot [17]. The development and monitoring of honeytoken portions of a database file, and thus the insertion of decoy records in a database table in production, was proposed by White and Thompson in [12], and by Cenys et al. in [18]. ...
Article
The paper describes an OS-resident defensive deception approach, which can neutralize malware that has managed to infect a target machine. Such attacks account for most of the spying operations detected to date, and include malware, insider code, and trojans that originate from compromises of the computer supply chain. The central idea that underpins this work is to display the existence of I/O devices on a computer system. While those I/O devices do not really exist, their projection makes them appear as valid targets of interception and malicious modification, or as valid means of propagation to other target computers. We experiment with the implementation of a low-level network driver for the Windows operating system. The network driver emulates the operation of a network interface controller (NIC), and thus reports to higher-level drivers in the network stack as if the NIC existed, were fully functional, and had access to an existing computer network. We tested and evaluated NIC displays against a large sample of live malware, and discuss our findings in the paper.
Chapter
This paper discusses a technical solution that will help to bring the cyber defenders and investigators one step closer to successful cyber attribution: deception technology. The goal is to detect abnormal activities taking place in the computer system by planting so called fake entities into the system. These fake entities appear to be interesting and valuable for the attacker. The deceptive defense mechanism then waits for the malicious adversary to interact with these fake entities. A fake entity can be anything from a fabricated file to a fake user account in a system. This paper takes a look at how different fake entities can be used for cyber attribution. We conclude that deception technology and fake entities have lots of potential for further development when trying to solve the challenge of cyber attribution.
Article
File-based deception technologies can be used as an additional security barrier when adversaries have successfully gained access to a host evading intrusion detection systems. Adversaries are detected if they access fake files. Though previous works have mainly focused on using user data files as decoys, this concept can be applied to system files. If so, it is expected to be effective in detecting malicious users because it is very difficult to commit an attack without accessing a single system file. However, it may suffer from excessive false alarms by legitimate system services such as file indexing and searching. Legitimate users may also access fake files by mistake. This paper addresses this issue by introducing a hidden interface. Legitimate users and applications access files through the hidden interface which does not show fake files. The hidden interface can also be utilized to hide sensitive files by hiding them from the regular interface. By experiments, we demonstrate the proposed technique incurs negligible performance overhead, and it is an effective countermeasure to various attack scenarios and practical in that it does not generate false alarms for legitimate applications and users.
Chapter
Cybercrime has become a big money business with sensitive data being a hot commodity on the dark web. In this paper, we introduce and evaluate a filesystem (DcyFS) capable of curtailing data theft and ensuring file integrity protection by providing subject-specific views of the filesystem. The deceptive filesystem transparently creates multiple levels of stacking to protect the base filesystem and monitor file accesses, hide and redact sensitive files with baits, and inject decoys onto fake system views purveyed to untrusted subjects, all while maintaining a pristine state to legitimate processes. A novel security domain model groups applications into filesystem views and eliminates the need for filesystem merging. Our prototype implementation leverages a kernel hot-patch to seamlessly integrate the new filesystem module into live and existing environments. We demonstrate the utility of our approach through extensive performance benchmarks and use cases on real malware samples, including ransomware, rootkits, binary modifiers, backdoors, and library injectors. Our results show that DcyFS adds no significant performance overhead to the filesystem, preserves the filesystem data, and offers a potent new tool to characterize the impact of malicious activities and expedite forensic investigations.
Chapter
Deception is a promising method for strengthening software security. It differs from many traditional security approaches as it does not directly prevent the attacker’s actions but instead aims to learn about the attacker’s behavior. In this paper, we discuss the idea of deceiving attackers with fake services and fabricated content in order to find out more about malware’s functionality and to hamper cyber intelligence. The effects of false data on the malware’s behavior can be studied while at the same time complicating cyber intelligence by feeding fallacious content to the adversary. We also discuss the properties required from a tool generating fabricated entities. We then introduce a design for a honeypot proxy that generates fallacious content for fake services in order to deceive attackers, and test our implementation’s accuracy and performance. We conclude that although challenging in many ways, deceiving adversaries with fake services is a promising and feasible approach in order to protect computer systems and analyze malware.
Chapter
Text and images are classic ways to do deception because they seem more permanent and believable than direct human interaction. We will call deception in constructed text and media “fakes”. Fake documents and images have played important roles in history. “Operation Mincemeat” of World War II planted fake documents, including things like letters from home and theater tickets as well as some official letters, all suggesting that the Allies were planning to invade Greece and Sardinia rather than Sicily, and the deceptions were effective (Latimer 2003). Fake documents are also important for counterintelligence; if one makes the false information distinctive, it is easy to tell if an adversary has used it.
Conference Paper
Digital data theft is difficult to detect and typically it also takes a long time to discover that data has been stolen. This paper introduces a data-driven approach based on Markov chains to create believable decoy project folders which can assist in detecting potentially ongoing attacks. This can be done by deploying these intrinsically valueless folders between real project folders and by monitoring interactions with them. We present our approach and results from a user study demonstrating the believability of the generated decoy folders.
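A Markov chain generator of the kind this abstract mentions can be sketched briefly. The sketch assumes a first-order chain over underscore-separated tokens of folder names; the abstract does not say what the chain's states are, so the tokenisation, the sample names, and the functions `train` and `generate` are illustrative assumptions rather than the authors' method.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def train(names):
    """Record, for each token, the tokens observed immediately after it."""
    transitions = defaultdict(list)
    for name in names:
        tokens = [START] + name.split("_") + [END]
        for current, following in zip(tokens, tokens[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, rng, max_tokens=6):
    """Walk the chain from START until END (or a length cap) is reached."""
    token, out = START, []
    while len(out) < max_tokens:
        token = rng.choice(transitions[token])
        if token == END:
            break
        out.append(token)
    return "_".join(out)


# Hypothetical real folder names used as training data.
real = ["project_alpha_report", "project_beta_budget", "audit_alpha_budget"]
model = train(real)
decoy = generate(model, random.Random(7))
```

Because every step follows a transition seen in the real names, a generated name can recombine familiar tokens into a folder that does not exist, which is what makes the decoy plausible.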
Conference Paper
The number and complexity of cyber-attacks have been increasing steadily in recent years. Adversaries are targeting the communications and information systems (CIS) of government, military and industrial organizations, as well as critical infrastructures, and are willing to spend large amounts of money, time and expertise on reaching their goals. In addition, recent sophisticated insider attacks resulted in the exfiltration of highly classified information to the public. Traditional security solutions have failed repeatedly to mitigate such threats. In order to defend against such sophisticated adversaries we need to redesign our defences, developing technologies focused more on detection than prevention. In this paper, we address the attack potential of advanced persistent threats (APT) and malicious insiders, highlighting the common characteristics of these two groups. In addition, we propose the use of multiple deception techniques, which can be used to protect both the external and internal resources of an organization and significantly increase the possibility of early detection of sophisticated attackers.
Article
Information stored in companies' databases is often their most valuable asset, and as a result database security is of great importance. It is a very complex task, however, to ensure database security without limiting the capabilities and productivity of legitimate users. Honeytokens are one of the new methods to increase information security. The method uses fake information resources to attract unauthorised users and identify them. This paper describes the implementation of a honeytoken module for the Oracle 9iR2 database management system. Three original modules were programmed to combine Oracle Fine-Grained Auditing features, internal triggers and functions that report illegal access to fake resources.
Article
We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.
Conference Paper
The insider threat remains one of the most vexing problems in computer security. A number of approaches have been proposed to detect nefarious insider actions including user modeling and profiling techniques, policy and access enforcement techniques, and misuse detection. In this work we propose trap-based defense mechanisms and a deployment platform for addressing the problem of insiders attempting to exfiltrate and use sensitive information. The goal is to confuse and confound an adversary, requiring more effort to distinguish real information from bogus information, and to provide a means of detecting when an attempt to exploit sensitive information has occurred. “Decoy Documents” are automatically generated and stored on a file system by the D3 System with the aim of enticing a malicious user. We introduce and formalize a number of properties of decoys as a guide to design trap-based defenses to increase the likelihood of detecting an insider attack. The decoy documents contain several different types of bogus credentials that, when used, trigger an alert. We also embed “stealthy beacons” inside the documents that cause a signal to be emitted to a server indicating when and where the particular decoy was opened. We evaluate decoy documents on honeypots penetrated by attackers, demonstrating the feasibility of the method.
Conference Paper
This paper introduces an intrusion-detection device named honeyfiles. Honeyfiles are bait files intended for hackers to access. The files reside on a file server, and the server sends an alarm when a honeyfile is accessed. For example, a honeyfile named "passwords.txt" would be enticing to most hackers. The file server's end-users create honeyfiles, and the end-users receive the honeyfile's alarms. Honeyfiles can increase a network's internal security without adversely affecting normal operations. The honeyfile system was tested by deploying it on a honeynet, where hackers' use of honeyfiles was observed. The use of honeynets to test a computer security device is also discussed. This form of testing is a useful way of finding the faulty and overlooked assumptions made by the device's developers.
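The end-user workflow in this abstract, where users mark their own bait files and receive the alarms, can be illustrated with a small in-memory model. The class `HoneyfileServer`, its methods, and the sample paths are hypothetical; the original system ran on a real file server, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class HoneyfileServer:
    files: dict                                      # path -> contents
    honeyfiles: dict = field(default_factory=dict)   # path -> owner to notify
    alarms: list = field(default_factory=list)       # (owner, path, accessing user)

    def mark_honeyfile(self, path, owner):
        """An end-user registers one of their files as bait."""
        self.honeyfiles[path] = owner

    def open(self, path, user):
        """Serve the file as usual; raise an alarm for the owner if it is bait."""
        if path in self.honeyfiles:
            self.alarms.append((self.honeyfiles[path], path, user))
        return self.files[path]


server = HoneyfileServer(
    files={"/home/ann/passwords.txt": "decoy", "/home/ann/notes.txt": "real"}
)
server.mark_honeyfile("/home/ann/passwords.txt", owner="ann")
```

Opening `/home/ann/notes.txt` leaves `server.alarms` empty, while any open of `/home/ann/passwords.txt` records an alarm addressed to `ann` along with the name of the accessing user.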
Conference Paper
Identity theft continues to be an ever-present problem. Identity theft and other related crimes are becoming an unparalleled phenomenon that nearly everyone will have to deal with in some way in the coming years. As the number of people affected by identity theft and data spills has grown into the tens of millions, more needs to be done in the way of providing mechanisms to secure personally identifying data, including data consisting of social security numbers, names, addresses, and phone numbers. One such mechanism that would enhance security is the use of realistic synthetic decoy records. These decoys would be inserted into the actual data in such a way that only the person or program that inserted the decoys can tell what is real and what is synthetic. These decoys could also be created in such a way that they are probabilistically unique, making a kind of watermark for that particular dataset. This paper examines a method by which identity theft can be combated by using decoys as described above. While decoys do not hide or encrypt the actual personally identifying data, it will be shown that they can be used to uniquely pinpoint particular data sources, making it possible to isolate which source was used in the theft. It will also be shown how realistic decoys make it much more difficult to use the actual data, because of the inability to distinguish between what is real and what is fake. Finally, we will show how we have implemented a system that is capable of producing these very realistic, personally identifying, decoy records.
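One way to make decoy records act as a per-dataset watermark, as this abstract describes, is to derive them from a secret key and a dataset identifier. The sketch below uses an HMAC for that purpose; this is a swapped-in technique chosen for illustration, not the method of the paper, and the name lists, the nine-digit identifier format, and the functions `decoy_record` and `trace` are assumptions.

```python
import hashlib
import hmac

# Tiny illustrative name pools; a realistic generator would use far larger ones.
FIRST = ["Alice", "Bob", "Carol", "Dave"]
LAST = ["Ng", "Smith", "Khan", "Lopez"]


def decoy_record(secret, dataset_id, index):
    """Derive a synthetic (first, last, id_number) record from a keyed hash."""
    message = f"{dataset_id}:{index}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    first = FIRST[digest[0] % len(FIRST)]
    last = LAST[digest[1] % len(LAST)]
    id_number = int.from_bytes(digest[2:6], "big") % 10**9
    return (first, last, f"{id_number:09d}")


def trace(secret, leaked_record, dataset_ids, decoys_per_dataset=5):
    """List the datasets whose decoys include the leaked record."""
    return [
        dataset_id
        for dataset_id in dataset_ids
        for index in range(decoys_per_dataset)
        if decoy_record(secret, dataset_id, index) == leaked_record
    ]
```

Only the holder of `secret` can regenerate the decoys, so only that party can tell them apart from real records or map a leaked decoy back to the copy of the dataset it was planted in.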
Article
This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation. Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented. To date no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1). We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control, while not suffering from the bias problem or the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set-size would help salvage a few of the current perturbation-based methods.
Article
A model of a real-time intrusion-detection expert system capable of detecting break-ins, penetrations, and other forms of computer abuse is described. The model is based on the hypothesis that security violations can be detected by monitoring a system's audit records for abnormal patterns of system usage. The model includes profiles for representing the behavior of subjects with respect to objects in terms of metrics and statistical models, and rules for acquiring knowledge about this behavior from audit records and for detecting anomalous behavior. The model is independent of any particular system, application environment, system vulnerability, or type of intrusion, thereby providing a framework for a general-purpose intrusion-detection expert system. Index Terms: abnormal behavior, auditing, intrusions, monitoring, profiles, security, statistical measures.
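The idea of a statistical profile that flags abnormal usage can be illustrated with a mean and standard deviation test. This is a minimal sketch of one simple statistical model, not the expert system described in the abstract; the metric (logins per day), the threshold `d`, and the function name `is_anomalous` are assumptions.

```python
import statistics


def is_anomalous(history, observation, d=3.0):
    """Flag an observation more than d sample standard deviations from the mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(observation - mean) > d * stdev


# Hypothetical profile: one subject's logins per day taken from audit records.
logins_per_day = [4, 5, 6, 5, 4, 6, 5]
```

Here `is_anomalous(logins_per_day, 6)` is false because 6 lies within the usual spread, while `is_anomalous(logins_per_day, 40)` is true and would be raised for review.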
Article
Over the last few years, network-based intrusions have increased rapidly, due to the growing number and popularity of attack tools that are easily available today. In response to this increase in intrusions, the concept of network honeypots is being developed; these can be used to trap malicious attackers and decode their attack methods. This paper reviews the current state of honeypot technology and describes a framework for improving the effectiveness of honeynets through the use of deception.
Conference Paper
This paper discusses the meaning and challenges of strategic cyber defense (SCD), and some possible strategies for dealing with these challenges. The purpose is to describe and codify the DARPA Information Assurance (IA) program's conceptual framework for defensive techniques in the cyber realm. The focus in the IA program is on cyber defense techniques for threats which are postulated to have the greatest potential impact at the strategic level of conflict. The specific resource we seek to defend is the national information infrastructure (NII), to include the subset of resources dedicated to execution of the defense mission, and commonly referred to as the defense information infrastructure (DII). The paper summarizes a definition of strategic-level conflict, what we mean by cyber defense in the IA program, and the reason for our specific focus on strategic-level cyber defense. Our current efforts to identify and develop SCD warfighting strategies are described, together with a discussion of the technical capabilities which would be needed to implement the strategies
Conference Paper
This study concentrates on the security-related issues in a single broadcast LAN (local area network) such as Ethernet. The authors formalize various possible network attacks. Their basic strategy is to develop profiles of usage of network resources and then compare current usage patterns with the historical profile to determine possible security violations. Thus, the work is similar to the host-based intrusion-detection systems. Different from such systems, however, is the use of a hierarchical model to refine the focus of the intrusion-detection mechanism. The authors also report on the development of an experimental LAN monitor currently under implementation. Several network attacks have been simulated, and results on how the monitor has been able to detect these attacks are analyzed. Initial results demonstrate that many network attacks are detectable with the authors' monitor, although it can be defeated
Article
Intrusion detection is a new, retrofit approach for providing a sense of security in existing computers and data networks, while allowing them to operate in their current "open" mode. The goal of intrusion detection is to identify unauthorized use, misuse, and abuse of computer systems by both system insiders and external penetrators. The intrusion detection problem is becoming a challenging task due to the proliferation of heterogeneous computer networks since the increased connectivity of computer systems gives greater access to outsiders and makes it easier for intruders to avoid identification. Intrusion detection systems (IDSs) are based on the beliefs that an intruder's behavior will be noticeably different from that of a legitimate user and that many unauthorized actions are detectable. Typically, IDSs employ statistical anomaly and rule-based misuse models in order to detect intrusions. A number of prototype IDSs have been developed at several institutions, and some of them have also been deployed on an experimental basis in operational systems. In the present paper, several host-based and network-based IDSs are surveyed, and the characteristics of the corresponding systems are identified. The host-based systems employ the host operating system's audit trails as the main source of input to detect intrusive activity, while most of the network-based IDSs build their detection mechanism on monitored network traffic, and some employ host audit trails as well. An outline of a statistical anomaly detection algorithm employed in a typical IDS is also included.
Cohen, F., Lambert, D., Preston, C., Berry, N., Stewart, C., Thomas, E.: A Framework for Deception. Computers and Security, (2001).