
Simson L. Garfinkel, PhD
Consultant at National Institute of Standards and Technology
About
171 Publications
95,965 Reads
8,199 Citations
Publications (171)
Chester Gordon Bell, a pioneer of modern computing, died May 17, 2024. His work helped shape the field as we know it.
The Census TopDown Algorithm (TDA) is a disclosure avoidance system using differential privacy for privacy-loss accounting. The algorithm ingests the final, edited version of the 2020 Census data and the final tabulation geographic definitions. The algorithm then creates noisy versions of key queries on the data, referred to as measurements, using...
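As an illustration of the "noisy measurements" idea (a minimal sketch, not the Bureau's production code, which uses a different noise distribution), here is discrete-Laplace (two-sided geometric) noise added to histogram counts, a standard mechanism for sensitivity-1 count queries:

```python
import math
import random

def two_sided_geometric(epsilon: float) -> int:
    """Discrete-Laplace noise: P(k) is proportional to exp(-epsilon*|k|),
    sampled as the difference of two i.i.d. geometric draws."""
    p = 1.0 - math.exp(-epsilon)

    def geometric() -> int:
        u = 1.0 - random.random()          # u in (0, 1]
        return int(math.log(u) / math.log(1.0 - p))

    return geometric() - geometric()

def noisy_counts(counts: list[int], epsilon: float) -> list[int]:
    """Add independent noise to each cell; a sensitivity-1 histogram
    released this way satisfies epsilon-differential privacy."""
    return [c + two_sided_geometric(epsilon) for c in counts]

print(noisy_counts([120, 7, 0, 3542], epsilon=0.5))
```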
It is often said that quantum technologies are poised to change the world as we know it, but cutting through the hype, what will quantum technologies actually mean for countries and their citizens? In Law and Policy for the Quantum Age, Chris Jay Hoofnagle and Simson L. Garfinkel explain the genesis of quantum information science (QIS) and the resu...
Privacy-protected microdata are often the desired output of a differentially private algorithm since microdata is familiar and convenient for downstream users. However, there is a statistical price for this kind of convenience. We show that an uncertainty principle governs the trade-off between accuracy for a population of interest ("sum query") vs...
The U.S. Census Bureau is using differential privacy (DP) to protect confidential respondent data collected for the 2020 Decennial Census of Population & Housing. The Census Bureau's DP system is implemented in the Disclosure Avoidance System (DAS) and requires a source of random numbers. We estimate that the 2020 Census will require roughly 90TB o...
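To make the scale concrete, a small benchmark (a sketch, not the DAS's actual randomness source) can time the operating system's CSPRNG and extrapolate to the 90 TB figure quoted above:

```python
import os
import time

def urandom_throughput(total_mib: int = 64) -> None:
    """Time os.urandom and extrapolate to a multi-terabyte requirement."""
    chunk = 1 << 20                          # 1 MiB per call
    start = time.perf_counter()
    for _ in range(total_mib):
        os.urandom(chunk)
    rate = total_mib / (time.perf_counter() - start)   # MiB/s
    tb = 90                                  # figure quoted in the abstract
    hours = tb * 1024 * 1024 / rate / 3600   # treating TB as TiB for simplicity
    print(f"~{rate:,.0f} MiB/s -> ~{hours:,.1f} hours for {tb} TB")

urandom_throughput()
```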
In 2020, the U.S. Census Bureau will conduct the Constitutionally mandated decennial Census of Population and Housing. Because a census involves collecting large amounts of private data under the promise of confidentiality, statistics are traditionally published only at high levels of aggregation. Published statistical tables are vulnerable to data...
When differential privacy was created more than a decade ago, the motivating example was statistics published by an official statistics agency. In attempting to transition differential privacy from the academy to practice, the U.S. Census Bureau has encountered many challenges unanticipated by differential privacy's creators. These challenges inclu...
With the dramatic improvement in both computer speeds and the efficiency of SAT and other NP-hard solvers in the last decade, database reconstruction attacks (DRAs) on statistical databases are no longer just a theoretical danger. The vast quantity of data products published by statistical agencies each year may give a determined attacker more than enough information to reconstruct...
Social science research is transitioning from working with “designated data,” collected through experiments and surveys, to working with “organic data,” including administrative data not collected for research purposes, and other data such as those collected from online social networks and large-scale sensor networks. The shift to organic data requ...
Memory analysis is slowly moving up the software stack. Early analysis efforts focused on core OS structures and services. As this field evolves, more information becomes accessible because analysis tools can build on foundational frameworks like Volatility and Rekall. This paper demonstrates and establishes memory analysis techniques for managed r...
More than 5.4 million Personal Identity Verification (PIV) and Common Access Cards (CAC) have been deployed to US government employees and contractors. These cards allow physical access to federal facilities, but their use to authenticate logical access to government information systems is uneven, with deployment rates across agencies ranging from...
Due to the lack of mature techniques in privacy-preserving information retrieval (IR), concerns about information privacy and security have become serious obstacles that prevent valuable user data from being used in IR research, such as studies on query logs, social media, and medical record retrieval. In SIGIR 2014 and SIGIR 2015, we ran the privacy-pr...
Hash-based carving is a technique for detecting the presence of specific "target files" on digital media by evaluating the hashes of individual data blocks, rather than the hashes of entire files. Unlike whole-file hashing, hash-based carving can identify files that are fragmented, files that are incomplete, or files that have been partial...
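A minimal sketch of the block-hashing idea (the block size and hash algorithm here are illustrative assumptions, not the paper's exact parameters):

```python
import hashlib

BLOCK = 4096  # cluster-sized blocks; the paper's block size may differ

def block_hashes(path: str) -> set[str]:
    """Hash a target file block-by-block so any single block can be
    matched on media, even if the file is fragmented or incomplete."""
    hashes = set()
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            if len(chunk) == BLOCK:          # partial tail block skipped here
                hashes.add(hashlib.md5(chunk).hexdigest())
    return hashes

def scan_media(image_path: str, targets: set[str]):
    """Scan a raw image on block boundaries, yielding offsets whose
    block hash matches a target block hash."""
    with open(image_path, "rb") as img:
        offset = 0
        while block := img.read(BLOCK):
            if len(block) == BLOCK and hashlib.md5(block).hexdigest() in targets:
                yield offset
            offset += BLOCK
```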
There has been roughly 15 years of research into approaches for aligning research in Human-Computer Interaction with computer security, more colloquially known as "usable security." Although usability and security were once thought to be inherently antagonistic, today there is wide consensus that systems that are not usable will inevitably suffer...
Forensically significant digital trace evidence is frequently present in sectors of digital media not associated with allocated or deleted files. Modern digital forensic tools generally do not decompress such data unless a specific file with a recognized file type is first identified, potentially resulting in missed evidence. Email addresses a...
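As a rough illustration of decompress-then-scan (a sketch only; real tools must handle many container formats, not just raw zlib streams), one can attempt decompression at every plausible zlib stream header and search the output for email addresses:

```python
import re
import zlib

EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_in_compressed(buf: bytes) -> set[bytes]:
    """Try zlib decompression at every plausible stream header
    (0x78 followed by a standard flag byte) and scan the output;
    candidates that fail to decompress are simply skipped."""
    found = set()
    for m in re.finditer(rb"\x78[\x01\x5e\x9c\xda]", buf):
        try:
            text = zlib.decompressobj().decompress(
                buf[m.start():m.start() + (1 << 20)])  # cap work per candidate
        except zlib.error:
            continue
        found.update(EMAIL.findall(text))
    return found
```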
On 27 December 2013, the US Court of Appeals for the Ninth Circuit issued an opinion that intercepting data from unencrypted wireless local area networks--Wi-Fi sniffing--can violate the US Wiretap Act. The case centers on Wi-Fi sniffers that were present in Google Street View vehicles that roamed the US between 2008 and 2010 and that were permanen...
Complex document formats such as PDF and Microsoft's Compound File Binary Format can contain information that is hidden but recoverable, as a result of text highlighting, cropping, or the embedding of high-resolution JPEG images. Private information can be released inadvertently if these files are distributed in electronic form. Simple experiments...
In this chapter we discuss the major arc of usable privacy and security (UPS) research over the past four decades.
For more than two decades there has been a growing realization that building secure computer systems requires attention not just to security mechanisms and underlying implementations, but also, to user interfaces and psychological issues. Although one would expect evolutionary pressure from market forces to slowly improve the usability of security...
Much of the UPS research of the past decade mirrors that of applied security work in general—tactical responses to specific problems of the day, rather than long-range strategic research. Tactical research is clearly important, as it addresses current needs and often results in immediate gains. However, literature reviews such as this are better su...
The academic quest to align usability and security started in the 1990s with a few key observations:
Security is a Secondary Task: Users are not focused on securing their systems; they want to use their systems to accomplish other goals.
Humans are Frequently a Weak Link: Humans are inherently part of the system that provides for information se...
This chapter explores challenges facing UPS researchers. We use the word “challenges” to describe the research problems that remain, and are likely to remain, problems. We also focus on challenges specific to UPS, rather than challenges that UPS shares with security or usability in general.
This research uses machine learning and outlier analysis to detect potentially hostile insiders through the automated analysis of stored data on cell phones, laptops, and desktop computers belonging to members of an organization. Whereas other systems look for specific signatures associated with hostile insider activity, our system is based on the...
Finding and preserving the evidence of an electronic trail in a crime requires careful methods as well as technical skill. This has led to the development of the field of digital forensics, which examines how equipment has to be handled to ensure that it hasn't been altered once it has been taken for evidence, how to copy material reliably, and how...
Forensic examiners are frequently confronted with content in languages that they do not understand, and they could benefit from machine translation into their native language. But automated translation of file paths is a difficult problem because of the minimal context for translation and the frequent mixing of multiple languages within a path. Thi...
A cyberweapon can be as dangerous as any weapon. Fortunately, recent technology now provides some tools for cyberweapons control. Digital forensics can be done on computers seized during or after hostilities. Cyberweapons differ significantly from other software, especially during development, and recent advances in summarizing the contents of stor...
Bulk data analysis eschews file extraction and analysis, common in forensic practice today, and instead processes data in “bulk,” recognizing and extracting salient details (“features”) of use in the typical digital forensics investigation. This article presents the requirements, design and implementation of the bulk_extractor, a high-performance c...
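The bulk approach can be sketched in a few lines (an illustration of the concept only, not bulk_extractor itself, which is a multithreaded C++ program with far more scanners):

```python
import re

FEATURES = {
    "email": re.compile(rb"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "url":   re.compile(rb"https?://[\x21-\x7e]+"),
}
CHUNK, OVERLAP = 16 * 1024 * 1024, 1024   # overlap so boundary-spanning features survive

def extract_features(image_path: str):
    """Read the raw image in large chunks, ignoring any file system, and
    yield (offset, type, feature) tuples -- the bulk-analysis idea.
    Features inside the overlap may be reported twice; a real tool dedupes."""
    with open(image_path, "rb") as f:
        base = 0
        prev_tail = b""
        while chunk := f.read(CHUNK):
            data = prev_tail + chunk
            for name, rx in FEATURES.items():
                for m in rx.finditer(data):
                    yield base - len(prev_tail) + m.start(), name, m.group()
            prev_tail = data[-OVERLAP:]
            base += len(chunk)
```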
Using an alternative approach to traditional file hashing, digital forensic investigators can hash individually sampled subject drives on sector boundaries and then check these hashes against a prebuilt database, making it possible to process raw media without reference to the underlying file system.
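A sketch of sampled sector hashing (sector size, hash, and sampling scheme are illustrative): if a target file occupies fraction f of the drive, the chance that n random sectors all miss it is (1 - f)^n, so even sparse sampling finds large files with high probability.

```python
import hashlib
import os
import random

SECTOR = 512

def sample_sectors(image_path: str, n_samples: int, database: set[str]):
    """Hash n randomly chosen sectors and look each up in a prebuilt
    database of sector hashes, never touching the file system."""
    size = os.path.getsize(image_path)
    hits = []
    with open(image_path, "rb") as img:
        for sector_no in random.sample(range(size // SECTOR), n_samples):
            img.seek(sector_no * SECTOR)
            digest = hashlib.sha1(img.read(SECTOR)).hexdigest()
            if digest in database:
                hits.append((sector_no, digest))
    return hits
```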
Writing digital forensics (DF) tools is difficult because of the diversity of data types that need to be processed, the need for high performance, the skill set of most users, and the requirement that the software run without crashing. Developing this software is dramatically easier when one possesses a few hundred disks of other people's data for...
Increased attention to cybersecurity has not resulted in improved cybersecurity.
Digital Forensics XML (DFXML) is an XML language that enables the exchange of structured forensic information. DFXML can represent the provenance of data subject to forensic investigation, document the presence and location of file systems, files, Microsoft Windows Registry entries, JPEG EXIFs, and other technical information of interest to the for...
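A minimal sketch of producing a DFXML-style record (element names are simplified from the published schema; real DFXML carries far more provenance and metadata):

```python
import hashlib
import os
import xml.etree.ElementTree as ET

def fileobject_xml(path: str) -> ET.Element:
    """Build a simplified DFXML-style <fileobject> record for one file."""
    fo = ET.Element("fileobject")
    ET.SubElement(fo, "filename").text = path
    ET.SubElement(fo, "filesize").text = str(os.path.getsize(path))
    with open(path, "rb") as f:
        ET.SubElement(fo, "hashdigest", type="md5").text = \
            hashlib.md5(f.read()).hexdigest()
    return fo

root = ET.Element("dfxml")
root.append(fileobject_xml(__file__))   # any file path works here
ET.dump(root)
```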
When computer systems are found during law enforcement, peacekeeping, counter-insurgency or similar operations, a key problem for forensic investigators is to identify useful subject-specific information in a sea of routine and uninteresting data. For instance, when a computer is obtained during a search of a criminal organization, investigators ar...
We describe a tool Dirim for automatically finding files on a drive that are anomalous or suspicious, and thus worthy of focus during digital-forensic investigation, based on solely their directory information. Anomalies are found both from comparing overall drive statistics and from comparing clusters of related files using a novel approach of "su...
In nuclear physics, the phrase decay rate is used to denote the rate that atoms and other particles spontaneously decompose. Uranium-235 famously decays into a variety of daughter isotopes including Thorium and Neptunium, which themselves decay to others. Decay rates are widely observed and wildly different depending on many factors, both internal...
Modern systems aren't designed to support some ongoing operations after their security has been compromised. Using Sterbenz's ResiliNets (resilient networks) model for describing the tasks of managing a system that might be attacked, the authors discuss five strategies for operating in a degraded security environment: ignorance is bliss (no recover...
Using validated carving techniques, we show that popular operating systems (e.g. Windows, Linux, and OSX) frequently have residual IP packets, Ethernet frames, and associated data structures present in system memory from long-terminated network traffic. Such information is useful for many forensic purposes including establishment of...
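One way such residue can be found (a simplified sketch, not the paper's validated carver): scan the memory image for the IPv4 version/IHL byte and keep candidates whose header checksum verifies.

```python
import struct

def ipv4_checksum_ok(hdr: bytes) -> bool:
    """Ones'-complement sum over the 20-byte header must fold to 0xFFFF."""
    total = sum(struct.unpack(">10H", hdr))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

def find_ipv4_headers(mem: bytes):
    """Yield offsets of plausible residual IPv4 headers: the 0x45
    version/IHL byte plus a verifying checksum cuts false positives.
    (A fuller tool would also sanity-check length, TTL, and protocol.)"""
    pos = mem.find(b"\x45")
    while pos != -1 and pos + 20 <= len(mem):
        if ipv4_checksum_ok(mem[pos:pos + 20]):
            yield pos
        pos = mem.find(b"\x45", pos + 1)
```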
Disk images (bitstreams extracted from physical media) can play an essential role in the acquisition and management of digital collections by serving as containers that support data integrity and chain of custody, while ensuring continued access to the underlying bits without depending on physical carriers. Widely used today by practitioners of dig...
This paper presents a novel solution to the problem of determining the ownership of carved information found on disk drives and other storage media that have been used by more than one person. When a computer is subject to forensic examination, information may be found that cannot be readily ascribed to a specific user. Such information is typicall...
We present work on the design, implementation, distribution, and use of realistic forensic datasets to support digital forensics and security education. We describe in particular the "M57-Patents" scenario, a multi-modal corpus consisting of hard drive images, RAM images, network captures, and images from other devices typically found in forensics...
Cyberweapons are difficult to control and police. Nonetheless, technology is becoming available that can help. We propose here the underlying technology necessary to support cyberarms agreements. Cyberweapons usage can be distinguished from other malicious Internet traffic in that they are aimed precisely at targets which we can often predi...
Failures in systems closely correlate to shortcomings in the system's requirements. Some historic data suggests that requirements are responsible for nearly half of all system development failures. This is especially true for critical systems that are ...
Today’s Golden Age of computer forensics is quickly coming to an end. Without a clear strategy for enabling research efforts that build upon one another, forensic research will fall behind the market, tools will become increasingly obsolete, and law enforcement, military and other users of computer forensics products will be unable to rely on the r...
This paper explores the use of purpose-built functions and cryptographic hashes of small data blocks for identifying data in sectors, file fragments, and entire files. It introduces and defines the concept of a “distinct” disk sector—a sector that is unlikely to exist elsewhere except as a copy of the original. Techniques are presented for improved...
A proposal for improving the review procedures for research projects that involve human subjects and their associated identifiable private information.
Global analysis is a useful supplement to local forensic analysis of the details of files in a drive image. This paper reports on experiments with global methods to find time patterns associated with disks and files. The Real Disk Corpus of over 1000 drive images from eight countries was used. The data was clustered into 63 sub...
Forensic analysis requires the acquisition and management of many different types of evidence, including individual disk drives, RAID sets, network packets, memory images, and extracted files. Often the same evidence is reviewed by several different tools or examiners in different locations. We propose a backwards-compatible redesign of the Advance...
Progress in computer forensics research has been limited by the lack of standardized data sets—corpora—that are available for research purposes. We explain why corpora are needed to further forensic research, present a taxonomy for describing corpora, and announce the availability of several forensic data sets.
The Windows Vista personal firewall provides its diverse users with a basic interface that hides many operational details. However, concealing the impact of network context on the security state of the firewall may result in users developing an incorrect ...
We have developed a program called fiwalk which produces detailed XML describing all of the partitions and files on a hard drive or disk image, as well as any extractable metadata from the document files themselves. We show how it is relatively simple to create automated disk forensic applications using a Python module we have written that reads fiw...
Increasingly, advances in file carving, memory analysis, and network forensics require the ability to identify the underlying type of a file given only a file fragment. Work to date on this problem has relied on identification of specific byte sequences in file headers and footers, and the use of statistical analysis and machine learning algorithms...
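A minimal statistical sketch of fragment-type identification (nearest-centroid over byte histograms; the paper's features and classifiers differ):

```python
import math
from collections import Counter

def unigram(fragment: bytes) -> list[float]:
    """Normalized byte-frequency histogram of a fragment."""
    counts = Counter(fragment)
    n = len(fragment)
    return [counts.get(b, 0) / n for b in range(256)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

def classify(fragment: bytes, centroids: dict[str, list[float]]) -> str:
    """Assign the file type whose trained mean histogram is closest;
    headers and footers are unavailable for a mid-file fragment."""
    vec = unigram(fragment)
    return max(centroids, key=lambda t: cosine(vec, centroids[t]))
```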
We present the forensic analysis of two rival XML-based office document file formats: Office Open XML (OOX) and OpenDocument Format (ODF). The unique identifiers within ODF and OOX files include 32-bit numbers stored in hexadecimal and 128-bit numbers unique to the particular generation of a document. The OOX documents include a store i...
This article presents improvements in the Advanced Forensics Format Library version 3 that provide for digital signatures and other cryptographic protections for digital evidence, allowing an investigator to establish a reliable chain of custody for electronic evidence from the crime scene to the courtroom. No other system for handling and storing...
When the CellBE processor was introduced, the Advanced Encryption Standard (AES) was one of the benchmarks; IBM published throughput speeds for different modes but gave no details on the precise implementation. Our team developed an AES implementation independently. For ECB encryption our version is slightly faster than IBM's; for CBC encryption our versio...
With the move to "cloud" computing, archivists face the increasingly difficult task of finding and preserving the works of an originator so that they may be readily used by future historians. This paper explores the range of information that an originator may have left on computers "out there on the Internet," including works that are publicly ide...
Much effort has been expended in recent years to create large sets of hash codes from known files. Distributing these sets has become more difficult as these sets grow larger. Meanwhile the value of these sets for eliminating the need to analyze "known goods" has decreased as hard drives have dramatically increased in storage capacity. This pap...
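One standard answer to the distribution problem is a Bloom filter, which trades a small false-positive rate for a large reduction in size; a minimal sketch (parameters illustrative):

```python
import hashlib

class BloomFilter:
    """Compact set membership with a tunable false-positive rate: one way
    to ship a multi-gigabyte hash set in a fraction of the space."""
    def __init__(self, m_bits: int, k: int):
        self.bits = bytearray(m_bits // 8 + 1)
        self.m, self.k = m_bits, k

    def _positions(self, item: bytes):
        for i in range(self.k):             # k derived hash positions
            h = hashlib.sha256(item + i.to_bytes(1, "big")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

known = BloomFilter(m_bits=10_000_000, k=5)
known.add(bytes.fromhex("d41d8cd98f00b204e9800998ecf8427e"))  # an MD5 from a "known goods" set
```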
The process of collecting information from multiple resources and merging it, known as data fusion, is supposed to create an information resource that is more powerful, more flexible and more accurate. Data quality is one of the issues associated with data fusion, as much of the information in databases was originally collected for statistical purpose...
Amazon.com has introduced the Simple Storage Service (S3), a commodity-priced storage utility. S3 aims to provide storage as a low-cost, highly available service, with a simple 'pay-as-you-go' charging model. This article makes three contributions. First, we evaluate S3's ability to provide storage support to large-scale science projects from a cos...
Simson Garfinkel reviews Security Data Visualization: Graphical Techniques for Network Analysis by Greg Conti.
Having decided to focus attention on the "weak link" of human fallibility, a growing number of security researchers are discovering the US Government's regulations that govern human subject research. This paper discusses those regulations, their application to research on security and usability, and presents strategies for negotiating the I...
A computer used by Al Qaeda ends up in the hands of a Wall Street Journal reporter. A laptop from Iran is discovered that contains details of that country’s nuclear weapons program. Photographs and videos are downloaded from terrorist Web sites. As evidenced by these and countless other cases, digital documents and storage devices hold the key to m...
“File carving” reconstructs files based on their content, rather than using metadata that points to the content. Carving is widely used for forensics and data recovery, but no file carvers can automatically reassemble fragmented files. We survey files from more than 300 hard drives acquired on the secondary market and show that the ability to reass...
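The limitation is easy to see in a naive contiguous carver (a sketch): it recovers a JPEG only when header and footer are stored contiguously, which is exactly what fragmented files break.

```python
def carve_jpegs(image: bytes, max_len: int = 20 * 2**20):
    """Naive contiguous carving: take bytes from a JPEG SOI marker to the
    next EOI marker. Works only for contiguously stored files, so
    fragmented files defeat it."""
    start = image.find(b"\xff\xd8\xff")
    while start != -1:
        end = image.find(b"\xff\xd9", start)
        if end != -1 and end - start < max_len:
            yield image[start:end + 2]
        start = image.find(b"\xff\xd8\xff", start + 1)
```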
We present an integrated security model for a low-cost laptop that will be widely deployed throughout the developing world. Implemented on top of the Linux operating system, the model is designed to restrict the laptop's software without restricting the laptop's user.
Users are increasingly demanding two contradictory system properties - the ability to absolutely, positively erase information so that it cannot be recovered, and the ability to recover information that was inadvertently or intentionally altered or deleted. Storage system designers now need to resolve the tension between complete delete and time ma...
Computer Forensic Tools (CFTs) allow investigators to recover deleted files, reconstruct an intruder's activities, and gain intelligence about a computer's user. Anti-Forensics (AF) tools and techniques frustrate CFTs by erasing or altering information; creating "chaff" that wastes time and hides information; implicating innocent parties by plantin...
Research in the field of computer forensics is hobbled by the lack of realistic data. Academics are not developing automated techniques and tools because they lack the raw data necessary to develop and validate algorithms. Investigators that have access to real data operate under legal and practical restraints that prevent the data from being used...
The guest editors discuss data surveillance. Proponents hope that data surveillance technology will be able to anticipate and prevent terrorist attacks, detect disease outbreaks, and allow for detailed social science research--all without the corresponding risks to personal privacy because machines, not people, perform the surveillance.
This paper introduces Forensic Feature Extraction (FFE) and Cross-Drive Analysis (CDA), two new approaches for analyzing large data sets of disk images and other forensic data. FFE uses a variety of lexigraphic techniques for extracting information from bulk data; CDA uses statistical techniques for correlating this information within a single disk...
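A sketch of the cross-drive correlation step (the scoring here is an illustrative choice, weighting rare shared features more heavily than ubiquitous ones):

```python
from collections import defaultdict
from itertools import combinations

def correlate(drive_features: dict[str, set[bytes]]):
    """Score drive pairs by shared features (e.g., email addresses);
    high overlap flags drives that may belong to the same social network."""
    index = defaultdict(set)                  # feature -> drives containing it
    for drive, feats in drive_features.items():
        for f in feats:
            index[f].add(drive)
    scores = defaultdict(float)
    for drives in index.values():
        for a, b in combinations(sorted(drives), 2):
            scores[(a, b)] += 1.0 / len(drives)   # rare features count more
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(correlate({
    "driveA": {b"alice@example.com", b"bob@example.com"},
    "driveB": {b"alice@example.com"},
    "driveC": {b"carol@example.com"},
}))
```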
Many of today’s privacy-preserving tools create a big file that fills up a hard drive or USB storage device in an effort to overwrite all of the “deleted files” that the media contain. But while this technique is widespread, it is largely unvalidated. We evaluate the effectiveness of the “big file technique” using sector-by-sector disk imaging on f...
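The technique itself is simple to sketch (illustration only; as the abstract notes, whether it actually overwrites everything is the open question):

```python
import errno

def fill_free_space(path: str = "wipefile.bin", chunk: int = 1 << 20) -> None:
    """Write zero-filled chunks until the volume is full -- the 'big file'
    approach to overwriting sectors that once held deleted files."""
    written = 0
    try:
        with open(path, "wb") as f:
            while True:
                f.write(b"\x00" * chunk)
                written += chunk
    except OSError as e:
        if e.errno != errno.ENOSPC:          # anything but "disk full" is a real error
            raise
    print(f"wrote {written / 2**30:.2f} GiB before the disk filled")
    # Validation requires sector-by-sector imaging afterwards, since
    # file-system slack and reserved areas can escape this overwrite.
```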
It is widely believed that security and usability are two antagonistic goals in system design. This thesis argues that there are many instances in which security and usability can be synergistically improved by revising the way that specific functionality is implemented in many of today's operating systems and applications. Specific design principl...
Automatic provenance collection describes systems that observe processes and data transformations, inferring, collecting, and maintaining provenance about them. Automatic collection is a powerful tool for analysis of objects and processes, providing a level of transparency and pervasiveness not found in more conventional provenance systems. Unfo...
Security toolbars in a web browser show security-related information about a website to help users detect phishing attacks. Because the toolbars are designed for humans to use, they should be evaluated for usability - that is, whether these toolbars really prevent users from being tricked into providing personal information. We conducted two user s...