Simson L. Garfinkel
  • PhD
  • Consultant at National Institute of Standards and Technology

About

171
Publications
95,965
Reads
8,199
Citations
Current institution
National Institute of Standards and Technology
Current position
  • Consultant

Publications (171)
Article
Chester Gordon Bell, a pioneer of modern computing, died May 17, 2024. His work helped shape the field as we know it.
Preprint
Full-text available
The Census TopDown Algorithm (TDA) is a disclosure avoidance system using differential privacy for privacy-loss accounting. The algorithm ingests the final, edited version of the 2020 Census data and the final tabulation geographic definitions. The algorithm then creates noisy versions of key queries on the data, referred to as measurements, using...
Book
It is often said that quantum technologies are poised to change the world as we know it, but cutting through the hype, what will quantum technologies actually mean for countries and their citizens? In Law and Policy for the Quantum Age, Chris Jay Hoofnagle and Simson L. Garfinkel explain the genesis of quantum information science (QIS) and the resu...
Preprint
Full-text available
Privacy-protected microdata are often the desired output of a differentially private algorithm since microdata is familiar and convenient for downstream users. However, there is a statistical price for this kind of convenience. We show that an uncertainty principle governs the trade-off between accuracy for a population of interest ("sum query") vs...
Preprint
Full-text available
The U.S. Census Bureau is using differential privacy (DP) to protect confidential respondent data collected for the 2020 Decennial Census of Population & Housing. The Census Bureau's DP system is implemented in the Disclosure Avoidance System (DAS) and requires a source of random numbers. We estimate that the 2020 Census will require roughly 90TB o...
Article
In 2020, the U.S. Census Bureau will conduct the Constitutionally mandated decennial Census of Population and Housing. Because a census involves collecting large amounts of private data under the promise of confidentiality, traditionally statistics are published only at high levels of aggregation. Published statistical tables are vulnerable to data...
Preprint
Full-text available
When differential privacy was created more than a decade ago, the motivating example was statistics published by an official statistics agency. In attempting to transition differential privacy from the academy to practice, the U.S. Census Bureau has encountered many challenges unanticipated by differential privacy's creators. These challenges inclu...
Article
With the dramatic improvement in both computer speeds and the efficiency of SAT and other NP-hard solvers in the last decade, database reconstruction attacks (DRAs) on statistical databases are no longer just a theoretical danger. The vast quantity of data products published by statistical agencies each year may give a determined attacker more than enough information to reconstruct...
Article
Social science research is transitioning from working with “designated data,” collected through experiments and surveys, to working with “organic data,” including administrative data not collected for research purposes, and other data such as those collected from online social networks and large-scale sensor networks. The shift to organic data requ...
Article
Full-text available
Memory analysis is slowly moving up the software stack. Early analysis efforts focused on core OS structures and services. As this field evolves, more information becomes accessible because analysis tools can build on foundational frameworks like Volatility and Rekall. This paper demonstrates and establishes memory analysis techniques for managed r...
Article
More than 5.4 million Personal Identity Verification (PIV) and Common Access Cards (CAC) have been deployed to US government employees and contractors. These cards allow physical access to federal facilities, but their use to authenticate logical access to government information systems is uneven, with deployment rates across agencies ranging from...
Conference Paper
Due to the lack of mature techniques in privacy-preserving information retrieval (IR), concerns about information privacy and security have become serious obstacles that prevent valuable user data from being used in IR research such as studies on query logs, social media, and medical record retrieval. In SIGIR 2014 and SIGIR 2015, we have run the privacy-pr...
Article
Full-text available
Hash-based carving is a technique for detecting the presence of specific "target files" on digital media by evaluating the hashes of individual data blocks, rather than the hashes of entire files. Unlike whole-file hashing, hash-based carving can identify files that are fragmented, files that are incomplete, or files that have been partial...
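The core idea above can be sketched in a few lines: hash each fixed-size block of the target file, then scan the media block-by-block for matching hashes. This is a minimal illustration, not the paper's implementation; the 4096-byte block size, function names, and the use of SHA-1 are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; real tools pick this deliberately

def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Return the set of SHA-1 hex digests of each full block in the target file."""
    return {
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data) - block_size + 1, block_size)
    }

def carve_hits(media: bytes, target_blocks: set, block_size: int = BLOCK_SIZE) -> list:
    """Scan media block-by-block; report offsets whose block hash matches a target block."""
    return [
        i for i in range(0, len(media) - block_size + 1, block_size)
        if hashlib.sha1(media[i:i + block_size]).hexdigest() in target_blocks
    ]
```

Because matching happens per block, a single surviving fragment of a target file is enough to produce a hit, which is exactly what whole-file hashing cannot do.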
Article
There has been roughly 15 years of research into approaches for aligning research in Human Computer Interaction with computer security, more colloquially known as "usable security." Although usability and security were once thought to be inherently antagonistic, today there is wide consensus that systems that are not usable will inevitably suffer...
Article
Full-text available
Forensically significant digital trace evidence is frequently present in sectors of digital media not associated with allocated or deleted files. Modern digital forensic tools generally do not decompress such data unless a specific file with a recognized file type is first identified, potentially resulting in missed evidence. Email addresses a...
Article
On 27 December 2013, the US Court of Appeals for the Ninth Circuit issued an opinion that intercepting data from unencrypted wireless local area networks--Wi-Fi sniffing--can violate the US Wiretap Act. The case centers on a Wi-Fi sniffer that was present in Google Street View vehicles that roamed the US between 2008 and 2010 and that were permanen...
Article
Complex document formats such as PDF and Microsoft's Compound File Binary Format can contain information that is hidden but recoverable, as a result of text highlighting, cropping, or the embedding of high-resolution JPEG images. Private information can be released inadvertently if these files are distributed in electronic form. Simple experiments...
Chapter
In this chapter we discuss the major arc of usable privacy and security (UPS) research over the past four decades.
Chapter
For more than two decades there has been a growing realization that building secure computer systems requires attention not just to security mechanisms and underlying implementations, but also to user interfaces and psychological issues. Although one would expect evolutionary pressure from market forces to slowly improve the usability of security...
Chapter
Much of the UPS research of the past decade mirrors that of applied security work in general—tactical responses to specific problems of the day, rather than long-range strategic research. Tactical research is clearly important, as it addresses current needs and often results in immediate gains. However, literature reviews such as this are better su...
Chapter
The academic quest to align usability and security started in the 1990s with a few key observations: Security is a Secondary Task: Users are not focused on securing their systems, they want to use their systems for accomplishing other goals. Humans are Frequently a Weak Link: Humans are inherently part of the system that provides for information se...
Chapter
This chapter explores challenges facing UPS researchers. We use the word “challenges” to describe the research problems that remain, and are likely to remain, problems. We also focus on challenges specific to UPS, rather than challenges that UPS shares with security or usability in general.
Conference Paper
This research uses machine learning and outlier analysis to detect potentially hostile insiders through the automated analysis of stored data on cell phones, laptops, and desktop computers belonging to members of an organization. Whereas other systems look for specific signatures associated with hostile insider activity, our system is based on the...
Article
Finding and preserving the evidence of an electronic trail in a crime requires careful methods as well as technical skill. This has led to the development of the field of digital forensics, which examines how equipment has to be handled to ensure that it hasn't been altered once it has been taken for evidence, how to copy material reliably, and how...
Article
Full-text available
Forensic examiners are frequently confronted with content in languages that they do not understand, and they could benefit from machine translation into their native language. But automated translation of file paths is a difficult problem because of the minimal context for translation and the frequent mixing of multiple languages within a path. Thi...
Article
Full-text available
A cyberweapon can be as dangerous as any weapon. Fortunately, recent technology now provides some tools for cyberweapons control. Digital forensics can be done on computers seized during or after hostilities. Cyberweapons differ significantly from other software, especially during development, and recent advances in summarizing the contents of stor...
Article
Full-text available
Bulk data analysis eschews file extraction and analysis, common in forensic practice today, and instead processes data in “bulk,” recognizing and extracting salient details (“features”) of use in the typical digital forensics investigation. This article presents the requirements, design and implementation of the bulk_extractor, a high-performance c...
Article
Full-text available
Using an alternative approach to traditional file hashing, digital forensic investigators can hash individually sampled subject drives on sector boundaries and then check these hashes against a prebuilt database, making it possible to process raw media without reference to the underlying file system.
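The sampling approach described above can be sketched as follows: pick sector-aligned offsets from a raw image, hash each sampled sector, and test membership in a prebuilt hash database. This is an illustrative sketch under assumed names and parameters (512-byte sectors, MD5, random sampling), not the authors' tool.

```python
import hashlib
import random

SECTOR = 512  # assumed sector size for this sketch

def sampled_sector_hits(image: bytes, database: set, n_samples: int, seed: int = 0) -> list:
    """Randomly sample sector-aligned offsets of a raw image and return
    the offsets whose sector hashes appear in a prebuilt hash database."""
    rng = random.Random(seed)
    n_sectors = len(image) // SECTOR
    sampled = rng.sample(range(n_sectors), min(n_samples, n_sectors))
    hits = []
    for s in sorted(sampled):
        digest = hashlib.md5(image[s * SECTOR:(s + 1) * SECTOR]).hexdigest()
        if digest in database:
            hits.append(s * SECTOR)
    return hits
```

The point of sampling is speed: a statistically chosen subset of sectors can establish the likely presence of known content on a large drive without reading, or even mounting, the whole file system.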
Article
Full-text available
Writing digital forensics (DF) tools is difficult because of the diversity of data types that need to be processed, the need for high performance, the skill set of most users, and the requirement that the software run without crashing. Developing this software is dramatically easier when one possesses a few hundred disks of other people's data for...
Article
Increased attention to cybersecurity has not resulted in improved cybersecurity.
Article
Digital Forensics XML (DFXML) is an XML language that enables the exchange of structured forensic information. DFXML can represent the provenance of data subject to forensic investigation, document the presence and location of file systems, files, Microsoft Windows Registry entries, JPEG EXIFs, and other technical information of interest to the for...
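A DFXML document represents each file as a structured record. As a hypothetical illustration of the kind of record DFXML exchanges, the sketch below emits a minimal fileobject element with Python's standard library; the element names follow DFXML conventions (filename, filesize, hashdigest), but this fragment is not the full schema.

```python
import hashlib
import xml.etree.ElementTree as ET

def fileobject_xml(path: str, data: bytes) -> str:
    """Build a minimal DFXML-style <fileobject> record for one file's contents."""
    fo = ET.Element("fileobject")
    ET.SubElement(fo, "filename").text = path
    ET.SubElement(fo, "filesize").text = str(len(data))
    digest = ET.SubElement(fo, "hashdigest", type="md5")
    digest.text = hashlib.md5(data).hexdigest()
    return ET.tostring(fo, encoding="unicode")
```

Because the record is plain XML, any tool in a forensic pipeline can parse it without knowing which tool produced it, which is the interoperability DFXML is designed to provide.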
Article
Full-text available
When computer systems are found during law enforcement, peacekeeping, counter-insurgency or similar operations, a key problem for forensic investigators is to identify useful subject-specific information in a sea of routine and uninteresting data. For instance, when a computer is obtained during a search of a criminal organization, investigators ar...
Conference Paper
Full-text available
We describe a tool, Dirim, for automatically finding files on a drive that are anomalous or suspicious, and thus worthy of focus during digital-forensic investigation, based solely on their directory information. Anomalies are found both from comparing overall drive statistics and from comparing clusters of related files using a novel approach of "su...
Article
Full-text available
In nuclear physics, the phrase decay rate is used to denote the rate that atoms and other particles spontaneously decompose. Uranium-235 famously decays into a variety of daughter isotopes including Thorium and Neptunium, which themselves decay to others. Decay rates are widely observed and wildly different depending on many factors, both internal...
Article
Modern systems aren't designed to support some ongoing operations after their security has been compromised. Using Sterbenz's ResiliNets (resilient networks) model for describing the tasks of managing a system that might be attacked, the authors discuss five strategies for operating in a degraded security environment: ignorance is bliss (no recover...
Article
Full-text available
Using validated carving techniques, we show that popular operating systems (e.g. Windows, Linux, and OSX) frequently have residual IP packets, Ethernet frames, and associated data structures present in system memory from long-terminated network traffic. Such information is useful for many forensic purposes including establishment of...
Conference Paper
Full-text available
Disk images (bitstreams extracted from physical media) can play an essential role in the acquisition and management of digital collections by serving as containers that support data integrity and chain of custody, while ensuring continued access to the underlying bits without depending on physical carriers. Widely used today by practitioners of dig...
Article
Full-text available
This paper presents a novel solution to the problem of determining the ownership of carved information found on disk drives and other storage media that have been used by more than one person. When a computer is subject to forensic examination, information may be found that cannot be readily ascribed to a specific user. Such information is typicall...
Article
Full-text available
We present work on the design, implementation, distribution, and use of realistic forensic datasets to support digital forensics and security education. We describe in particular the "M57-Patents" scenario, a multi-modal corpus consisting of hard drive images, RAM images, network captures, and images from other devices typically found in forensics...
Article
Cyberweapons are difficult to control and police. Nonetheless, technology is becoming available that can help. We propose here the underlying technology necessary to support cyberarms agreements. Cyberweapons usage can be distinguished from other malicious Internet traffic in that they are aimed precisely at targets which we can often predi...
Conference Paper
Failures in systems closely correlate to shortcomings in the system's requirements. Some historic data suggests that requirements are responsible for nearly half of all system development failures. This is especially true for critical systems that are ...
Article
Full-text available
Today’s Golden Age of computer forensics is quickly coming to an end. Without a clear strategy for enabling research efforts that build upon one another, forensic research will fall behind the market, tools will become increasingly obsolete, and law enforcement, military and other users of computer forensics products will be unable to rely on the r...
Article
Full-text available
This paper explores the use of purpose-built functions and cryptographic hashes of small data blocks for identifying data in sectors, file fragments, and entire files. It introduces and defines the concept of a “distinct” disk sector—a sector that is unlikely to exist elsewhere except as a copy of the original. Techniques are presented for improved...
Article
A proposal for improving the review procedures for research projects that involve human subjects and their associated identifiable private information.
Conference Paper
Full-text available
Global analysis is a useful supplement to local forensic analysis of the details of files in a drive image. This paper reports on experiments with global methods to find time patterns associated with disks and files. The Real Disk Corpus of over 1000 drive images from eight countries was used as a corpus. The data was clustered into 63 sub...
Article
Full-text available
Forensic analysis requires the acquisition and management of many different types of evidence, including individual disk drives, RAID sets, network packets, memory images, and extracted files. Often the same evidence is reviewed by several different tools or examiners in different locations. We propose a backwards-compatible redesign of the Advance...
Article
Full-text available
Progress in computer forensics research has been limited by the lack of standardized data sets—corpora—that are available for research purposes. We explain why corpora are needed to further forensic research, present a taxonomy for describing corpora, and announce the availability of several forensic data sets.
Conference Paper
The Windows Vista personal firewall provides its diverse users with a basic interface that hides many operational details. However, concealing the impact of network context on the security state of the firewall may result in users developing an incorrect ...
Conference Paper
Full-text available
We have developed a program called fiwalk which produces detailed XML describing all of the partitions and files on a hard drive or diskimage, as well as any extractable metadata from the document files themselves. We show how it is relatively simple to create automated disk forensic applications using a Python module we have written that reads fiw...
Conference Paper
Full-text available
Increasingly, advances in file carving, memory analysis and network forensics require the ability to identify the underlying type of a file given only a file fragment. Work to date on this problem has relied on identification of specific byte sequences in file headers and footers, and the use of statistical analysis and machine learning algorithms...
Article
Full-text available
The forensic analysis of two rival XML-based office document file formats, Office Open XML (OOX) and OpenDocument Format (ODF), is presented. The unique identifiers within ODF and OOX files include 32-bit numbers stored in hexadecimal and 128-bit numbers unique to a particular generation of a document. The OOX documents include a store i...
Article
Full-text available
This article presents improvements in the Advanced Forensics Format Library version 3 that provide for digital signatures and other cryptographic protections for digital evidence, allowing an investigator to establish a reliable chain-of-custody for electronic evidence from the crime scene to the court room. No other system for handling and storing...
Article
Full-text available
When the CellBE processor was introduced, the Advanced Encryption Standard (AES) was one of the benchmarks; IBM published throughput speeds for different modes but gave no details on the precise implementation. Our team has developed an AES implementation independently. For ECB encryption our version is slightly faster than IBM's; for CBC encryption our versio...
Article
With the move to "cloud" computing, archivists face the increasingly difficult task of finding and preserving the works of an originator so that they may be readily used by future historians. This paper explores the range of information that an originator may have left on computers "out there on the Internet," including works that are publicly ide...
Conference Paper
Full-text available
Much effort has been expended in recent years to create large sets of hash codes from known files. Distributing these sets has become more difficult as these sets grow larger. Meanwhile the value of these sets for eliminating the need to analyze "known goods" has decreased as hard drives have dramatically increased in storage capacity. This pap...
Article
The process of collecting information from multiple resources and merging it, known as data fusion, is supposed to create an information resource that is more powerful, more flexible and more accurate. Data quality is one of the issues related to data fusion, as much of the information in databases was originally collected for statistical purpose...
Article
Full-text available
Amazon.com has introduced the Simple Storage Service (S3), a commodity-priced storage utility. S3 aims to provide storage as a low-cost, highly available service, with a simple 'pay-as-you-go' charging model. This article makes three contributions. First, we evaluate S3's ability to provide storage support to large-scale science projects from a cos...
Article
Full-text available
Simson Garfinkel reviews Security Data Visualization: Graphical Techniques for Network Analysis by Greg Conti.
Conference Paper
Full-text available
Having decided to focus attention on the "weak link" of human fallibility, a growing number of security researchers are discovering the US Government's regulations that govern human subject research. This paper discusses those regulations, their application to research on security and usability, and presents strategies for negotiating the I...
Article
Full-text available
A computer used by Al Qaeda ends up in the hands of a Wall Street Journal reporter. A laptop from Iran is discovered that contains details of that country’s nuclear weapons program. Photographs and videos are downloaded from terrorist Web sites. As evidenced by these and countless other cases, digital documents and storage devices hold the key to m...
Article
Full-text available
“File carving” reconstructs files based on their content, rather than using metadata that points to the content. Carving is widely used for forensics and data recovery, but no file carvers can automatically reassemble fragmented files. We survey files from more than 300 hard drives acquired on the secondary market and show that the ability to reass...
Conference Paper
Full-text available
We present an integrated security model for a low-cost laptop that will be widely deployed throughout the developing world. Implemented on top of the Linux operating system, the model is designed to restrict the laptop's software without restricting the laptop's user.
Article
Users are increasingly demanding two contradictory system properties - the ability to absolutely, positively erase information so that it cannot be recovered, and the ability to recover information that was inadvertently or intentionally altered or deleted. Storage system designers now need to resolve the tension between complete delete and time ma...
Article
Full-text available
Computer Forensic Tools (CFTs) allow investigators to recover deleted files, reconstruct an intruder's activities, and gain intelligence about a computer's user. Anti-Forensics (AF) tools and techniques frustrate CFTs by erasing or altering information; creating "chaff" that wastes time and hides information; implicating innocent parties by plantin...
Article
Full-text available
Research in the field of computer forensics is hobbled by the lack of realistic data. Academics are not developing automated techniques and tools because they lack the raw data necessary to develop and validate algorithms. Investigators that have access to real data operate under legal and practical restraints that prevent the data from being used...
Article
Full-text available
The guest editors discuss data surveillance. Proponents hope that data surveillance technology will be able to anticipate and prevent terrorist attacks, detect disease outbreaks, and allow for detailed social science research--all without the corresponding risks to personal privacy because machines, not people, perform the surveillance.
Article
Full-text available
This paper introduces Forensic Feature Extraction (FFE) and Cross-Drive Analysis (CDA), two new approaches for analyzing large data sets of disk images and other forensic data. FFE uses a variety of lexigraphic techniques for extracting information from bulk data; CDA uses statistical techniques for correlating this information within a single disk...
Conference Paper
Full-text available
Many of today’s privacy-preserving tools create a big file that fills up a hard drive or USB storage device in an effort to overwrite all of the “deleted files” that the media contain. But while this technique is widespread, it is largely unvalidated. We evaluate the effectiveness of the “big file technique” using sector-by-sector disk imaging on f...
Article
Full-text available
It is widely believed that security and usability are two antagonistic goals in system design. This thesis argues that there are many instances in which security and usability can be synergistically improved by revising the way that specific functionality is implemented in many of today's operating systems and applications. Specific design principl...
Conference Paper
Full-text available
Automatic provenance collection describes systems that observe processes and data transformations, inferring, collecting, and maintaining provenance about them. Automatic collection is a powerful tool for analysis of objects and processes, providing a level of transparency and pervasiveness not found in more conventional provenance systems. Unfo...
Conference Paper
Full-text available
Security toolbars in a web browser show security-related information about a website to help users detect phishing attacks. Because the toolbars are designed for humans to use, they should be evaluated for usability - that is, whether these toolbars really prevent users from being tricked into providing personal information. We conducted two user s...
