Conference Paper

An Efficient Piecewise Hashing Method for Computer Forensics

SouthWest JiaoTong Univ., Chengdu
DOI: 10.1109/WKDD.2008.80 Conference: Knowledge Discovery and Data Mining, 2008. WKDD 2008. International Workshop on
Source: IEEE Xplore


Hashing, a basic tool in computer forensics, is used to ensure data integrity and to identify known data objects efficiently. Unfortunately, intentional tiny modified file can not be identified using this traditional technique. Context triggered piecewise hashing separates a file into pieces using local context characteristic, and produces a hash sequence as a hash signature. The hash signature can be used to identify similar files with tiny modifications such as insertion, replacement and deletion. The algorithm of currently available scheme is designed for junk mail detection, which is low efficient and not suitable for file system investigation. In this paper, an improved algorithm based on the Store-Hash and Rehash idea is developed for context triggered piecewise hashing technique. Experiment results show that the performance of speed and the ability of similarity detection of the new scheme are better than that of spamsum. It is valuable for forensics practice.

32 Reads
  • Source
    • "The function computes a hash value (e.g., cryptographic hash) for the individual split-pieces and concatenates them into a final fingerprint string. ssdeep [18], FKSum [9], and SimFD [24] belong to this category. Block-based hashing (BBH): This category of fuzzy hash functions generate one small block of the final fingerprint after a certain amount of input has been processed. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Malware triaging is the process of analyzing malicious software applications’ behavior to develop detection signatures. This task is challenging, especially due to the enormous number of samples received by the vendors with limited amount of analyst time. Triaging usually starts with an analyst classifying samples into known and unknown malware. Recently, there have been various attempts to automate the process of grouping similar malware using a technique called fuzzy hashing – a type of compression functions for computing the similarity between individual digital files. Unfortunately, there has been no rigorous experimentation or evaluation of fuzzy hashing algorithms for malware similarity analysis in the research literature. In this paper, we perform extensive study of existing fuzzy hashing algorithms with the goal of understanding their applicability in clustering similar malware. Our experiments indicate that current popular fuzzy hashing algorithms suffer from serious limitations that preclude them from being used in similarity analysis. We identified novel ways to construct fuzzy hashing algorithms and experiments show that our algorithms have better performance than existing algorithms.
    Full-text · Conference Paper · Aug 2015
  • Source
    • "The DMS model is simple to use and can be relatively easily integrated in Web servers and browsers. It is also general in the sense that its main idea with small modifications can be used in other problem domains such as routing, load balancing, and computer forensics [10]. Obvious subject of future work is a software implementation of DMS and evaluation of its efficiency in production environments. "
    [Show abstract] [Hide abstract]
    ABSTRACT: In this paper we consider the problem of improving Web performance and propose an efficient differencing and merging system (DMS) based on an HTTP protocol extension. To provide for faster information exchange over the Web, the system tries to transfer only computed differences between requested documents and previously retrieved documents from the same site. Analysis and experimental results prove the effectiveness of DMS, but also show bigger processor and memory load on servers and clients. DMS is compatible with most of the existing solutions for improving Web performance. Moreover, SSL security system may be used to provide Web privacy and authenticity. The DMS model is simple to use and can be relatively easily integrated in Web servers and browsers.
    Full-text · Article · Jan 2011
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: A hash function is a well-known method in computer science to map arbitrary large data to bit strings of a fixed short length. This property is used in computer forensics to identify known files on base of their hash value. As of today, in a pre-step process hash values of files are generated and stored in a database; typically a cryptographic hash func-tion like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. Due to security properties of cryptographic hash functions, they can not be used to identify similar files. Therefore Jesse Kornblum proposed a similarity preserving hash function to identify sim-ilar files. This paper discusses the efficiency of Kornblum's approach. We present some enhancements that increase the performance of his algo-rithm by 55% if applied to a real life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system, which are relevant for the performance of Kornblum's approach.
    Preview · Article · Jan 2012
Show more