Hashing, a basic tool in computer forensics, is used to ensure data integrity and to identify known data objects efficiently. Unfortunately, intentional tiny modified file can not be identified using this traditional technique. Context triggered piecewise hashing separates a file into pieces using local context characteristic, and produces a hash sequence as a hash signature. The hash signature can be used to identify similar files with tiny modifications such as insertion, replacement and deletion. The algorithm of currently available scheme is designed for junk mail detection, which is low efficient and not suitable for file system investigation. In this paper, an improved algorithm based on the Store-Hash and Rehash idea is developed for context triggered piecewise hashing technique. Experiment results show that the performance of speed and the ability of similarity detection of the new scheme are better than that of spamsum. It is valuable for forensics practice.
Knowledge Discovery and Data Mining, 2008. WKDD 2008. International Workshop on; 02/2008