An Efficient Piecewise Hashing Method for Computer Forensics

Conference Paper · February 2008
DOI: 10.1109/WKDD.2008.80 · Source: IEEE Xplore
Conference: Knowledge Discovery and Data Mining, 2008. WKDD 2008. International Workshop on
Abstract
Hashing, a basic tool in computer forensics, is used to ensure data integrity and to identify known data objects efficiently. Unfortunately, a file that has been intentionally modified, even slightly, cannot be identified with this traditional technique. Context triggered piecewise hashing splits a file into pieces using characteristics of the local context and produces a sequence of hashes as the hash signature. The hash signature can be used to identify similar files that differ by small modifications such as insertions, replacements, and deletions. The algorithm in the currently available scheme was designed for junk mail detection; it is inefficient and not well suited to file system investigation. In this paper, an improved algorithm based on the Store-Hash and Rehash idea is developed for the context triggered piecewise hashing technique. Experimental results show that the new scheme outperforms spamsum in both speed and similarity detection ability, which makes it valuable for forensic practice.
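The general idea of context triggered piecewise hashing described in the abstract can be illustrated with a minimal sketch. This is not the paper's Store-Hash and Rehash algorithm, nor spamsum itself: the names `ctph_signature` and `similarity`, the 7-byte window, the byte-sum trigger, the use of SHA-256 for pieces, and the `difflib`-based score are all assumptions chosen for illustration only.

```python
import difflib
import hashlib

# Alphabet used to encode one character per piece (assumed, for illustration).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # assumed size of the local context window


def ctph_signature(data: bytes, block_size: int = 64) -> str:
    """Toy piecewise signature: one character per context-triggered piece."""
    signature = []
    piece_start = 0
    for i in range(len(data)):
        # Local context: the last WINDOW bytes ending at position i.
        window = data[max(0, i - WINDOW + 1): i + 1]
        context = sum(window)  # simple stand-in for a real rolling hash
        # A piece boundary is triggered by the local context alone,
        # so boundaries realign after an insertion or deletion.
        if context % block_size == block_size - 1:
            digest = hashlib.sha256(data[piece_start: i + 1]).digest()
            signature.append(B64[digest[0] % 64])
            piece_start = i + 1
    # Hash whatever remains after the last trigger point.
    if piece_start < len(data):
        digest = hashlib.sha256(data[piece_start:]).digest()
        signature.append(B64[digest[0] % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> float:
    """Score two signatures in [0, 1]; difflib is a stand-in for a real metric."""
    return difflib.SequenceMatcher(None, sig_a, sig_b).ratio()
```

Because each boundary depends only on the bytes in the local window, a small edit changes only the signature characters of the pieces it touches, so two versions of a file still share most of their signatures. A traditional whole-file hash, by contrast, changes completely.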
    • "The function computes a hash value (e.g., cryptographic hash) for the individual split-pieces and concatenates them into a final fingerprint string. ssdeep [18], FKSum [9], and SimFD [24] belong to this category. Block-based hashing (BBH): This category of fuzzy hash functions generate one small block of the final fingerprint after a certain amount of input has been processed. "
    ABSTRACT: Malware triaging is the process of analyzing malicious software applications’ behavior to develop detection signatures. This task is challenging, especially due to the enormous number of samples received by the vendors with a limited amount of analyst time. Triaging usually starts with an analyst classifying samples into known and unknown malware. Recently, there have been various attempts to automate the process of grouping similar malware using a technique called fuzzy hashing, a type of compression function for computing the similarity between individual digital files. Unfortunately, there has been no rigorous experimentation or evaluation of fuzzy hashing algorithms for malware similarity analysis in the research literature. In this paper, we perform an extensive study of existing fuzzy hashing algorithms with the goal of understanding their applicability in clustering similar malware. Our experiments indicate that current popular fuzzy hashing algorithms suffer from serious limitations that preclude them from being used in similarity analysis. We identified novel ways to construct fuzzy hashing algorithms and experiments show that our algorithms have better performance than existing algorithms.
    Full-text · Conference Paper · Aug 2015
    • "Encryption guarantees that only authorized people can use the data. Moreover, assuring confidentiality can include: invoking file permissions and granting a secure operating environment, while cryptographic hashing of datasets can assure the integrity property of students' records (Chen & Wang, 2008). Accuracy: As Learning Analytics is an emerging research topic in the field of Technology Enhanced Learning and a forthcoming trend (Ebner & Schön, 2013), accuracy and validity of information is highly questionable. "
    ABSTRACT: Within the evolution of technology in education, Learning Analytics has reserved its position as a robust technological field that promises to empower instructors and learners in different educational fields. The 2014 horizon report (Johnson et al., 2014), expects it to be adopted by educational institutions in the near future. However, the processes and phases as well as constraints are still not deeply debated. In this research study, the authors talk about the essence, objectives and methodologies of Learning Analytics and propose a first prototype life cycle that describes its entire process. Furthermore, the authors raise substantial questions related to challenges such as security, policy and ethics issues that limit the beneficial appliances of Learning Analytics processes.
    Full-text · Conference Paper · Jun 2015
    • "The DMS model is simple to use and can be relatively easily integrated in Web servers and browsers. It is also general in the sense that its main idea with small modifications can be used in other problem domains such as routing, load balancing, and computer forensics [10]. Obvious subject of future work is a software implementation of DMS and evaluation of its efficiency in production environments. "
    ABSTRACT: In this paper we consider the problem of improving Web performance and propose an efficient differencing and merging system (DMS) based on an HTTP protocol extension. To provide for faster information exchange over the Web, the system tries to transfer only computed differences between requested documents and previously retrieved documents from the same site. Analysis and experimental results prove the effectiveness of DMS, but also show bigger processor and memory load on servers and clients. DMS is compatible with most of the existing solutions for improving Web performance. Moreover, SSL security system may be used to provide Web privacy and authenticity. The DMS model is simple to use and can be relatively easily integrated in Web servers and browsers.
    Full-text · Article · Jan 2012