Fig 5 - uploaded by Yuhei Kawakoya
Code taint propagation example.

Source publication
Article
Full-text available
We propose a design and implementation for an Application Programming Interface (API) monitoring system called API Chaser, which is resistant to evasion-type anti-analysis techniques, e.g., stolen code and code injection. The core technique in API Chaser is code tainting, which enables us to identify precisely the execution of monitored instruction...

Contexts in source publication

Context 1
... Rule3: If an instruction calling an API is tainted with a malware-tag, the taint tag of the instruction, i.e., the malware-tag, is added to the data written by the API. The bottom-left pseudocode in Fig. 5 is an example of Rule1 and Rule2, illustrating the case of mov [edi], eax. If the source operand of the target instruction, eax, does not have any taint tags and the opcode, mov, has a malware-tag, we add malware-tags to the destination operand, [edi]. Consequently, it appears as if it propagates taint tags of opcode to the destination ...
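The mov [edi], eax case and Rule3 quoted above can be illustrated with a small Python sketch. It is not the API Chaser implementation: the names `Tag`, `ShadowState`, `propagate_mov`, and `propagate_api_write` are hypothetical, and the excerpt does not give the full wording of Rule1 and Rule2. The branch that copies existing source tags to the destination, and the branch that clears the destination, are assumptions based on conventional data-flow taint tracking. Only the opcode-to-destination branch and the API-write function follow the text directly.

```python
from enum import Enum


class Tag(Enum):
    """Taint tags. Only the malware-tag is named in the excerpt."""
    MALWARE = "malware-tag"


class ShadowState:
    """Maps a location (register name or memory address) to a set of tags."""

    def __init__(self):
        self._tags = {}

    def get(self, location):
        # Return a copy so callers cannot mutate the stored set by accident.
        return set(self._tags.get(location, set()))

    def add(self, location, tags):
        self._tags.setdefault(location, set()).update(tags)

    def replace(self, location, tags):
        self._tags[location] = set(tags)


def propagate_mov(shadow, opcode_tags, src, dst):
    """Propagate taint for `mov dst, src`."""
    src_tags = shadow.get(src)
    if src_tags:
        # Assumption: a tainted source overwrites the destination's tags,
        # as in ordinary data-flow taint propagation.
        shadow.replace(dst, src_tags)
    elif Tag.MALWARE in opcode_tags:
        # From the text: the source has no tags and the opcode carries a
        # malware-tag, so the destination receives the malware-tag.
        shadow.add(dst, {Tag.MALWARE})
    else:
        # Assumption: untainted data written by untainted code clears tags.
        shadow.replace(dst, set())


def propagate_api_write(shadow, call_insn_tags, written_locations):
    """Rule3: data written by an API called from malware-tagged code is tagged."""
    if Tag.MALWARE in call_insn_tags:
        for location in written_locations:
            shadow.add(location, {Tag.MALWARE})


if __name__ == "__main__":
    shadow = ShadowState()
    edi_target = 0x1000  # hypothetical address held in edi
    propagate_mov(shadow, {Tag.MALWARE}, src="eax", dst=edi_target)
    print(shadow.get(edi_target))
```

Keeping the tags in a separate shadow map leaves the emulated register and memory values untouched, so the opcode's tag only reaches the destination when the source operand contributes no tag of its own.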

Similar publications

Chapter
Full-text available
In previous work, “gist descriptor” features extracted from images have been used in malware classification problems and have shown promising results. In this research, we determine whether gist descriptors are robust with respect to malware obfuscation techniques, as compared to Convolutional Neural Networks (CNN) trained directly on malware image...
Preprint
Full-text available
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antiviral scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling...
Article
Full-text available
The emergence of Internet of Things malware, which leverages exploited IoT devices to perform large-scale cyber attacks (e.g., Mirai botnet), is considered as a major threat to the Internet ecosystem. To mitigate such threat, there is an utmost need for effective IoT malware classification and family attribution, which provide essential steps towa...
Article
Full-text available
With the increasing popularity of macOS, the number and variety of macOS malware are rising as well. Yet, very few tools exist for dynamic analysis of macOS malware. In this paper, we propose a macOS malware analysis framework called Mac-A-Mal. We develop a kernel extension to monitor malware behavior and mitigate several anti-evasion techniques used in...
Preprint
Full-text available
Data protection is the process of securing sensitive information from being corrupted, compromised, or lost. A hyperconnected network, on the other hand, is a computer networking trend in which communication occurs over a network. However, what about malware? Malware is malicious software meant to penetrate private data, threaten a computer system,...

Citations

... Yuhei et al. [23], [24] proposed a code-tainting-based analyzer, API Chaser, to identify the execution of monitored instructions. API Chaser gave different taint tags to the API, benign, and malware samples. ...
Article
Full-text available
To dynamically identify the malicious behaviors of millions of Windows malware samples, anti-virus vendors have widely used sandbox-based analyzers. However, sandbox-based analysis has a critical limitation: malware can use anti-analysis techniques (i.e., anti-sandbox and anti-VM techniques) to detect analyzers and evade analysis. In this work, we study the anti-analysis techniques used in real-world malware. First, to measure how many Windows malware samples exhibit anti-analysis techniques, we collect the anti-analysis techniques used in malware. We then design and implement an automated system, named EvDetector, that detects malware employing anti-analysis techniques. EvDetector determines whether malware uses an anti-analysis technique and monitors whether the malware changes its execution path based on the result of that technique. Using EvDetector, we analyzed 763,985 real-world malware samples that emerged from 2017 to 2020. Our evaluation results show that 16.21% of malware use anti-analysis techniques on average. We also check the effectiveness of the analysis results by comparing EvDetector with static analysis: EvDetector finds that up to 49.88% of the malware detected by static analysis did not use anti-analysis techniques. In addition, we find that only up to 3.75% of the packed malware used anti-analysis techniques. Finally, we analyze the evasive malware trend through familial analysis and behavioral analysis. Our work implies that the research community needs to put more effort into defeating such anti-analysis techniques in order to automatically analyze emerging malware and respond to it.
... Some were explicitly designed to improve or propose new sandbox techniques, while others simply relied on sandboxes to collect data to perform other experiments, such as modeling the behavior of samples, extracting new detection signatures, training a classifier, or reporting on the internals of certain malware characteristics (such as packing, use of encryption, etc.).
Table I (columns: time threshold, number of papers, references)
…      …    [21], [26], [35], [40], [43], [52], [70], [85], [91], [95], [100], [101], [105], [110]
3      7    [15], [65]-[67], [71], [81], [93]
4      1    [58]
5      13   [12], [13], [23], [29], [38], [50], [51], [69], [78], [88], [92], [102], [106]
8      2    [20], [89]
10     7    [24], [25], [36], [41], [59], [63], [87]
15     1    [10]
> 15   2    [82], [84]
We reviewed several papers by looking at the execution time threshold used by their authors, and report the results in Table I, grouped in different time ranges. The values range from a minimum of 30 seconds [44]- [46], [94], [109] to a maximum of 1h [82]. ...
... For example, when a compromised benign process accesses a sensitive file, the kernel-level provenance will record the file access activity. The OS kernel supports data collection for provenance analysis, incurring only a reasonable amount of overhead compared to heavy-weight dynamic analyses such as virtual machine (VM) assisted instrumentation or sandbox execution [66], [62]. ...
... When the malware detects that it is being run in a virtual machine or under a debugger, it changes its behavior (usually either less malicious behavior or termination). PROVDETECTOR, unlike virtualization based solutions [66], [62], is designed to run on bare metal machines and does not require isolated environments. Similar to previous work [27], [26], [62], to perform a large-scale analysis, we use sandbox environments to automate the execution of malware samples in our evaluation. ...
... PROVDETECTOR, unlike virtualization based solutions [66], [62], is designed to run on bare metal machines and does not require isolated environments. Similar to previous work [27], [26], [62], to perform a large-scale analysis, we use sandbox environments to automate the execution of malware samples in our evaluation. It is possible that some anti-analysis malware changed their behavior during our evaluation. ...
Conference Paper
Full-text available
To subvert recent advances in perimeter and host security, the attacker community has developed and employed various attack vectors that make malware much stealthier than before, allowing it to penetrate the target system and prolong its presence. Such advanced malware, or stealthy malware, impersonates or abuses benign applications and legitimate system tools to minimize its footprint in the target system. One example of such stealthy malware is fileless malware, which keeps its malicious logic mostly in the memory of well-trusted processes. It is difficult for traditional detection tools, such as malware scanners, to detect it, as the malware normally does not expose its malicious payload in a file and hides its malicious behaviors among the benign behaviors of the processes. In this paper, we present PROVDETECTOR, a provenance-based approach for detecting stealthy malware. The intuition behind PROVDETECTOR is that although stealthy malware may impersonate or abuse a benign process, it still exposes its malicious behaviors in the OS (operating system) level provenance. Based on this intuition, PROVDETECTOR first employs a novel selection algorithm to identify possibly malicious parts in the OS level provenance data of a process. Then, it applies a neural embedding and machine learning pipeline to automatically detect any behavior that deviates significantly from normal behaviors. We evaluate our approach on a large provenance dataset from an enterprise network and demonstrate that it achieves very high detection performance (an average F1 score of 0.974) on stealthy malware. Further, we conduct thorough interpretability studies to understand the internals of the learned machine learning models.
... Several works have closely considered the concept of malware executing throughout the whole system. In particular, Panorama [43], DiskDuster [1], Tartarus [29] and API Chaser [25] use dynamic taint analysis to capture this. Barabosch et al. have also investigated the problem of code injection by analysing memory dumps [4] and also at run time [5]. ...
Preprint
Run time packing is a common approach malware use to obfuscate their payloads, and automatic unpacking is, therefore, highly relevant. The problem has received much attention, and so far, solutions based on dynamic analysis have been the most successful. Nevertheless, existing solutions lack in several areas, both conceptually and architecturally, because they focus on a limited part of the unpacking problem. These limitations significantly impact their applicability, and current unpackers have, therefore, experienced limited adoption. In this paper, we introduce a new tool, called Minerva, for effective automatic unpacking of malware samples. Minerva introduces a unified approach to precisely uncover execution waves in a packed malware sample and produce PE files that are well-suited for follow-up static analysis. At the core, Minerva deploys a novel information flow model of system-wide dynamically generated code, precise collection of API calls and a new approach for merging execution waves and API calls. Together, these novelties amplify the generality and precision of automatic unpacking and make the output of Minerva highly usable. We extensively evaluate Minerva against synthetic and real-world malware samples and show that our techniques significantly improve on several aspects compared to previous work.
Conference Paper
The propagation of code from one process to another is an important aspect of many malware families and can be achieved, for example, through code injections or the launch of new instances. An in-depth understanding of how and when malware uses interprocess code propagations would be a valuable aid in the analysis of this threat, since many dynamic malware analysis and unpacking schemes rely on finding running instances of malicious code. However, despite the prevalence of such propagations, there is little research on this topic. Therefore, in this work, we aim to extend the state-of-the-art by measuring both the behavior and the prevalence of interprocess code propagations of malicious software. We developed a method based on API-tracing for measuring code propagations in dynamic malware analysis. Subsequently, we implemented this method into a proof-of-concept implementation as a basis for further research. To gain more knowledge on the prevalence of code propagations and the code propagation techniques used, we conducted a study using our implementation on a real-world data set of 4853 malware samples from 1747 families. Our results show that more than a third (38.13%) of the executables use code propagation, which can be further classified into four different topologies and 24 different code propagation techniques. We also provide a list of the most significant representative malware samples for each of these topologies and techniques as a starting point for researchers aiming to develop countermeasures against code propagation.
Chapter
The presence of packing techniques in malicious software remains a significant obstacle in malware analysis. Consequently, numerous research efforts have emerged with the objective of developing a generic methodology to unpack malware. However, these unpacking methodologies often rely on assumptions about the capabilities of packers. These assumptions include factors such as the origin of memory sources, code-writing techniques used to fulfill packing capabilities, the number of packing layers used, the persistence of code within memory, and the clear distinction between packer and malware code. In our paper, we aim to advance the state-of-the-art by addressing these underlying assumptions associated with malware unpacking. Based on these assumptions, we formulate five research questions to be addressed in a study on the packer capabilities found in a real-world Windows malware and clinical data set consisting of off-the-shelf packers. The answers deduced from our study demonstrate that the majority of common generic unpacking methodologies in the literature show significant blind spots, with the notable exception of the Renovo methodology and its derivatives.
Article
Foretelling ongoing malware attacks in real time is challenging due to the stealthy and polymorphic nature of their executive behavior patterns. In this paper, we present MalAF, a novel Malware Attack Foretelling framework that utilizes run-time behavior (i.e., sequences of API events) of malware to foretell the attack that has not yet executed. MalAF first samples suspicious API events by assessing the sensitivity of the parameters of each API event and dividing them into multiple attack time slots by calculating the strong correlation. Following that, MalAF employs dynamic heterogeneous graph sequences to incrementally model contextual semantics for each attack time slot, generating malware state sequences in real time. Moreover, MalAF proposes a greedy adaptive dictionary (GAD)-optimized IRL preference learning method to automate the capture of families' intrinsic attack preferences, which achieves higher performance than the existing inverse reinforcement learning (IRL). Additionally, with the guidance of families' attack preferences, MalAF trains an LSTM to foretell the future path of the target malware. Finally, MalAF matches the identified APIs' paths with a malicious capability base and reports the comprehensible attacks to an analyst. The experiments on real-world datasets demonstrate that our proposed MalAF outperforms the state-of-the-art methods, improving on the baseline by 3.01% to 4.73% in path-foretelling accuracy.