Fig 2 - uploaded by Yuhei Kawakoya
Source publication
We propose a design and implementation for an Application Programming Interface (API) monitoring system called API Chaser, which is resistant to evasion-type anti-analysis techniques, e.g., stolen code and code injection. The core technique in API Chaser is code tainting, which enables us to identify precisely the execution of monitored instruction...
Contexts in source publication
Context 1
... use code tainting with the three types of taint tags to monitor APIs invoked from malware. When a CPU fetches an instruction that has an api-tag, API Chaser checks the taint tag attached to the caller instruction. There are three cases, as shown in Fig. 2: the caller is malware, a benign process, or the internals of another API (nested call). In the first case, shown in Fig. 2 (1), the caller instruction has a malware-tag, so the API call is determined to come from malware. The call is therefore captured, along with related information such as its arguments. In the second case, shown in Fig. 2 (2), the caller has a benign-tag, so the API call is determined to come from a benign ...
Context 2
... are three cases, as shown in Fig. 2: the caller is malware, a benign process, or the internals of another API (nested call). In the first case, shown in Fig. 2 (1), the caller instruction has a malware-tag, so the API call is determined to come from malware. The call is therefore captured, along with related information such as its arguments. In the second case, shown in Fig. 2 (2), the caller has a benign-tag, so the API call is determined to come from a benign process. It is therefore outside the monitoring target and does not need to be captured. In the third case, shown in Fig. 2 (3), the caller has an api-tag, which makes it a nested API call. Nested API calls are also excluded from the monitoring target, so ...
Context 3
... captures the API call and collects the information related to the API call, such as its arguments. In the second case, shown in Fig. 2 (2), the caller has a benign-tag, so the API call is determined to come from a benign process. It is therefore outside the monitoring target and does not need to be captured. In the third case, shown in Fig. 2 (3), the caller has an api-tag, which makes it a nested API call. Nested API calls are also excluded from the monitoring target, so that we can focus only on API calls directly invoked from malware. This makes the behaviors of malware clearer and easier to ...
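The three-case decision described in these contexts can be sketched in a few lines of Python. This is only an illustration of the rule as quoted here, not API Chaser's actual implementation: the names `TaintTag`, `Instruction`, `ApiMonitor`, and `on_fetch` are hypothetical, and the sketch assumes the tags of the fetched instruction and of its caller are already known.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TaintTag(Enum):
    """The three taint tag types named in the excerpt."""
    MALWARE = auto()
    BENIGN = auto()
    API = auto()


@dataclass
class Instruction:
    """Hypothetical stand-in for an instruction and its taint tag."""
    address: int
    tag: TaintTag


@dataclass
class ApiMonitor:
    """Hypothetical monitor that records API calls made directly by malware."""
    captured: list = field(default_factory=list)

    def on_fetch(self, fetched: Instruction, caller: Instruction,
                 api_name: str, args: tuple) -> bool:
        """Return True if the call was captured, False otherwise."""
        # Only instructions carrying an api-tag trigger the check.
        if fetched.tag is not TaintTag.API:
            return False
        # Case (1): caller has a malware-tag, so capture the call and its arguments.
        if caller.tag is TaintTag.MALWARE:
            self.captured.append((api_name, args))
            return True
        # Case (2): benign caller, outside the monitoring target.
        # Case (3): api-tagged caller, a nested API call, also excluded.
        return False


monitor = ApiMonitor()
api_entry = Instruction(0x7000, TaintTag.API)
monitor.on_fetch(api_entry, Instruction(0x401000, TaintTag.MALWARE), "CreateFileW", ("a.txt",))
monitor.on_fetch(api_entry, Instruction(0x501000, TaintTag.BENIGN), "CreateFileW", ("b.txt",))
monitor.on_fetch(api_entry, Instruction(0x7100, TaintTag.API), "NtCreateFile", ("a.txt",))
print(monitor.captured)
```

Of the three calls in the usage lines, only the first is recorded, because it is the only one whose caller carries a malware-tag.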
Similar publications
In previous work, “gist descriptor” features extracted from images have been used in malware classification problems and have shown promising results. In this research, we determine whether gist descriptors are robust with respect to malware obfuscation techniques, as compared to Convolutional Neural Networks (CNN) trained directly on malware image...
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antiviral scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling...
The emergence of Internet of Things malware, which leverages exploited IoT devices to perform large-scale cyber attacks (e.g., Mirai botnet), is considered a major threat to the Internet ecosystem. To mitigate this threat, there is a pressing need for effective IoT malware classification and family attribution, which provide essential steps towa...
With the increasing popularity of macOS, the number and variety of macOS malware are rising as well. Yet very few tools exist for dynamic analysis of macOS malware. In this paper, we propose a macOS malware analysis framework called Mac-A-Mal. We develop a kernel extension to monitor malware behavior and mitigate several anti-evasion techniques used in...
Data protection is the process of securing sensitive information from being corrupted, compromised, or lost. A hyperconnected network, on the other hand, is a computer networking trend in which communication occurs over a network. But what about malware? Malware is malicious software meant to penetrate private data, threaten a computer system,...
Citations
... Yuhei et al. [23], [24] proposed a code-tainting-based analyzer, API Chaser, to identify the execution of monitored instructions. API Chaser assigned different taint tags to API, benign, and malware code. ...
To dynamically identify the malicious behaviors of millions of Windows malware samples, anti-virus vendors have widely used sandbox-based analyzers. However, sandbox-based analysis has a critical limitation: malware can use anti-analysis techniques (i.e., anti-sandbox and anti-VM techniques) to detect analyzers and evade analysis. In this work, we study the anti-analysis techniques used in real-world malware. First, to measure how many Windows malware samples exhibit anti-analysis techniques, we collect the anti-analysis techniques used in malware. We then design and implement an automated system, named EvDetector, that detects malware employing anti-analysis techniques. EvDetector determines whether malware uses an anti-analysis technique and monitors whether the malware changes its execution path based on the result of that technique. Using EvDetector, we analyzed 763,985 real-world malware samples that emerged from 2017 to 2020. Our evaluation results show that 16.21% of malware use anti-analysis techniques on average. We also check the effectiveness of the analysis result by comparing EvDetector with static analysis: EvDetector finds that up to 49.88% of the malware detected by static analysis did not use anti-analysis techniques. In addition, we find that only up to 3.75% of the packed malware used anti-analysis techniques. Finally, we analyze the evasive malware trend through familial analysis and behavioral analysis. Our work implies that the research community needs to put more effort into defeating such anti-analysis techniques in order to automatically analyze emerging malware and respond to it.
... Some were explicitly designed to improve or propose new sandbox techniques, while others simply relied on sandboxes to collect data to perform other experiments, such as modeling the behavior of samples, extracting new detection signatures, training a classifier, or reporting on the internals of certain malware characteristics (such as packing, use of encryption, etc.).
Table I (execution time threshold | number of papers | references):
... | ... | [21], [26], [35], [40], [43], [52], [70], [85], [91], [95], [100], [101], [105], [110]
3 | 7 | [15], [65]-[67], [71], [81], [93]
4 | 1 | [58]
5 | 13 | [12], [13], [23], [29], [38], [50], [51], [69], [78], [88], [92], [102], [106]
8 | 2 | [20], [89]
10 | 7 | [24], [25], [36], [41], [59], [63], [87]
15 | 1 | [10]
> 15 | 2 | [82], [84]
We reviewed several papers by looking at the execution time threshold used by their authors, and report the results in Table I, grouped in different time ranges. The values range from a minimum of 30 seconds [44]- [46], [94], [109] to a maximum of 1h [82]. ...
... For example, when a compromised benign process accesses a sensitive file, the kernel-level provenance will record the file access activity. The OS kernel supports data collection for provenance analysis, incurring only a reasonable amount of overhead compared to heavy-weight dynamic analyses such as virtual machine (VM) assisted instrumentation or sandbox execution [66], [62]. ...
... When the malware detects that it is being run in a virtual machine or under a debugger, it changes its behavior (usually either less malicious behavior or termination). PROVDETECTOR, unlike virtualization based solutions [66], [62], is designed to run on bare metal machines and does not require isolated environments. Similar to previous work [27], [26], [62], to perform a large-scale analysis, we use sandbox environments to automate the execution of malware samples in our evaluation. ...
... PROVDETECTOR, unlike virtualization based solutions [66], [62], is designed to run on bare metal machines and does not require isolated environments. Similar to previous work [27], [26], [62], to perform a large-scale analysis, we use sandbox environments to automate the execution of malware samples in our evaluation. It is possible that some anti-analysis malware changed their behavior during our evaluation. ...
To subvert recent advances in perimeter and host security, the attacker community has developed and employed various attack vectors to make malware much stealthier than before, so that it can penetrate the target system and prolong its presence. Such advanced, or stealthy, malware impersonates or abuses benign applications and legitimate system tools to minimize its footprint in the target system. One example is fileless malware, whose malicious logic resides mostly in the memory of well-trusted processes. It is difficult for traditional detection tools, such as malware scanners, to detect it, as the malware normally does not expose its malicious payload in a file and hides its malicious behaviors among the benign behaviors of the processes.
In this paper, we present PROVDETECTOR, a provenance-based approach for detecting stealthy malware. The intuition behind PROVDETECTOR is that although a stealthy malware may impersonate or abuse a benign process, it still exposes its malicious behaviors in the OS (operating system) level provenance. Based on this intuition, PROVDETECTOR first employs a novel selection algorithm to identify possibly malicious parts in the OS level provenance data of a process. Then, it applies a neural embedding and machine learning pipeline to automatically detect any behavior that deviates significantly from normal behaviors. We evaluate our approach on a large provenance dataset from an enterprise network and demonstrate that it achieves very high detection performance (an average F1 score of 0.974) of stealthy malware. Further, we conduct thorough interpretability studies to understand the internals of the learned machine learning models.
... Several works have closely considered the concept of malware executing throughout the whole system. In particular, Panorama [43], DiskDuster [1], Tartarus [29] and API Chaser [25] use dynamic taint analysis to capture this. Barabosch et al. have also investigated the problem of code injection by analysing memory dumps [4] and also at run time [5]. ...
Run time packing is a common approach malware use to obfuscate their payloads, and automatic unpacking is, therefore, highly relevant. The problem has received much attention, and so far, solutions based on dynamic analysis have been the most successful. Nevertheless, existing solutions lack in several areas, both conceptually and architecturally, because they focus on a limited part of the unpacking problem. These limitations significantly impact their applicability, and current unpackers have, therefore, experienced limited adoption. In this paper, we introduce a new tool, called Minerva, for effective automatic unpacking of malware samples. Minerva introduces a unified approach to precisely uncover execution waves in a packed malware sample and produce PE files that are well-suited for follow-up static analysis. At the core, Minerva deploys a novel information flow model of system-wide dynamically generated code, precise collection of API calls and a new approach for merging execution waves and API calls. Together, these novelties amplify the generality and precision of automatic unpacking and make the output of Minerva highly usable. We extensively evaluate Minerva against synthetic and real-world malware samples and show that our techniques significantly improve on several aspects compared to previous work.
The propagation of code from one process to another is an important aspect of many malware families and can be achieved, for example, through code injections or the launch of new instances. An in-depth understanding of how and when malware uses interprocess code propagations would be a valuable aid in the analysis of this threat, since many dynamic malware analysis and unpacking schemes rely on finding running instances of malicious code. However, despite the prevalence of such propagations, there is little research on this topic. Therefore, in this work, we aim to extend the state-of-the-art by measuring both the behavior and the prevalence of interprocess code propagations of malicious software. We developed a method based on API-tracing for measuring code propagations in dynamic malware analysis. Subsequently, we implemented this method into a proof-of-concept implementation as a basis for further research. To gain more knowledge on the prevalence of code propagations and the code propagation techniques used, we conducted a study using our implementation on a real-world data set of 4853 malware samples from 1747 families. Our results show that more than a third (38.13%) of the executables use code propagation, which can be further classified into four different topologies and 24 different code propagation techniques. We also provide a list of the most significant representative malware samples for each of these topologies and techniques as a starting point for researchers aiming to develop countermeasures against code propagation.
The presence of packing techniques in malicious software remains a significant obstacle in malware analysis. Consequently, numerous research efforts have emerged with the objective of developing a generic methodology to unpack malware. However, these unpacking methodologies often rely on assumptions about the capabilities of packers. These assumptions include factors such as the origin of memory sources, code-writing techniques used to fulfill packing capabilities, the number of packing layers used, the persistence of code within memory, and the clear distinction between packer and malware code. In our paper, we aim to advance the state-of-the-art by addressing these underlying assumptions associated with malware unpacking. Based on these assumptions, we formulate five research questions to be addressed in a study on the packer capabilities found in a real-world Windows malware and clinical data set consisting of off-the-shelf packers. The answers deduced from our study demonstrate that the majority of common generic unpacking methodologies in the literature show significant blind spots, with the notable exception of the Renovo methodology and its derivatives.
Foretelling ongoing malware attacks in real time is challenging due to the stealthy and polymorphic nature of their executive behavior patterns. In this paper, we present MalAF, a novel Malware Attack Foretelling framework that utilizes run-time behavior (i.e., sequences of API events) of malware to foretell attacks that have not yet executed. MalAF first samples suspicious API events by assessing the sensitivity of the parameters of each API event and dividing them into multiple attack time slots by calculating the strong correlation. Following that, MalAF employs dynamic heterogeneous graph sequences to incrementally model contextual semantics for each attack time slot, generating malware state sequences in real time. Moreover, MalAF proposes a greedy adaptive dictionary (GAD)-optimized IRL preference learning method to automate the capture of families' intrinsic attack preferences, which achieves higher performance than the existing inverse reinforcement learning (IRL). Additionally, with the guidance of families' attack preferences, MalAF trains an LSTM to foretell the future path of the target malware. Finally, MalAF matches the identified APIs' paths with a malicious capability base and reports the comprehensible attacks to an analyst. The experiments on real-world datasets demonstrate that our proposed MalAF outperforms the state-of-the-art methods, improving on the baseline by 3.01% to 4.73% in accuracy for path foretelling.