Taint-assisted IAT Reconstruction against Position Obfuscation
Abstract and Figures
The Windows Application Programming Interface (API) is an important data source that helps analysts understand the functions of malware. For this reason, malware authors are likely to hide the APIs imported by their malware using various obfuscation techniques. In this paper, we first build a formal model of the Import Address Table (IAT) reconstruction procedure to keep our description independent of specific implementations. Using this model, we formally show that current IAT reconstruction is vulnerable to position obfuscation techniques, which are anti-analysis techniques that obfuscate the positions of loaded APIs or Dynamic Link Libraries (DLLs). Next, we introduce a taint-analysis-based approach to API name resolution, an essential step in IAT reconstruction, to defeat position obfuscation techniques. The key idea of our approach is as follows: we define taint tags, each with a value unique to one API; apply the tag of each API to every one of its instructions; track the movement of the API instructions by propagating the tags; and, after acquiring a memory dump of the process under analysis, resolve API names from the propagated tags for IAT reconstruction. Finally, we experimentally demonstrate that a system implementing our proposed API name resolution correctly identifies imported APIs even when malware authors apply various position obfuscation techniques to their malware.
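The tag-based name resolution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `TaintedMemory` class, the byte-granular shadow map, the addresses, and the tag values are all assumptions chosen for illustration. It shows only the core idea that a tag attached to API code follows that code when it is copied elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class TaintedMemory:
    """Toy byte-addressed memory with a parallel (shadow) map of taint tags."""
    data: dict = field(default_factory=dict)  # address -> byte value
    tags: dict = field(default_factory=dict)  # address -> taint tag

    def load_api(self, base, code, tag):
        """Place API code at `base` and tag every byte with the API's unique tag."""
        for offset, byte in enumerate(code):
            self.data[base + offset] = byte
            self.tags[base + offset] = tag

    def copy(self, src, dst, length):
        """Copy bytes and propagate their tags (clearing stale tags at dst)."""
        for i in range(length):
            self.data[dst + i] = self.data.get(src + i, 0)
            if src + i in self.tags:
                self.tags[dst + i] = self.tags[src + i]
            else:
                self.tags.pop(dst + i, None)

    def resolve(self, address, tag_to_name):
        """Return the API name for the tag found at `address`, or None."""
        return tag_to_name.get(self.tags.get(address))


# Hypothetical tag assignment: one unique tag per API.
tag_to_name = {1: "kernel32.dll!CreateFileW", 2: "kernel32.dll!WriteFile"}

mem = TaintedMemory()
prologue = b"\x8b\xff\x55\x8b\xec"  # placeholder bytes, identical for both APIs
mem.load_api(0x70001000, prologue, tag=1)
mem.load_api(0x70002000, prologue, tag=2)

# Position obfuscation: the API code is copied out of the DLL region.
mem.copy(0x70002000, 0x00405000, len(prologue))

resolved = mem.resolve(0x00405000, tag_to_name)
untainted = mem.resolve(0x00406000, tag_to_name)
```

Because both APIs share the same placeholder bytes in this sketch, the relocated copy cannot be told apart by content or by address; only the propagated tag identifies it.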
Figures - uploaded by Yuhei Kawakoya
...
Call site monitoring: BinUnpack [7], SOK [8], Scylla [9], Eureka [10], RePEc [11], PinDemonium [12], Arancino [13], Arg Prediction [14]
Position monitoring: API Chaser [15], QuietRIATT [16], Secure unpack [17], Taint-assisted [18]
Hybrid monitoring: API-Xray [4], RePEconstruct [19]

Call site monitoring: Figure 2 shows that deobfuscation techniques based on API call site monitoring follow two steps: (1) Instruction scanning (I in Figure 2), which runs PE files to find possible API call sites in memory, including indirect calls, direct calls, and indirect jumps. (2) Address association (II in Figure 2), which correlates the destination address of a possible call site with the exported API address of the loaded dynamic link library. ...
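The address association step can be sketched as a lookup from call destinations to exported API addresses. This is a simplified illustration, not the method of any tool listed above; the function names, module layout, and addresses are hypothetical.

```python
def build_export_map(modules):
    """Map absolute export addresses to 'dll!api' names.

    `modules` has the assumed shape {dll_name: (base_address, {api_name: rva})}.
    """
    export_map = {}
    for dll, (base, exports) in modules.items():
        for name, rva in exports.items():
            export_map[base + rva] = f"{dll}!{name}"
    return export_map


def associate(call_sites, export_map):
    """Keep only call sites whose destination matches a known export address."""
    return {
        site: export_map[dest]
        for site, dest in call_sites
        if dest in export_map
    }


# Hypothetical loaded module and scanned call sites: (site address, destination).
modules = {
    "kernel32.dll": (0x70000000, {"CreateFileW": 0x1000, "WriteFile": 0x2000}),
}
call_sites = [
    (0x00401000, 0x70001000),  # calls the export at its original position
    (0x00401010, 0x00405000),  # calls a relocated copy of API code
]

export_map = build_export_map(modules)
resolved = associate(call_sites, export_map)
```

The second call site goes unresolved because its destination is not an export address, which is the weakness that position obfuscation exploits.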
... API hook monitoring (III in Figure 2) monitors the execution of API code in the DLL. QuietRIATT and Secure unpack use the API hook approach: they set hooks in the loaded API code area and log a call whenever the program invokes an API where a hook is set. Alternatively, the taint analysis association method [17,18] (IV in Figure 2) attaches taint tags to the API code; when the program executes code in the DLL space, the executed API is determined from the attached taint tags. ...
API calls are programming interfaces used by applications. When it is difficult for an analyst to perform a direct reverse analysis of a program, the API provides an important basis for analyzing the behavior and functionality of the program. API address spaces are essential for analysts to identify API call information, and therefore API call obfuscation is used as a protection strategy to prevent analysts from obtaining call information from API address spaces. API call obfuscation avoids direct API calls and aims to create a more complex API calling process. Unfortunately, current API call obfuscation methods are not effective in preventing analysts from obtaining usable information from the API address space. To solve this issue, in this paper, we propose an API call obfuscation model based on address space obscurity. The key functions within the API are encrypted and moved to the user code space for execution. This breaks the relationship between the API and its address space, making it impossible for analysts to obtain address information about a known API from the API address space. In our experiments, we developed a prototype compiler-level API call obfuscation system that automatically transforms input source code into an obfuscated file. The results show that our approach can thwart existing API deobfuscation techniques and is highly resistant to various open-source dynamic analysis platforms. Compared to other obfuscation techniques, our scheme improves API address space obscurity by more than a factor of two, reduces the detection rate of deobfuscation techniques such as Scylla to zero, and increases obfuscation overhead by no more than 20%. These results show that APIASO offers better obfuscation and practicality.
... There are many works on the automatic unpacking of malware, and we have already discussed several of these throughout the paper [10,17,21,23,33,39,42]. Some of this work considers the concept of IAT destruction [21,28,39], and IAT reconstruction has also been considered on a more general basis [24]. The work by Ugarte et al. [42] highlights several gaps in existing unpackers and proposes a system-wide approach to unpacking. ...
Run-time packing is a common approach that malware uses to obfuscate its payloads, and automatic unpacking is, therefore, highly relevant. The problem has received much attention, and so far, solutions based on dynamic analysis have been the most successful. Nevertheless, existing solutions fall short in several areas, both conceptually and architecturally, because they focus on a limited part of the unpacking problem. These limitations significantly impact their applicability, and current unpackers have, therefore, experienced limited adoption. In this paper, we introduce a new tool, called Minerva, for effective automatic unpacking of malware samples. Minerva introduces a unified approach to precisely uncover execution waves in a packed malware sample and produce PE files that are well suited for follow-up static analysis. At its core, Minerva deploys a novel information flow model of system-wide dynamically generated code, precise collection of API calls, and a new approach for merging execution waves and API calls. Together, these novelties increase the generality and precision of automatic unpacking and make the output of Minerva highly usable. We extensively evaluate Minerva against synthetic and real-world malware samples and show that our techniques significantly improve on several aspects of previous work.
Taint-tracking is emerging as a general technique in software security to complement virtualization and static analysis. It has been applied for accurate detection of a wide range of attacks on benign software, as well as in malware defense. Although it is quite robust for tackling the former problem, application of taint analysis to untrusted (and potentially malicious) software is riddled with several difficulties that lead to gaping holes in defense. These holes arise not only due to the limitations of information flow analysis techniques, but also the nature of today's software architectures and distribution models. This paper highlights these problems using an array of simple but powerful evasion techniques that can easily defeat taint-tracking defenses. Given today's binary-based software distribution and deployment models, our results suggest that information flow techniques will be of limited use against future malware that has been designed with the intent of evading these defenses.
Defending against malware involves analysing large numbers of suspicious samples. To deal with such quantities, we rely heavily on automatic approaches to determine whether a sample is malicious or not. Unfortunately, complete and precise automatic analysis of malware is far from an easy task. This is because malware is often designed to contain several techniques and countermeasures specifically to hinder analysis. One of these techniques is for the malware to propagate through the operating system so as to execute in the context of benign processes. The malware does this by writing to the memory of a given process and then having that memory executed. In some cases these propagations are trivial to capture because they rely on well-known techniques. However, in cases where malware deploys novel code injection techniques, relies on code-reuse attacks, and potentially deploys dynamically generated code, capturing a complete and precise view of the malware execution is non-trivial.
In this paper we present a unified approach to tracing malware propagations inside the host in the context of code injections and code-reuse attacks. We also present, to the best of our knowledge, the first approach to identifying dynamically generated code based on information-flow analysis. We implement our techniques in a system called Tartarus and evaluate Tartarus against both synthetic applications and real-world malware. We compare Tartarus to previous works and show that our techniques substantially improve the precision of collected malware execution traces, and that our approach can capture intrinsic characteristics of novel code injection techniques.
Understanding how application programming interfaces (APIs) are used in a program plays an important role in malware analysis. This, however, has resulted in an endless battle between malware authors and malware analysts around the development of API [de]obfuscation techniques over the last few decades. Our goal in this paper is to show a limit of existing API de-obfuscation techniques. To do that, we first analyze existing API [de]obfuscation techniques and clarify an attack vector common to API de-obfuscation techniques. We then present Stealth Loader, a program loader that uses our API obfuscation technique to bypass all existing API de-obfuscation techniques. The core idea of this technique is to load a dynamic link library (DLL) and resolve its dependencies without leaving any detectable traces in memory. We demonstrate the effectiveness of Stealth Loader by analyzing a set of Windows executables and malware protected with Stealth Loader using major dynamic and static analysis tools and techniques. The results show that, among the obfuscation techniques tested, only Stealth Loader successfully bypasses all analysis tools and techniques.
Reverse engineering packed binaries remains a tedious challenge, as code packing is continuously being used by malware to hinder detection and analysis. The problem of automatically unpacking binaries has previously been investigated. However, current generic unpackers either do not offer any dump of the unpacked binary at all or produce a set of memory dumps that each lack several structures needed to make them well suited for further analysis. In this paper, we present RePEconstruct, a tool that unpacks packed binaries and reconstructs them in a manner well suited for further analysis. RePEconstruct deploys a model of self-modifying code similar to previous work but goes a step further by also utilizing a novel, aggressive approach to rebuilding the import address table. Our approach relies on both static and dynamic analysis. We build RePEconstruct as a DynamoRIO client and successfully evaluate it against a set of packed applications.
We present two techniques to obfuscate the interfaces between application binaries and Windows system DLLs (dynamic-link libraries). The first technique obfuscates the related symbol information in the binary to prevent static analyses from identifying the invoked library functions. The second technique combines static linking with code obfuscation to avoid the external interface altogether, thus preventing dynamic attacks as well. This is done while still maintaining compatibility with multiple Windows versions, through run-time adaptation of the application. As the first concrete result of this ongoing research, we demonstrate and evaluate the techniques using a proof-of-concept tool applied to a simple test program.
API (Application Programming Interface) monitoring is an effective approach for quickly understanding the behavior of malware. It has been widely used as the basis of many malware countermeasures. However, malware authors are now aware of this and develop malware that uses several anti-analysis techniques to evade API monitoring. In this paper, we present the design and implementation of an API monitoring system, API Chaser, which is resistant to evasion-type anti-analysis techniques, e.g., stolen code and code injection. We have evaluated API Chaser with several real-world malware samples, and the results show that API Chaser is able to correctly capture API calls invoked from malware without being evaded.
As modern operating systems and software become larger and more complex, they are more likely to contain bugs, which may allow attackers to gain illegitimate access. A fast and reliable mechanism to discern and generate vaccines for such attacks is vital for the successful protection of networks and systems. In this paper we present Argos, a containment environment for worms as well as human orchestrated attacks. Argos is built upon a fast x86 emulator which tracks network data throughout execution to identify their invalid use as jump targets, function addresses, instructions, etc. Furthermore, system call policies disallow the use of network data as arguments to certain calls. When an attack is detected, we perform 'intelligent' process- or kernel-aware logging of the corresponding emulator state for further offline processing. In addition, our own forensics shellcode is injected, replacing the malevolent shellcode, to gather information about the attacked process. By correlating the data logged by the emulator with the data collected from the network, we are able to generate accurate network intrusion detection signatures for the exploits that are immune to payload mutations. The entire process can be automated and has few if any false positives, thus rapid global scale deployment of the signatures is possible.
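The detection idea in this abstract, flagging network-derived data that ends up used as a jump target, can be sketched in a few lines. This is an illustrative toy and not Argos itself: the `TaintTracker` class, its method names, the 4-byte little-endian pointer width, and the addresses are all assumptions.

```python
class TaintTracker:
    """Toy memory model that remembers which bytes came from the network."""

    def __init__(self):
        self.memory = {}      # address -> byte value
        self.tainted = set()  # addresses holding network-derived bytes

    def receive_from_network(self, address, payload):
        """Store untrusted bytes and mark each one as tainted."""
        for i, byte in enumerate(payload):
            self.memory[address + i] = byte
            self.tainted.add(address + i)

    def write_local(self, address, payload):
        """Store trusted bytes and clear any taint at those addresses."""
        for i, byte in enumerate(payload):
            self.memory[address + i] = byte
            self.tainted.discard(address + i)

    def indirect_jump(self, pointer_address, width=4):
        """Read a little-endian jump target; alert if any byte of it is tainted."""
        cells = range(pointer_address, pointer_address + width)
        raw = bytes(self.memory.get(a, 0) for a in cells)
        target = int.from_bytes(raw, "little")
        alert = any(a in self.tainted for a in cells)
        return target, alert


tracker = TaintTracker()

# A function pointer written by the program itself: no alert expected.
tracker.write_local(0x1000, (0x00401000).to_bytes(4, "little"))
benign = tracker.indirect_jump(0x1000)

# The same pointer overwritten with bytes received from the network.
tracker.receive_from_network(0x1000, (0xDEADBEEF).to_bytes(4, "little"))
attack = tracker.indirect_jump(0x1000)
```

A real system would also propagate taint through arithmetic and register moves; this sketch covers only direct stores.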
Today’s smartphone operating systems frequently fail to provide users with visibility into how third-party applications collect and share their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid enables realtime analysis by leveraging Android’s virtualized execution environment. TaintDroid incurs only 32% performance overhead on a CPU-bound microbenchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, in our 2010 study we found 20 applications potentially misused users’ private information; so did a similar fraction of the tested applications in our 2012 study. Monitoring the flow of privacy-sensitive data with TaintDroid provides valuable input for smartphone users and security service firms seeking to identify misbehaving applications.
Unpacking is an art: it is a mental challenge and one of the most exciting mind games in the reverse engineering field. In some cases, the reverser needs to know the internals of the operating system in order to identify or solve very difficult anti-reversing tricks employed by packers/protectors. Patience and cleverness are also major factors in a successful unpack. This challenge involves, on one side, the researchers creating the packers and, on the other side, the researchers who are determined to bypass these protections. The main purpose of this paper is to present anti-reversing techniques employed by executable packers/protectors and to discuss techniques and publicly available tools that can be used to bypass or disable these protections. This information will allow researchers, especially malcode analysts, to identify these techniques when they are used by packed malicious code, and then to decide on the next move when these anti-reversing techniques impede successful analysis. As a secondary purpose, the information presented can also be used by researchers who plan to add some level of protection to their software by slowing down reversers analyzing their protected code. Of course, nothing will stop a skilled, informed, and determined reverser.