Conference Paper

Xunpack: Cross-Architecture Unpacking for Linux IoT Malware

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Self modifying code (SMC) are code snippets that modify themselves at runtime. Malware use SMC to hide payloads and achieve persistence. Software-based SMC detection solutions impose performance penalties for real-time monitoring and do not benefit from runtime architectural information (cache invalidation or pipeline flush, for instance). We revisit SMC impact on hardware internals and discuss the implementation of an SMC detector at distinct architectural points. We consider three detection approaches: (i) existing hardware counters; (ii) block invalidation by the cache coherence protocol; (iii) the use of Memory Management Unit (MMU) information to control SMC execution. We compare the identified instrumentation points to highlight their strong and weak points. We also compare them to previous SMC detectors’ implementations.
Conference Paper
Full-text available
An open research problem on malware analysis is how to statically distinguish between packed and non-packed executables. This has an impact on antivirus software and malware analysis systems, which may need to apply different heuristics or to resort to more costly code emulation solutions to deal with the presence of potential packing routines. It can also affect the results of many research studies in which the authors adopt algorithms that are specifically designed for packed or non-packed binaries. Therefore, a wrong answer to the question "is this executable packed?" can make the difference between malware evasion and detection. It has long been known that packing and entropy are strongly correlated, often leading to the wrong assumption that a low entropy score implies that an executable is NOT packed. Exceptions to this rule exist, but they have always been considered as one-off cases, with a negligible impact on any large scale experiment. However, if such an assumption might have been acceptable in the past, our experiments show that this is not the case anymore as an increasing and a remarkable number of packed malware samples implement proper schemes to keep their entropy low. In this paper, we empirically investigate and measure this problem by analyzing a dataset of 50K low-entropy Windows malware samples. Our tests show that despite all samples have a low entropy value, over 30% of them adopt some form of runtime packing. We then extended our analysis beyond the pure entropy, by considering all static features that have been proposed so far to identify packed code. Again, our tests show that even a state of the art machine learning classifier is unable to conclude whether a low-entropy sample is packed or not by relying only on features extracted with static analysis.
Article
Full-text available
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience. The survey has been accepted for publication at ACM Computing Surveys, and this is the authors pre-print copy. If you are considering citing this survey, we would appreciate if you could use the following BibTeX entry: http://goo.gl/Hf5Fvc
Conference Paper
Full-text available
A detailed understanding of the behavior of exploits and malicious software is necessary to obtain a comprehensive overview of vulnerabilities in operating systems or client applications, and to develop protection techniques and tools. To this end, a lot of research has been done in the last few years on binary analysis techniques to efficiently and precisely analyze code. Most of the common analysis frameworks are based on software emulators since such tools offer a fine-grained control over the execution of a given program. Naturally, this leads to an arms race where the attackers are constantly searching for new methods to detect such analysis frameworks in order to successfully evade analysis. In this paper, we focus on two aspects. As a first contribution, we introduce several novel mechanisms by which an attacker can delude an emulator. In contrast to existing detection approaches that perform a dedicated test on the environment and combine the test with an explicit conditional branch, our detection mechanisms introduce code sequences that have an implicitly different behavior on a native machine when compared to an emulator. Such differences in behavior are caused by the side-effects of the particular operations and imperfections in the emulation process that cannot be mitigated easily. Motivated by these findings, we introduce a novel approach to generate execution traces. We propose to utilize the processor itself to generate such traces. Mores precisely, we propose to use a hardware feature called branch tracing available on commodity x86 processors in which the log of all branches taken during code execution is generated directly by the processor. Effectively, the logging is thus performed at the lowest level possible. We evaluate the practical viability of this approach.
Article
Full-text available
The authors of malware attempt to frustrate reverse engineering and analysis by creating programs that crash or otherwise behave dif-ferently when executed on an emulated platform than when exe-cuted on real hardware. In order to defeat such techniques and facilitate automatic and semi-automatic dynamic analysis of mal-ware, we propose an automated technique to dynamically modify the execution of a whole-system emulator to fool a malware sam-ple's anti-emulation checks. Our approach uses a scalable trace matching algorithm to locate the point where emulated execution diverges, and then compares the states of the reference system and the emulator to create a dynamic state modification that repairs the difference. We evaluate our technique by building an implementa-tion into an emulator used for in-depth malware analysis. On case studies that include real samples of malware collected in the wild and an attack that has not yet been exploited, our tool automatically ameliorates the malware sample's anti-emulation checks to enable analysis, and its modifications are robust to system changes.
Article
Full-text available
Packing is a very popular technique for obfuscating programs, and malware in particular. In order to successfully detect packed mal-ware, dynamic unpacking techniques have been proposed in litera-ture. Dynamic unpackers execute and monitor a packed program, and try to guess when the original code of the program is avail-able unprotected in memory. The major drawback of dynamic un-packers is the performance overhead they introduce. To reduce the overhead and make it possible to perform dynamic unpacking at end-hosts, researches have proposed real-time unpackers that oper-ate at a coarser granularity, namely OmniUnpack and Justin. In this paper, we present a simple compile-time packing algorithm that maximizes the cost of unpacking and minimizes the amount of program code that can be automatically recovered by real-time coarse grained unpackers. The evaluation shows that the real-time dynamic unpackers are totally ineffective against this algorithm.
Article
Full-text available
As reverse engineering becomes a prevalent technique to an-alyze malware, malware writers leverage various anti-reverse engineering techniques to hide their code. One technique commonly used is code packing as packed executables hin-der code analysis. While this problem has been previously researched, the existing solutions are either unable to handle novel samples, or vulnerable to various evasion techniques. In this paper, we propose a fully dynamic approach that cap-tures an intrinsic nature of hidden code execution that the original code should be present in memory and executed at some point at run-time. Thus, this approach monitors pro-gram execution and memory writes at run-time, determines if the code under execution is newly generated, and then extracts the hidden code of the executable. To demonstrate its effectiveness, we implement a system, Renovo, and eval-uate it with a large number of real-world malware samples. The experiments show that Renovo is accurate compared to previous work, yet practical in terms of performance.
Conference Paper
Full-text available
Malicious software (or malware) has become a growing threat as malware writers have learned that signature- based detectors can be easily evaded by "packing" the malicious payload in layers of compression or encryption. State-of-the-art malware detectors have adopted both static and dynamic techniques to recover the pay- load of packed malware, but unfortunately such techniques are highly ineffective. In this paper we propose a new technique, called OmniUnpack, to monitor the execution of a program in real-time and to detect when the program has removed the various layers of packing. OmniUnpack aids malware detection by directly providing to the detector the unpacked malicious payload. Experimental results demonstrate the effectiveness of our approach. OmniUnpack is able to deal with both known and unknown packing algorithms and introduces a low overhead (at most 11% for packed benign programs).
Article
Proliferation of IoT devices has caused an increase in malware. Much IoT malware includes static linking of library functions, and their symbols such as function names and addresses are stripped hindering function-level analysis. We previously showed that pattern matching could identify all library functions statically linked to IoT malware for Intel 80386 and all toolchains used to build them, but it remained unclear how our method identified toolchains used to build IoT malware for other architectures. In this paper, we extend our previous method to identify toolchains used to build IoT malware not only for Intel 80386 but for other architectures: ARC, ARM 32-bit, MIPS 32-bit, MIPS 64-bit, mk68k, PowerPC, sh4, SPARC, and x86-64. Evaluation of toolchain identification of 3,991 malware samples showed our method identified all toolchains used to build them. Only 14 toolchains had been used to build the samples, and are all available on the Web. We found 91.4% of samples other than Intel 80386 were built with the toolchain described in the installation guide of Mirai, which was similar to that of Intel 80386. To share the results of this study with the antimalware community, we have published a list of the toolchain names of each sample and the names and addresses of the library functions linked to each sample on GitHub.
Conference Paper
The rise in instruction set architecture (ISA) diversity and the growing adoption of virtual machines are driving a need for fast, scalable, full-system, cross-ISA emulation and instrumentation tools. Unfortunately, achieving high performance for these cross-ISA tools is challenging due to dynamic binary translation (DBT) overhead and the complexity of instrumenting full-system emulators. In this paper we improve cross-ISA emulation and instrumentation performance through three novel techniques. First, we increase floating point (FP) emulation performance by observing that most FP operations can be correctly emulated by surrounding the use of the host FP unit with a minimal amount of non-FP code. Second, we introduce the design of a translator with a shared code cache that scales for multi-core guests, even when they generate translated code in parallel at a high rate. Third, we present an ISA-agnostic instrumentation layer that can instrument guest operations that occur outside of the DBT’s intermediate representation (IR), which are common in full-system emulators. We implement our approach in Qelt, a high-performance cross-ISA machine emulator and instrumentation tool based on QEMU. Our results show that Qelt scales to 32 cores when emulating a guest machine used for parallel compilation, which demonstrates scalable code translation. Furthermore, experiments based on SPEC06 show that Qelt (1) outperforms QEMU as a full-system cross-ISA machine emulator by 1.76×/2.18× for integer/FP workloads, (2) outperforms state-of-the-art, cross-ISA, full-system instrumentation tools by 1.5×-3×, and (3) can match the performance of Pin, a state-of-the-art, same-ISA DBI tool, when used for complex instrumentation such as cache simulation.
Conference Paper
Binary packing, encoding binary code prior to execution and decoding them at run time, is the most common obfuscation adopted by malware authors to camouflage malicious code. Especially, most packers recover the original code by going through a set of "written-then-executed" layers, which renders determining the end of the unpacking increasingly difficult. Many generic binary unpacking approaches have been proposed to extract packed binaries without the prior knowledge of packers. However, the high runtime overhead and lack of anti-analysis resistance have severely limited their adoptions. Over the past two decades, packed malware is always a veritable challenge to anti-malware landscape. This paper revisits the long-standing binary unpacking problem from a new angle: packers consistently obfuscate the standard use of API calls. Our in-depth study on an enormous variety of Windows malware packers at present leads to a common property: malware's Import Address Table (IAT), which acts as a lookup table for dynamically linked API calls, is typically erased by packers for further obfuscation; and then unpacking routine, like a custom dynamic loader, will reconstruct IAT before original code resumes execution. During a packed malware execution, if an API is invoked through looking up a rebuilt IAT, it indicates that the original payload has been restored. This insight motivates us to design an efficient unpacking approach, called BinUnpack. Compared to the previous methods that suffer from multiple "written-then-executed" unpacking layers, BinUnpack is free from tedious memory access monitoring, and therefore it introduces very small runtime overhead. To defeat a variety of ever-evolving evasion tricks, we design BinUnpack's API monitor module via a novel kernel-level DLL hijacking technique. We have evaluated BinUnpack's efficacy extensively with more than 238K packed malware and multiple Windows utilities. BinUnpack's success rate is significantly better than that of existing tools with several orders of magnitude performance boost. Our study demonstrates that BinUnpack can be applied to speeding up large-scale malware analysis.
Conference Paper
Defending against malware involves analysing large amounts of suspicious samples. To deal with such quantities we rely heavily on automatic approaches to determine whether a sample is malicious or not. Unfortunately, complete and precise automatic analysis of malware is far from an easy task. This is because malware is often designed to contain several techniques and countermeasures specifically to hinder analysis. One of these techniques is for the malware to propagate through the operating system so as to execute in the context of benign processes. The malware does this by writing memory to a given process and then proceeds to have this memory execute. In some cases these propagations are trivial to capture because they rely on well-known techniques. However, in the cases where malware deploys novel code injection techniques, rely on code-reuse attacks and potentially deploy dynamically generated code, the problem of capturing a complete and precise view of the malware execution is non-trivial. In this paper we present a unified approach to tracing malware propagations inside the host in the context of code injections and code-reuse attacks. We also present, to the knowledge of the authors, the first approach to identifying dynamically generated code based on information-flow analysis. We implement our techniques in a system called Tartarus and match Tartarus with both synthetic applications and real-world malware. We compare Tartarus to previous works and show that our techniques substantially improve the precision for collecting malware execution traces, and that our approach can capture intrinsic characteristics of novel code injection techniques.
Conference Paper
Malware authors constantly develop new techniques in order to evade analysis systems. Previous works addressed attempts to evade analysis by means of anti-sandboxing and anti-virtualization techniques, for example proposing to run samples on bare-metal. However, state-of-the-art bare-metal tools fail to provide richness and completeness in the results of the analysis. In this context, Dynamic Binary Instrumentation (DBI) tools have become popular in the analysis of new malware samples because of the deep control they guarantee over the instrumented binary. As a consequence, malware authors developed new techniques, called anti-instrumentation, aimed at detecting if a sample is being instrumented. We propose a practical approach to make DBI frameworks more stealthy and resilient against anti-instrumentation attacks. We studied the common techniques used by malware to detect the presence of a DBI tool, and we proposed a set of countermeasures to address them. We implemented our approach in Arancino, on top of the Intel Pin framework. Armed with it, we perform the first large-scale measurement of the anti-instrumentation techniques employed by modern malware. Finally, we leveraged our tool to implement a generic unpacker, showing some case studies of the anti-instrumentation techniques used by known packers.
Conference Paper
We analyze the increasing threats against IoT devices. We show that Telnet-based attacks that target IoT devices have rocketed since 2014. Based on this observation, we propose an IoT honeypot and sandbox, which attracts and analyzes Telnet-based attacks against various IoT devices running on different CPU architectures such as ARM, MIPS, and PPC. By analyzing the observation results of our honeypot and captured malware samples, we show that there are currently at least 4 distinct DDoS malware families targeting Telnet-enabled IoT devices and one of the families has quickly evolved to target more devices with as many as 9 different CPU architectures.
Conference Paper
We present PANDA, an open-source tool that has been purpose-built to support whole system reverse engineering. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments. A nine billion instruction boot of FreeBSD, e.g., is represented by only a few hundred MB. PANDA leverages QEMU's support of thirteen different CPU architectures to make analyses of those diverse instruction sets possible within the LLVM IR. In this way, PANDA can have a single dynamic taint analysis, for example, that precisely supports many CPUs. PANDA analyses are written in a simple plugin architecture which includes a mechanism to share functionality between plugins, increasing analysis code re-use and simplifying complex analysis development. We demonstrate PANDA's effectiveness via a number of use cases, including enabling an old but legitimately purchased game to run despite a lost CD key, in-depth diagnosis of an Internet Explorer crash, and uncovering the censorship activities and mechanisms of an IM client.
Conference Paper
Fighting malware involves analyzing large numbers of suspicious binary files. In this context, disassembly is a crucial task in malware analysis and reverse engineering. It involves the recovery of assembly instructions from binary machine code. Correct disassembly of binaries is necessary to produce a higher level representation of the code and thus allow the analysis to develop high-level understanding of its behavior and purpose. Nonetheless, it can be problematic in the case of malicious code, as malware writers often employ techniques to thwart correct disassembly by standard tools. In this paper, we focus on the disassembly of x86 self-modifying binaries with overlapping instructions. Current state-of-the-art disassemblers fail to interpret these two common forms of obfuscation, causing an incorrect disassembly of large parts of the input. We introduce a novel disassembly method, called concatic disassembly, that combines CONCrete path execution with stATIC disassembly. We have developed a standalone disassembler called CoDisasm that implements this approach. Our approach substantially improves the success of disassembly when confronted with both self-modification and code overlap in analyzed binaries. To our knowledge, no other disassembler thwarts both of these obfuscations methods together.
Conference Paper
API (Application Programming Interface) monitoring is an effective approach for quickly understanding the behavior of malware. It has been widely used in many malware countermeasures as their base. However, malware authors are now aware of the situation and they develop malware using several anti-analysis techniques to evade API monitoring. In this paper, we present our design and implementation of an API monitoring system, API Chaser, which is resistant to evasion-type anti-analysis techniques, e.g. stolen code and code injection. We have evaluated API Chaser with several real-world malware and the results showed that API Chaser is able to correctly capture API calls invoked from malware without being evaded.
Article
Program obfuscation is an important software protection technique that prevents attackers from revealing the programming logic and design of the software. We introduce translingual obfuscation, a new software obfuscation scheme which makes programs obscure by "misusing" the unique features of certain programming languages. Translingual obfuscation translates part of a program from its original language to another language which has a different programming paradigm and execution model, thus increasing program complexity and impeding reverse engineering. In this paper, we investigate the feasibility and effectiveness of translingual obfuscation with Prolog, a logic programming language. We implement translingual obfuscation in a tool called BABEL, which can selectively translate C functions into Prolog predicates. By leveraging two important features of the Prolog language, i.e., unification and backtracking, BABEL obfuscates both the data layout and control flow of C programs, making them much more difficult to reverse engineer. Our experiments show that BABEL provides effective and stealthy software obfuscation, while the cost is only modest compared to one of the most popular commercial obfuscators on the market. With BABEL, we verified the feasibility of translingual obfuscation, which we consider to be a promising new direction for software obfuscation.
Article
Run-time packers are often used by malware-writers to obfuscate their code and hinder static analysis. The packer problem has been widely studied, and several solutions have been proposed in order to generically unpack protected binaries. Nevertheless, these solutions commonly rely on a number of assumptions that may not necessarily reflect the reality of the packers used in the wild. Moreover, previous solutions fail to provide useful information about the structure of the packer or its complexity. In this paper, we describe a framework for packer analysis and we propose a taxonomy to measure the runtime complexity of packers. We evaluated our dynamic analysis system on two datasets, composed of both off-the-shelf packers and custom packed binaries. Based on the results of our experiments, we present several statistics about the packers complexity and their evolution over time.
Conference Paper
CPU emulator is a program emulating the internal operation of a physical CPU in software. CPU emulator plays a vital role and has a lot of applications in computer security area, such as reversing obfuscated malware or verifying code semantics. This paper presents \textit{Unicorn} emulator framework, which offers some unparalleled features: architecture-independent API, multi-architectures, multi-platforms, thread-safe and open source. We will introduce some existing emulators, then go into details of their design/implementation and explains their current issues. Next, the architecture of Unicorn will be discussed with focus on the challenges of designing and implementing it. Unicorn aims to lay the ground for innovative works. To conclude this paper, some new advanced tools built on top of our engine will be introduced to demonstrate its power, so the readers can see how Unicorn can open up many opportunities for future of security research and development. Find more information and the full source code of Unicorn at its homepage http://www.unicorn-engine.org.
Article
Dynamic binary analysis is a prevalent and indispensable technique in program analysis. While several dynamic binary analysis tools and frameworks have been proposed, all suffer from one or more of: prohibitive performance degradation, semantic gap between the analysis code and the program being analyzed, architecture/OS specificity, being user-mode only, lacking APIs, etc. We present DECAF, a virtual machine based, multi-target, whole-system dynamic binary analysis framework built on top of QEMU. DECAF provides Just-In-Time Virtual Machine Introspection combined with a novel TCG instruction-level tainting at bit granularity, backed by a plugin based, simple-to-use event driven programming interface. DECAF exercises fine control over the TCG instructions to accomplish on-the-fly optimizations. We present 3 platform-neutral plugins - Instruction Tracer, Keylogger Detector, and API Tracer, to demonstrate the ease of use and effectiveness of DECAF in writing cross-platform and system-wide analysis tools. Implementation of DECAF consists of 9550 lines of C++ code and 10270 lines of C code and we evaluate DECAF using CPU2006 SPEC benchmarks and show average overhead of 605% for system wide tainting and 12% for VMI.
Conference Paper
Dynamic information flow tracking is a well-known dynamic software analysis technique with a wide variety of applications that range from making systems more secure, to helping developers and analysts better understand the code that systems are executing. Traditionally, the fine-grained analysis capabilities that are desired for the class of these systems which operate at the binary level require tight coupling to a specific ISA. This places a heavy burden on developers of these systems since significant domain knowledge is required to support each ISA, and the ability to amortize the effort expended on one ISA implementation cannot be leveraged to support other ISAs. Further, the correctness of the system must carefully evaluated for each new ISA. In this paper, we present a general approach to information flow tracking that allows us to support multiple ISAs without mastering the intricate details of each ISA we support, and without extensive verification. Our approach leverages binary translation to an intermediate representation where we have developed detailed, architecture-neutral information flow models. To support advanced instructions that are typically implemented in C code in binary translators, we also present a combined static/dynamic analysis that allows us to accurately and automatically support these instructions. We demonstrate the utility of our system in three different application settings: enforcing information flow policies, classifying algorithms by information flow properties, and characterizing types of programs which may exhibit excessive information flow in an information flow tracking system.
Article
Anti-virus vendors are confronted with a multitude of potentially malicious samples today. Receiving thousands of new samples every day is not uncommon. The signatures that detect confirmed malicious threats are mainly still created manually, so it is important to discriminate between samples that pose a new unknown threat and those that are mere variants of known malware. This survey article provides an overview of techniques based on dynamic analysis that are used to analyze potentially malicious samples. It also covers analysis programs that leverage these It also covers analysis programs that employ these techniques to assist human analysts in assessing, in a timely and appropriate manner, whether a given sample deserves closer manual inspection due to its unknown malicious behavior.
Article
Software armoring techniques have increasingly created problems for reverse engineers and software security analyst s. As protections such as packers, run-time obfuscators, virtual machine and debugger detectors become common, newer methods must be developed to cope with them. In this paper we will present our covert debugging platform named Saffron. Saffron i s based upon dynamic instrumentation techniques as well as a newly developed page fault assisted debugger. We show that the com bination of these two techniques is effective in removing armor ing from most software armoring systems.
Article
An abstract is not available.
Conference Paper
Modern malware often hide the malicious portion of their program code by making it appear as data at compile-time and transforming it back into executable code at runtime. This obfuscation technique poses obstacles to researchers who want to understand the malicious behavior of new or unknown malware and to practitioners who want to create models of detection and methods of recovery. In this paper we propose a technique for automating the process of extracting the hidden-code bodies of this class of malware. Our approach is based on the observation that sequences of packed or hidden code in a malware instance can be made self-identifying when its runtime execution is checked against its static code model. In deriving our technique, we formally define the unpack-executing behavior that such malware exhibits and devise an algorithm for identifying and extracting its hidden-code. We also provide details of the implementation and evaluation of our extraction technique; the results from our experiments on several thousand malware binaries show our approach can be used to significantly reduce the time required to analyze such malware, and to improve the performance of malware detection tools.
Conference Paper
We propose two novel techniques for reducing the workload for malware analysis. The first technique is restricted instruction, which accelerates finding the longest common subsequence (LCS) between machine code instruction sequences of malware. The second technique is probabilistic disassembly, which can find the most probable disassembly result of a binary stream without a clue, such as debug symbols or the information of import functions. By combining the two proposals and our generic unpacker, we built an automatic malware classification system. Given an unknown malware program, the system enables malware analysts to find the most similar known malware program to this unknown one, and even estimate different/common instructions. In one of our experiments, we classified 3,233 malware samples in the wild and concluded that 75% of the samples belong to the seven largest clusters. As a result, only seven samples, one from each cluster, were required to be analyzed in order to reveal the functionality of the rest of the 75%, showing a great increase in efficiency of analysis.
Conference Paper
We introduce Eureka, a framework for enabling static analysis on Internet malware binaries. Eureka incorporates a novel binary unpacking strategy based on statistical bigram analysis and coarse-grained execution tracing. The Eureka framework uniquely distinguishes itself from prior work by providing effective evaluation metrics and techniques to assess the quality of the produced unpacked code. Eureka provides several Windows API resolution techniques that identify system calls in the unpacked code by overcoming various existing control flow obfuscations. Eureka's unpacking and API resolution capabilities facilitate the structural analysis of the underlying malware logic by means of micro-ontology generation that labels groupings of identified API calls based on their functionality. They enable a visual means for understanding malware code through the automated construction of annotated control flow and call graphs.Our evaluation on multiple datasets reveals that Eureka can simplify analysis on a large fraction of contemporary Internet malware by successfully unpacking and deobfuscating API references.
Conference Paper
Malware has become the centerpiece of most security threats on the Internet. Malware analysis is an essential technology that extracts the runtime behavior of malware, and supplies signatures to detection systems and provides evidence for re- covery and cleanup. The focal point in the malware analysis battle is how to detect versus how to hide a malware ana- lyzer from malware during runtime. State-of-the-art analyz- ers reside in or emulate part of the guest operating system and its underlying hardware, making them easy to detect and evade. In this paper, we propose a transparent and ex- ternal approach to malware analysis, which is motivated by the intuition that for a malware analyzer to be transparent, it must not induce any side-effects that are unconditionally detectable by malware. Our analyzer, Ether, is based on a novel application of hardware virtualization extensions such as Intel VT, and resides completely outside of the target OS environment. Thus, there are no in-guest software compo- nents vulnerable to detection, and there are no shortcomings that arise from incomplete or inaccurate system emulation. Our experiments are based on our study of obfuscation tech- niques used to create 25,000 recent malware samples. The results show that Ether remains transparent and defeats the obfuscation tools that evade existing approaches.
Conference Paper
An increasing percentage of malware programs distributed in the wild are packed by packers, which are programs that transform an input binary's appearance without affecting its execution semantics, to create new malware variants that can evade signature-based malware detection tools. This paper reports the results of a comprehensive study of the extent of the packer problem based on data collected at Symantec and the effectiveness of existing solutions to this problem. Then the paper presents a generic unpacking solution called Justin (Just-In-Time AV scanning), which is designed to detect the end of unpacking of a packed binary's run and invoke AV scanning against the process image at that time. For accurate end-to-unpacking detection, Justin incorporates the following heuristics: Dirty Page Execution, Unpacker Memory Avoidance, Stack Pointer Check and Command-Line Argument Access. Empirical testing shows that when compared with SymPack, which contains a set of manually created unpackers for a collection of selective packers, Justin's effectiveness is comparable to SymPack for those binaries packed by these supported packers, and is much better than SymPack for binaries packed by those that SymPack does not support.
Article
Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100% binary compatibility at the expense of performance. Others sacrifice security or functionality for speed. Few offer resource isolation or performance guarantees; most provide only best-effort provisioning, risking denial of service. This paper presents Xen, an x86 virtual machine monitor which allows multiple commodity operating systems to share conventional hardware in a safe and resource managed fashion, but without sacrificing either performance or functionality. This is achieved by providing an idealized virtual machine abstraction to which operating systems such as Linux, BSD and Windows XP, can be ported with minimal effort. Our design is targeted at hosting up to 100 virtual machine instances simultaneously on a modern server. The virtualization approach taken by Xen is extremely efficient: we allow operating systems such as Linux and Windows XP to be hosted simultaneously for a negligible performance overhead - at most a few percent compared with the unvirtualized case. We considerably outperform competing commercial and freely available solutions in a range of microbenchmarks and system-wide tests.
CoDisasm: Medium Scale Concatic Disassembly of Self-Modifying Binaries with Overlapping Instructions
  • Guillaume Bonfante
  • Jose Fernandez
  • Jean-Yves Marion
  • Benjamin Rouxel
  • Fabrice Sabatier
  • Aurélien Thierry
  • Bonfante Guillaume
San Francisco. Emanuele Cozzi, Mariano Graziano, Yanick Fratantonio, and Davide Balzarotti
  • Emanuele Cozzi
  • Mariano Graziano
  • Yanick Fratantonio
  • Davide Balzarotti
  • Cozzi Emanuele
Fabian Monrose, and Manos Antonakakis. 2021. The Circle Of Life: A Large-Scale Study of The IoT Malware Lifecycle
  • Omar Alrawi
  • Charles Lever
  • Kevin Valakuzhy
  • Ryan Court
  • Kevin Snow
  • Fabian Monrose
  • Manos Antonakakis
  • Alrawi Omar
Advanced in ELF Runtime Binary Encryption - Shiva
  • Shaun Clowes
  • Neel Mehta
State-of-the-art binary code analysis tools
  • Hex-Rays
Adaptive Observation of Emerging Cyber Attacks targeting Various IoT Devices
  • Seiya Kato
  • Rui Tanabe
  • Katsunari Yoshioka
  • Tsutomu Matsumoto
  • Kato Seiya
Alessandro Mantovani, Simone Aonzo, Xabier-Ugarte Pedrero, Alessio Merlo, and Davide Balzarotti. 2020. Prevalence and Impact of Low-Entropy Packing Schemes in the Malware Ecosystem
  • Alessandro Mantovani
  • Simone Aonzo
  • Xabier-Ugarte
  • Alessio Pedrero
  • Davide Merlo
  • Balzarotti
  • Mantovani Alessandro
Taint-assisted IAT Reconstruction against Position Obfuscation
  • Kawakoya Yuhei
  • Iwamura Makoto
  • Yuhei Kawakoya
Unboxing Linux/Mumblehard
  • Etienne M Marc
  • Léveillé
X-Force: Force-Executing Binary Programs for Security Applications
  • Fei Peng
  • Zhui Deng
  • Xiangyu Zhang
  • Dongyan Xu
  • Zhiqiang Lin
  • Zhendong Su
  • Peng Fei
Unpacking with OllyBonE
  • Joe Stewart
cryptexec: runtime binary encryption using on-demand function extraction
  • Zeljko Vrba
UPX Anti-Unpacking Techniques in IoT Malware
  • Albert Zsigovits
Kevin Snow, Fabian Monrose, and Manos Antonakakis. 2021. The Circle Of Life: A Large-Scale Study of The IoT Malware Lifecycle
  • Omar Alrawi
  • Charles Lever
  • Kevin Valakuzhy
  • Ryan Court
  • Alrawi Omar