Christian Collberg’s research while affiliated with The University of Arizona and other places


Publications (84)


Tools and Models for Software Reverse Engineering Research
  • Conference Paper

November 2024 · 6 Reads

Thomas Faingnaert · Tab Zhang · Willem Van Iseghem · [...]


Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis

October 2024 · 2 Reads

Proceedings of the ACM on Programming Languages

Code deobfuscation, which attempts to simplify code that has been intentionally obfuscated to prevent understanding, is a critical technique for downstream security analysis tasks like malware detection. While there has been significant prior work on code deobfuscation, most techniques either do not handle obfuscations that modify control flow or target specific classes of control-flow obfuscations, making them unsuitable for handling new types of obfuscations or combinations of existing ones. In this paper, we study a new deobfuscation technique that is based on program synthesis and that can handle a broad class of control-flow obfuscations. Given an obfuscated program P, our approach aims to synthesize a smallest program that is a control-flow reduction of P and that is semantically equivalent to it. Since our method does not assume knowledge about the types of obfuscations that have been applied to the original program, the underlying synthesis problem ends up being very challenging. To address this challenge, we propose a novel trace-informed compositional synthesis algorithm that leverages hints present in dynamic traces of the obfuscated program to decompose the synthesis problem into a set of simpler subproblems. In particular, we show how dynamic traces can be useful for inferring a suitable control-flow skeleton of the deobfuscated program and for performing independent synthesis of each basic block. We have implemented this approach in a tool called Chisel and evaluated it on 546 benchmarks that have been obfuscated using combinations of six different obfuscation techniques. Our evaluation shows that our approach is effective and that it produces code that is almost identical (modulo variable renaming) to the original (non-obfuscated) program in 86% of cases. Our evaluation also shows that Chisel significantly outperforms existing techniques.
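The abstract says dynamic traces help infer a control-flow skeleton. As a rough illustration only (this is not Chisel's algorithm; the trace format, block labels, and the functions `observed_edges` and `prune_cfg` are hypothetical), one simple use of traces is to keep only the static control-flow-graph edges that some execution actually took. Edges that were never exercised, such as branches guarded by an opaque predicate, become candidates for removal.

```python
from collections import defaultdict

def observed_edges(traces):
    """Collect the control-flow edges actually taken across dynamic traces.

    Each trace is assumed (hypothetically) to be a list of basic-block labels
    in execution order; consecutive labels form one taken edge.
    """
    succ = defaultdict(set)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            succ[src].add(dst)
    return {block: sorted(dsts) for block, dsts in succ.items()}

def prune_cfg(cfg, traces):
    """Keep only the edges of a static CFG that at least one trace exercised."""
    seen = observed_edges(traces)
    return {block: [d for d in dsts if d in seen.get(block, [])]
            for block, dsts in cfg.items()}

# Toy CFG in which "dead" is only reachable through a never-taken branch.
cfg = {"entry": ["A", "dead"], "A": ["B"], "dead": ["B"], "B": []}
traces = [["entry", "A", "B"], ["entry", "A", "B"]]
print(prune_cfg(cfg, traces))
```

An edge absent from the traces may still be reachable on inputs the traces did not cover, so a trace-derived skeleton is only a hint; this is why an approach like the one described still has to check semantic equivalence rather than trust trace coverage alone.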


reAnalyst: Scalable Analysis of Reverse Engineering Activities
  • Preprint
  • File available

June 2024 · 53 Reads · 1 Citation

This paper introduces reAnalyst, a scalable analysis framework designed to facilitate the study of reverse engineering (RE) practices through the semi-automated annotation of RE activities across various RE tools. By integrating tool-agnostic data collection of screenshots, keystrokes, active processes, and other types of data during RE experiments with semi-automated data analysis and annotation, reAnalyst aims to overcome the limitations of traditional RE studies that rely heavily on manual data collection and subjective analysis. The framework makes data analysis more efficient, allowing researchers to study both the effectiveness of protection techniques and the strategies used by reverse engineers more comprehensively. Experimental evaluations validate the framework's capability to identify RE activities from a diverse range of screenshots of varying complexity, thereby simplifying the analysis process and supporting more effective research outcomes.

Figure 2: Simulated timeline view that displays session information next to a chart with active windows, screenshots, and annotations.
Figure 3: Simulated animation view that plays back screenshots and displays keystroke input, current task, and top process.
Figure 4: Overall pipeline for reAnalyst.
Figure 5: Sample screenshot collected from an experiment.
Figure 6: Scatter plot example from function matching data.
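To make "semi-automated annotation" concrete, here is a minimal sketch under stated assumptions: it labels recorded samples by matching the active process name against keyword rules and routes anything unmatched to a manual-review queue. The rule table, activity labels, sample format, and the `annotate` function are all hypothetical and are not taken from reAnalyst, whose abstract describes identifying activities from screenshots.

```python
# Hypothetical keyword rules: process-name fragment -> RE activity label.
RULES = {
    "ida": "static analysis",
    "ghidra": "static analysis",
    "x64dbg": "dynamic analysis",
    "gdb": "dynamic analysis",
    "python": "scripting",
}

def annotate(samples):
    """Label (timestamp, process) samples; unmatched ones go to manual review."""
    labeled, manual = [], []
    for ts, process in samples:
        name = process.lower()
        # First rule whose keyword occurs in the process name wins.
        label = next((act for key, act in RULES.items() if key in name), None)
        if label is None:
            manual.append((ts, process))
        else:
            labeled.append((ts, process, label))
    return labeled, manual

labeled, manual = annotate([(0, "ida64.exe"), (5, "x64dbg.exe"), (9, "chrome.exe")])
print(labeled)  # the automatically labeled samples
print(manual)   # samples left for a human annotator
```

The split into `labeled` and `manual` is the design point: automation handles the unambiguous cases, and a human only reviews the remainder.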


Code Obfuscation: Why is This Still a Thing?

March 2018 · 269 Reads · 12 Citations

Early developments in code obfuscation were chiefly motivated by the needs of Digital Rights Management (DRM). Other suggested applications included intellectual property protection of software and code diversification to combat the monoculture problem of operating systems. Code obfuscation is typically employed in security scenarios where an adversary has complete control over a device and the software it contains and can tamper with it at will. We call such situations the Man-At-The-End (MATE) scenario. MATE scenarios are the best of all worlds for attackers and, consequently, the worst of all worlds for defenders: not only do attackers have physical access to a device, which they can reverse engineer and tamper with at their leisure, but they often have unbounded resources (time, computational power, etc.) to do so. Defenders, on the other hand, are often severely constrained in the types of protective techniques available to them and the amount of overhead they can tolerate. In other words, there is an asymmetry between the constraints of attackers and defenders. Moreover:
  • DRM is becoming less prevalent (songs for sale on the Apple iTunes Store are no longer protected by DRM, for example);
  • there are new cryptographically based obfuscation techniques that promise provably secure obfuscation;
  • secure enclaves are making it into commodity hardware, providing a safe haven for security-sensitive code; and
  • recent advances in program analysis and generic deobfuscation provide algorithms that render current code obfuscation techniques impotent.
Thus, one may reasonably ask: "Is Code Obfuscation Still a Thing?" Somewhat surprisingly, the answer appears to be yes. In a recent report, Gartner lists 19 companies active in this space (8 of which were founded since 2010), and many papers are still being published (as of 2017) on code obfuscation, code deobfuscation, anti-tamper protection, reverse engineering, and related technologies.
One reason for this resurgence of code obfuscation as a protective technology is that we increasingly face applications where security-sensitive code needs to run on unsecured endpoints. In this talk we will show MATE attacks that appear in many novel and unlikely scenarios, including smart cars, smart meters, mobile applications such as Snapchat and smartphone games, Internet of Things applications, and ad blockers in web browsers. We will furthermore show novel code obfuscation techniques that increase the workload of attackers and that, at least for a time, purport to restore the symmetry between attackers and defenders.


Code obfuscation against symbolic execution attacks

December 2016 · 4,104 Reads · 142 Citations

Code obfuscation is widely used by software developers to protect intellectual property and by malware writers to hamper program analysis. However, there seems to be little work on systematically evaluating the effectiveness of obfuscation techniques against automated program analysis. As a result, we have no methodical way of knowing which kinds of automated analyses an obfuscation method can withstand. This paper addresses the problem of characterizing the resilience of code obfuscation transformations against automated symbolic execution attacks, complementing existing work that measures the potency of obfuscation transformations against human-assisted attacks through user studies. We evaluated our approach on over 5000 different C programs, each obfuscated using existing implementations of obfuscation transformations. The results show that many existing obfuscation transformations, such as virtualization, stand little chance of withstanding symbolic-execution-based deobfuscation. A crucial and perhaps surprising observation is that symbolic-execution-based deobfuscators can easily deobfuscate transformations that preserve program semantics. On the other hand, we present new obfuscation transformations that change program behavior in subtle yet acceptable ways, and show that they can render symbolic-execution-based deobfuscation analysis ineffective in practice.
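A toy example may help show why semantics-preserving transformations are vulnerable (the example is ours, not taken from the paper). An opaque predicate adds a branch that never changes behavior, so an analysis that asks "can the other branch ever be taken?" can remove it. A symbolic executor would pose that question to a constraint solver; in this sketch, exhaustive enumeration over a small integer range stands in for the solver, which gives evidence rather than a proof for unbounded inputs. The functions `obfuscated`, `deobfuscated`, and `branch_outcomes` are hypothetical.

```python
def obfuscated(x: int) -> int:
    # Opaque predicate: x*x + x = x*(x + 1) is a product of consecutive
    # integers, hence always even, so the else-branch is dead code.
    if (x * x + x) % 2 == 0:
        return x + 1
    else:
        return x - 1  # never executed

def deobfuscated(x: int) -> int:
    # What remains once the infeasible branch is removed.
    return x + 1

def branch_outcomes(pred, domain):
    """Truth values a branch condition takes over a finite input domain."""
    return {pred(x) for x in domain}

# Only True is ever observed, so the else-branch can be pruned.
print(branch_outcomes(lambda x: (x * x + x) % 2 == 0, range(-128, 128)))
```

Because the transformation preserves semantics, the simplified function agrees with the obfuscated one on every input; transformations that deliberately change behavior in acceptable ways, as the abstract describes, do not offer this kind of equivalence for an analysis to exploit.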



Pinpointing and Hiding Surprising Fragments in an Obfuscated Program

December 2015 · 35 Reads · 3 Citations

In this paper, we propose a pinpoint-hide defense method that aims to improve the stealth of obfuscated code. In the pinpointing process, we scan the obfuscated code at the granularity of small code fragments and identify all surprising fragments, that is, highly unusual fragments that may draw an attacker's attention to the obfuscated code. In the hiding process, we transform the pinpointed surprising fragments into unsurprising ones while preserving semantics. Code transformed by our method consists only of unsurprising code fragments and is therefore harder for attackers to distinguish from unobfuscated code than the original obfuscated code. In a case study, we apply our pinpoint-hide method to programs transformed by well-known obfuscation techniques. The results show that our method can pinpoint surprising fragments such as dummy code that does not fit the context of the program and instructions used in a complicated arithmetic expression. We also confirm that instruction camouflage can turn the pinpointed surprising fragments into unsurprising ones and that the transformed program still runs correctly.
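The pinpointing step can be sketched roughly as a sliding-window scan. This is a minimal illustration under our own assumptions, not the authors' scoring method: code is a list of instruction mnemonics, and a window counts as "surprising" when most of its instruction bigrams never occur in a reference corpus of ordinary code. The function names, window size, and threshold are hypothetical.

```python
def bigrams(tokens):
    """Set of adjacent instruction pairs in a token sequence."""
    return set(zip(tokens, tokens[1:]))

def pinpoint(code, reference, window=4, threshold=0.5):
    """Return start indices of windows whose bigrams are mostly unseen.

    A window is flagged when the fraction of its bigrams absent from the
    reference corpus exceeds `threshold` (a hypothetical scoring rule).
    """
    if window < 2:
        raise ValueError("window must contain at least one bigram")
    known = bigrams(reference)
    flagged = []
    for start in range(len(code) - window + 1):
        chunk = code[start:start + window]
        grams = list(zip(chunk, chunk[1:]))
        unseen = sum(1 for g in grams if g not in known)
        if unseen / len(grams) > threshold:
            flagged.append(start)
    return flagged

reference = ["push", "mov", "add", "mov", "pop", "ret"]
code = ["push", "mov", "add", "xor", "rol", "xor", "mov", "pop", "ret"]
print(pinpoint(code, reference))  # windows overlapping the xor/rol run
```

The flagged windows cluster around the unusual `xor`/`rol` sequence; a hiding step would then rewrite those instructions into more ordinary-looking equivalents.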


Code Artificiality: A Metric for the Code Stealth Based on an N-Gram Model

May 2015 · 53 Reads · 18 Citations

This paper proposes a method for evaluating the artificiality of protected code by means of an N-gram model. The proposed artificiality metric helps us measure the stealth of the protected code, that is, the degree to which protected code can be distinguished from unprotected code. In a case study, we use the proposed method to evaluate the artificiality of programs that are transformed by well-known obfuscation techniques. The results show that static obfuscating transformations (e.g., control flow flattening) have little effect on artificiality. However, dynamic obfuscating transformations (e.g., code encryption), or a technique that inserts junk code fragments into the program, tend to increase the artificiality, which may have a significant impact on the stealth of the code.
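One common way to turn an N-gram model into an artificiality score is to train the model on unprotected code and measure how surprising protected code is under it. The sketch below does this with a bigram model, add-one smoothing, and mean surprisal in bits; these are our own choices for illustration, and the paper's exact model order, smoothing, and formula may differ. The function names and the token sequences are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left contexts over token lists of unprotected code."""
    contexts, pairs, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        vocab.update(tokens)
        contexts.update(tokens[:-1])           # tokens that have a successor
        pairs.update(zip(tokens, tokens[1:]))
    return contexts, pairs, vocab

def artificiality(tokens, model):
    """Mean surprisal (bits per bigram) under an add-one-smoothed bigram model.

    Higher values mean the sequence looks less like the training code.
    """
    contexts, pairs, vocab = model
    v = len(vocab) + 1                          # +1: one shared bucket for unseen tokens
    grams = list(zip(tokens, tokens[1:]))
    if not grams:
        raise ValueError("need at least two tokens")
    bits = 0.0
    for prev, cur in grams:
        p = (pairs[(prev, cur)] + 1) / (contexts[prev] + v)
        bits -= math.log2(p)
    return bits / len(grams)

model = train_bigram([["push", "mov", "add", "mov", "pop", "ret"]] * 3)
plain = ["push", "mov", "add", "mov", "pop", "ret"]
junk = ["push", "xor", "rol", "xor", "pop", "ret"]   # e.g. inserted junk code
print(artificiality(plain, model), artificiality(junk, model))
```

Under this kind of metric, a transformation that reshuffles familiar instructions (as control-flow flattening largely does) would score close to unprotected code, while inserted junk sequences score noticeably higher, which is consistent in spirit with the results the abstract reports.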


Provenance of exposure: Identifying sources of leaked documents

October 2013 · 27 Reads · 2 Citations

We design a provenance system for documents stored in the cloud. The system allows several collaborating individuals to write documents. Provenance allows recovery of information about the sequence of significant events relevant to the documents. Existing provenance systems focus on editing events, such as the creation or removal of document parts. In this work, we introduce provenance of exposure events, which allows identifying one or more individuals who are possible sources of the exposure of a particular document version to an external party. Our design provides a practical solution for provenance of documents stored on not-fully-trusted cloud systems, supporting provenance of both exposure and editing events.
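The core idea of exposure provenance, narrowing a leak down to the individuals who had access to the leaked version, can be illustrated with a deliberately simple in-memory log. This sketch is not the paper's design: it ignores the untrusted-cloud setting entirely, identifies versions by a SHA-256 digest of their exact content (so only verbatim leaks are matched), and the class and method names are hypothetical.

```python
import hashlib
from collections import defaultdict

class ExposureLog:
    """Toy provenance log mapping document versions to users who accessed them."""

    def __init__(self):
        self._viewers = defaultdict(set)  # version digest -> user ids

    @staticmethod
    def digest(content: str) -> str:
        """Identify a version by the SHA-256 hash of its exact content."""
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def record_access(self, user: str, content: str) -> None:
        """Log that `user` read or wrote this exact version of the document."""
        self._viewers[self.digest(content)].add(user)

    def possible_sources(self, leaked: str) -> set:
        """Users who had access to the leaked version (empty if it is unknown)."""
        return set(self._viewers.get(self.digest(leaked), set()))

log = ExposureLog()
log.record_access("alice", "draft v1")
log.record_access("bob", "draft v1")
log.record_access("bob", "draft v2")
print(log.possible_sources("draft v2"))  # only bob saw v2
```

Recording access per version, rather than per document, is what lets a leaked copy narrow the candidate set: anyone who only saw earlier or later versions is excluded.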


Citations (67)


... Obfuscation, Deobfuscation, and Analysis: 0.3% = 2/572 papers
  • [557] survey
  • [42] presents an interactive tool for all three tasks
Obfuscation and Deobfuscation: 1.6% = 9/572 papers
  • [72,158] theorize about both kinds of tasks
  • [73,178,445] survey and/or evaluate both kinds of transformations
  • [325,449] present a novel deobfuscation technique to study the prevalence of obfuscation techniques in real-world samples
  • [428] * presents a deobfuscation technique that defeats existing obfuscations, and novel obfuscations as countermeasures
  • [581] * presents new obfuscation techniques as well as novel deobfuscation techniques, the latter outperforming the existing state of the art while not (entirely) succeeding on the newly obfuscated samples
Obfuscation and Analysis: 3.1% = 18/572 papers
  • [28,36] present the use of abstract interpretation to assess obfuscations
  • [82,129,165,230,277,285,353] survey, present, and evaluate malware detection and software analysis techniques as well as obfuscations as a countermeasure
  • [165,215,563] discuss the use of analysis tools and techniques to evaluate the strength of obfuscations
  • [393] * presents novel obfuscations and new models of attacks thereon
  • [540] * presents novel obfuscations and improvements to existing analyses to counter those obfuscations
  • [405,485,551] present novel detection techniques to study the prevalence of obfuscations in real-world software
  • [482] presents empirical studies of the effort needed to attack software protected with specific obfuscations
  • [526] * evaluates state-of-the-art analyses on obfuscated code and novel mitigating obfuscations
Deobfuscation and Analysis: 3.8% = 22/572 papers
  • [136,151,279,302,309,312,336,390,392,437] present or build on third-party (library) code (similarity) and cryptographic primitive detection algorithms
  • [78,256,530,542] present pre-pass deobfuscation for improving malware detection
  • [108,119,141] present analysis tools that are demonstrated in manual deobfuscation use cases
  • [185,468] present analyses and transformations that deobfuscate software as a side effect
  • [284,562] present techniques to deobfuscate code as well as to detect code reuse from libraries and from earlier versions
  • [301] presents analyses whose results are equivalent to deobfuscation without actually deobfuscating any code, such as a data flow analysis that reconstructs the original data dependencies hidden by data flow obfuscation
... but no papers targeting such languages were in scope because we explicitly exclude special-purpose obfuscation techniques. Figure 3 shows the results. ...

Reference:

Evaluation Methodologies in Software Protection Research
Probabilistic Obfuscation Through Covert Channels
  • Citing Conference Paper
  • April 2018

... Popular techniques to protect against content duplication are implemented in confidential messaging apps like Snapchat. These techniques include using OS-level APIs to disable screen capturing, recording, and sharing of protected media, in addition to requiring the latest software updates with all security patches [39,40]. To avoid these limitations on the client-side, researchers proposed alternative solutions that assure data deletion and enforce self-expiration, using techniques such as attribute-based encryption and revocation, TEE, threshold secret sharing, and frequently colliding hash tables [41,42,43]. ...

Code Obfuscation: Why is This Still a Thing?
  • Citing Conference Paper
  • March 2018

... Software watermarking and fingerprinting have long been used, but these techniques have some limitations. Some researchers and industry practitioners are using forward-looking versions of software watermarks [1][2][3][4][5][6][7][8][9][10][11][12], fingerprints [13,14], software clones [15,16], and software birthmarks [17][18][19][20][21][22][23][24][25][26][27][28][29]. Plagiarism detection is closely related to these software detection methods; it is used to uncover source code theft and to discover similarities between original and duplicated source code [30][31][32][33][34][35]. ...

Software watermarking through register allocation: Implementation, analysis, and attacks
  • Citing Article
  • January 2003

... Symbolic execution attacks [33], which involve analyzing software by treating inputs as symbolic variables rather than concrete values, have been used in various studies to measure the resilience of obfuscation techniques. These works assess how well obfuscation can withstand such advanced analysis methods as performed by a given technique. ...

Code obfuscation against symbolic execution attacks

... ISA Encoding Solver and GPU Assembler: Prior work on reverse engineering CPU ISA encodings [8,13] uses a MIPS, SPARC, Alpha, PowerPC, ARM, or x86 assembler to crack CPU instruction sets and extract bit-level instruction encoding information. These works output C declarations that can be used by binary tools. ...

Reverse interpretation + mutation analysis = automatic retargeting
  • Citing Article
  • May 1997

ACM SIGPLAN Notices

... Previous approaches to obfuscation detection are mainly based on code structures such as opcode frequencies. Kanzaki et al. [18] proposed an artificiality metric that measures the degree to which protected code can be distinguished from unprotected code. Their results showed that while some types of obfuscations strongly impact code artificiality, such as code encryption, others, for example control-flow modifying obfuscations such as CFG flattening, have a minimal effect. ...

Code Artificiality: A Metric for the Code Stealth Based on an N-Gram Model
  • Citing Conference Paper
  • May 2015

... The frequent use of academic protection tools raises a question: is their popularity due to widespread adoption across the research community, or because their own authors are prolific publishers? A detailed analysis shows that Tigress and OLLVM have seen widespread use throughout the community: only 3 of 36 Tigress papers have a connection with the Tigress authors [233,251,338], and only 1 of 37 OLLVM papers has a connection with the OLLVM authors [231]. This is in stark contrast to Diablo, the third most popular academic protection tool in this survey: only four papers that experimented with Diablo have authors who are neither part of nor direct collaborators of the research group that developed Diablo, compared with 18 papers from within that group. ...

Pinpointing and Hiding Surprising Fragments in an Obfuscated Program
  • Citing Conference Paper
  • December 2015

... Researchers from the University of Arizona [10,11] analyzed data on computer systems research in an attempt to measure and understand reproducibility. Although these efforts didn't generate a conclusive hypothesis, they were instrumental in initiating a process to observe the willingness of computer science researchers to share code and data. ...

Repeatability in Computer Systems Research
  • Citing Article
  • February 2016

Communications of the ACM

... Consequently, software protection has attracted much attention from developers and software companies in terms of software security. To ensure security against malicious software attacks, many tools have been developed, such as data obfuscation, tamper-proofing, code splitting, software watermarking, among others [13]. In this regard, assessing the effectiveness of these protections is crucial before embedding them into real commercial products. ...

Software protection