Article

A Method for Watermarking Java Programs via Opaque Predicates

Abstract

In this paper, we present a method for watermarking Java programs that uses opaque predicates, improving upon those presented in two previous papers (13, 9). We present two algorithms: the first is simpler to implement and to analyze, but certain distortive attacks can make watermark extraction difficult; the second is more complex, but under realistic assumptions yields good resistance to all usual types of attacks.

... 5.3). The stegoprogram is marked with s by inserting the stegomark initialization part: (1) in Z and the stegomark iteration part: ...
... Examples of subtractive attacks include static dead code elimination in case the stegosignatures are supposed to be hidden in dead code (see e.g. [14]) or dynamic observation of the dead code in case the dead code is protected by an opaque predicate (opaque means that the outcome of the predicate is known at watermarking time but the predicate is difficult for an adversary to resolve i.e. to find the truth value solution of [1,6,14]). Distortive attacks apply transformations to the object so that the stegosignatures can no longer be extracted. Obfuscation and optimizing compilation to generate machine code are such distortive attacks. ...
... For example, introducing dead and irrelevant code or converting a reducible to an irreducible flow graph does not change the abstract interpretation of the useful code. In the same way, opaque test and loop predicates [1,6,14] are no problem if the signature extraction does not depend upon predicates. However, the embedding might include such opaque predicates for obfuscation purposes. ...
Article
Full-text available
Software watermarking consists in the intentional embedding of indelible stegosignatures or watermarks into the subject software and extraction of the stegosignatures embedded in the stegoprograms for purposes such as intellectual property protection. We introduce the novel concept of abstract software watermarking. The basic idea is that the watermark is hidden in the program code in such a way that it can only be extracted by an abstract interpretation of the (maybe non-standard) concrete semantics of this code. This static analysis-based approach allows the watermark to be recovered even if only a small part of the program code is present and does not even need that code to be executed. We illustrate the technique by a simple abstract watermarking protocol for methods of Java™ classes. The concept applies equally well to any other kind of software (including hardware originally specified by software).
... Opaque predicates were first presented in paper [8] as a technique to aid in code obfuscation. Later, opaque predicates were incorporated into the Java program watermarking technique proposed in paper [10]. Informally, the inserted opaque predicates make it difficult for an adversary to analyze the control flow of the programs. ...
... This makes it more difficult to identify which portions of the programs are superfluous, but the opaque predicate library must remain secret: if an adversary knows even a few of the predicates, he may be able to identify and remove them from the programs. Regrettably, the method [10] is not considered cryptographically secure, so it is not expected to withstand copy attacks and ambiguity attacks [5]. In this paper, we propose a watermarking scheme based on the identities ($ID_i$) of participants. ...
... resist the attacks, the number of opaque predicates $O_j$ must be large enough. One way of obtaining more predicates $O_j$ is to use parametrized predicates (see paper [10]). This paper utilizes the Legendre symbol [11] to construct a family of watermarked opaque predicates $\{O_j\}_{j=1}^{n}$. ...
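The snippet above does not say how the Legendre symbol is turned into predicates, so the following is only a minimal sketch of one possible parametrized family, not the cited paper's construction. It assumes the well-known fact that a square is never a quadratic non-residue: for an odd prime p, the Legendre symbol of x² modulo p is 0 or 1, never -1. The names `legendre`, `make_predicate`, `PRIMES`, and `FAMILY` are hypothetical.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed with Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def make_predicate(p):
    """Return an opaquely true predicate parametrized by the odd prime p (assumed construction)."""
    def predicate(x):
        # x * x is a square, so its Legendre symbol modulo p can never be -1.
        return legendre(x * x, p) != -1
    return predicate


# Each prime yields a different member of the family; the list is illustrative.
PRIMES = [3, 5, 7, 11, 13]
FAMILY = [make_predicate(p) for p in PRIMES]
```

Choosing a different prime for each predicate is what makes the family parametrized: the predicates look different in the code while all evaluating to the same constant.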
Conference Paper
Full-text available
In this paper, we propose an identities (ID) based watermarking scheme for Java programs. In our scheme, the watermark is generated by participants’ identities, embedded via the watermarked opaque predicates, and verified using zero-knowledge proof. We also present a construction of a family of opaque predicates by Legendre symbol, which is resilient, cheap, and stealthy. The order of the watermark is encoded and embedded into the watermarked opaque predicates, and the watermarked opaque predicates are treated as threads of a Java program. Thus, the embedded watermark is dynamic and secure against all usual types of watermark algorithm attacks and watermark protocol attacks, and also secure against static and dynamic attacks. Keywords: Java programs; watermarked opaque predicates; ID; aggregate signature; watermarking scheme
... This approach easily avoids methods from being eliminated by the dead code. Later, Arboit (2002) proposed a watermarking method for java programs using an opaque predicate. Finally, Alitavoli et al. (2013) proposed a novel method for watermarking Java programs that are resilient to decompile-recompile and obfuscation based attacks. ...
... Consequently, Myles and Collberg (2006) evaluated two watermarking methods in SANDMARK (Collberg et al. 2003b). They conclude that the Arboit algorithm (Arboit 2002) is stronger than the Collberg's algorithm (Collberg et al. 2003b), based on some properties like resiliency and stealth. Besides, the authors explored if dynamic methods are inherently more resilient to attacks than static algorithms. ...
Article
Full-text available
In this paper, we present a brief survey on software watermarking methods that explains the prospects and constraints of most software watermarking algorithms. We introduce a detailed classification of the existing software watermarking techniques, the associated attack models, along with a review of the current software watermarking tools. The reader can find a clear research path on software watermarking from 1996 to date. We also critically analyze the strengths and weaknesses of the existing watermarking schemes and explore the research gaps on which future researchers can concentrate.
... Apart from their types, opaque predicates are defined by their constructions. Several proposals exist in the literature about how to construct a resilient and stealthy opaque predicate [9,47,133,199,200], as presented in [55]. Each of these constructions aims to thwart specific deobfuscation analyses, e.g. ...
Thesis
In this thesis we have studied different deobfuscation approaches toward a static evaluation of obfuscation transformations. We mainly focused on static semantic reasoning, combining it with well-known techniques from other research areas such as binary diffing and machine learning. We studied and developed several deobfuscation frameworks, one for each of the following approaches: simplifying the obfuscated code, removing the obfuscation transformations, or gathering information about the protections applied. The contributions of this thesis answer the following questions: 1. How can we contribute to existing generic deobfuscation methodologies? 2. How can we use machine learning techniques for the purpose of removing widely used obfuscation transformations? 3. How can we help reverse-engineers select the adequate deobfuscation analyses? In order to answer the first question, we transposed semantic-based binary diffing techniques for the purpose of statically simplifying obfuscated binaries. We developed our methodology, called DoSE, as an IDA Pro plug-in and for three major applications. We evaluated each application against real-world and well-known malware such as Cryptowall and Flame. Our evaluations underline the efficiency of DoSE for each application, with up to 63% of control-flow graph reduction or 1954 cloned functions detected on Flame. We demonstrated that DoSE can be efficiently extended to the detection of two-way opaque predicates, which until then were not detected by any known technique. Therefore, this contribution paves the way for combining semantic equivalence methodologies with existing generic deobfuscation techniques, in order to improve their efficiency and scalability. Our second contribution is, to the best of our knowledge, the first deobfuscation technique based on machine learning for the purpose of removing obfuscation transformations. By introducing the different constructions of opaque predicates and the limitations of dynamic symbolic execution techniques and SMT solvers, we underlined the importance of studying other alternatives for generic evaluations of these transformations. We developed an IDA Pro plug-in, a new approach that bridges a thresholded static symbolic execution with machine learning classification to evaluate both the stealth and resilience of invariant opaque predicates constructions. The use of static symbolic execution allows us to have a better code coverage and scalability, which combined with a machine learning model, permits a generic approach by discarding the use of SMT solvers. Our studies illustrate that our choices lead to the implementation of an efficient and accurate evaluation framework against state-of-the-art obfuscators. We created two models for the evaluation of stealth and resiliency of state-of-the-art opaque predicates constructions, with results up to 99% for detection and 95% for deobfuscation. Moreover, we extended our work to a deobfuscation plug-in and compared our results to other tools, showing the efficiency of machine learning for the deobfuscation of most invariant opaque predicates constructions. The third contribution of this thesis aims at improving existing metadata recovery attacks. The existing technique is syntax-oriented and thus suffers from several limitations. We presented the efficiency of semantic reasoning combined with advanced machine learning techniques. This combination is motivated by the construction of a fine-grained detection framework of obfuscation transformations and constructions. By extending our approach to multi-label and multi-output classification, we enhanced metadata recovery attacks to the detection of multiple layers of obfuscation transformations, as well as their constructions. Our results are promising, with overall accuracies up to 91% for the transformations and 100% for the constructions.
... A lot of research has been done on software watermarking. The major software watermarking algorithms currently available are based on several techniques, among which the register allocation, spread-spectrum, opaque predicate, threading, dynamic path techniques (see, [1], [8], [9], [15], [17]). ...
Conference Paper
Full-text available
In this paper we propose an efficient and easily implemented codec system for encoding watermark numbers as reducible permutation flow-graphs. More precisely, in light of our recent encoding algorithms which encode a watermark value w as a self-inverting permutation π*, we present an efficient algorithm which encodes a self-inverting permutation π* as a reducible permutation flow-graph F[π*] by exploiting domination relations on the elements of π* and using an efficient DAG representation of π*. The whole encoding process takes O(n) time and space, where n is the binary size of the number w or, equivalently, the number of elements of the permutation π*. We also propose efficient decoding algorithms which extract the permutation π* from the reducible permutation flow-graph F[π*] within the same time and space complexity. The two main components of our proposed codec system, i.e., the self-inverting permutation π* and the reducible permutation graph F[π*], incorporate important structural properties which make our codec system resilient to attacks.
... Arboit et al. proposed two methods for watermarking Java programs that use opaque predicates [22]. Later, this method was assessed by Myles and Collberg, who implemented both static and dynamic versions within the SandMark framework [23]. ...
Article
Full-text available
Cyber-attacks are evolving at a disturbing rate. Data breaches, ransomware attacks, cryptojacking, malware and phishing attacks are now rampant. In this era of cyber warfare, the software industry is also growing with an increasing number of software being used in all domains of life. This evolution has added to the problems of software vendors and users where they have to prevent a wide range of attacks. Existing watermark detection solutions have a low detection rate in the software. In order to address this issue, this paper proposes a novel blind Zero code based Watermark detection approach named KeySplitWatermark, for the protection of software against cyber-attacks. The algorithm adds watermark logically into the code utilizing the inherent properties of code and gives a robust solution. The embedding algorithm uses keywords to make segments of the code to produce a key dependent on the watermark. The extraction algorithms use this key to remove watermark and detect tampering. When tampering increases to a certain user-defined threshold, the original software code is restored making it resilient against attacks. KeySplitWatermark is evaluated on tampering attacks on three unique samples with two distinct watermarks. The outcomes show that the proposed approach reports promising results against cyber-attacks that are powerful and viable. We compared the performance of our proposal with state-of-the-art works using two different software codes. Our results depict that KeySplitWatermark correctly detects watermarks, resulting in up to 15.95 and 17.43 percent reduction in execution time on given code samples with no increase in program size and independent of watermark size.
... In the early 1990's, such ancient idea has been leveraged to the context of software protection as a means to preclude-or at least discourage-the widespread crime of software piracy. A lot of research has been done on software watermarking ever since, and several distinct techniques have been used, including opaque predicates, register allocation, abstract interpretation and dynamic paths [Arboit 2002, Nagra and Thomborson 2004, Cousot and Cousot 2004, Collberg et al. 2004]. ...
Conference Paper
Full-text available
Embedding watermarks into proprietary objects is an old means of discouraging piracy. It works by embedding into the object some (often surreptitious) data whose retrieval discloses authorship/ownership. Several graph-based watermarking schemes to protect the intellectual property of software have been suggested, and considerable efforts have lately been endeavored to improve their resilience to attacks. Among the pursued attributes of this solution is a high level of stealthiness, i.e., the ability of the watermark graph to be disguised into the actual software binary. We propose a randomized such scheme, improving upon the stealthiness of deterministic approaches, while its encoding/decoding procedures can be implemented to run in linear time nonetheless.
... This is an example of a P T p opaque predicate. This program employs the algebraic identity $(x + y)^2 = x^2 + 2xy + y^2$ to form a number-theoretic opaque predicate which always evaluates to true [15]. As a result, 26 lines of code have disguised a single unconditional jump. ...
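As a minimal sketch of the idea in the snippet above (not the 26-line program it refers to, which is not reproduced here), the identity can be written as a predicate that guards a branch. The function names are hypothetical. Python integers are unbounded, so the identity holds exactly; it also holds under fixed-width wrap-around arithmetic such as Java's `int`, because the identity is valid in any commutative ring.

```python
def opaquely_true(x, y):
    """Always True: both sides are the expansion of the same square."""
    return (x + y) ** 2 == x * x + 2 * x * y + y * y


def guarded(x, y):
    """The 'else' path is dead code, but that is not obvious without algebra."""
    if opaquely_true(x, y):
        return "real path"
    return "never reached"
```

The branch behaves like an unconditional jump to the real path, which is exactly what an analyzer must prove before it can remove the dead alternative.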
Conference Paper
Full-text available
Reverse Code Engineering (RCE) to detect anti-debugging techniques in software is a very difficult task. Code obfuscation is an anti-debugging technique makes detection even more challenging. The Rule Engine Detection by Intermediate Representation (REDIR) system for automated static detection of obfuscated anti-debugging techniques is a prototype designed to help the RCE analyst improve performance through this tedious task. Three tenets form the REDIR foundation. First, Intermediate Representation (IR) improves the analyzability of binary programs by reducing a large instruction set down to a handful of semantically equivalent statements. Next, an Expert System (ES) rule-engine searches the IR and initiates a sense-making process for anti-debugging technique detection. Finally, an IR analysis process confirms the presence of an anti-debug technique. The REDIR system is implemented as a debugger plug-in. Within the debugger, REDIR interacts with a program in the disassembly view. Debugger users can instantly highlight anti-debugging techniques and determine if the presence of a debugger will cause a program to take a conditional jump or fall through to the next instruction.
... SDSW is compared with five different algorithms implemented in Sandmark, including Monden [10], ROW [11], and Static Arboit [12], on the above-mentioned parameters. These algorithms are applied to three different programs comprising 1000, 2000, and 3000 logical lines of code. ...
Conference Paper
Full-text available
Software piracy is a direct threat to the revenue of software vendors, requiring the need to employ effective and efficient techniques for detecting and preventing software piracy. One of the most promising attempts to protect intellectual property rights includes software watermarking. In this paper, we present a new technique for software watermarking which we call Semblance Based Disseminated Software Watermarking Algorithm (SDSW). Our technique embeds imitative instructions in object code using a defined 'string to instruction' mapping. These statements do not affect the overall execution semantics of program, however are hard to identify and extract for having resemblance with actual program code. This characteristic leads to the identification of the legitimate buyer, responsible for distribution of pirated copies to penalize legally. The effectiveness of proposed technique has been evaluated against published techniques implemented in Sandmark a watermarking application.
... Another well-known technique for obfuscation [CTL98] (and also for watermarking [Arb02]) is the use of opaque predicates in programs. An opaque predicate has a property that is known at obfuscation time, but is hard to know afterwards. ...
Article
Full-text available
Measuring the security of code obfuscation is difficult. A novel obfuscation transformation is in some cases only measured in terms of code expansion and speed, which are in fact only side effects of the transformation. A first step to define a security value to an obfuscation transformation could be having a look at what a cracker is able to reveal from an obfuscated program. This abstract first of all gives a short overview of existing techniques to obfuscate. Then, we describe existing techniques that can be used to deobfuscate, which were sometimes originally meant for other purposes, and new techniques which we are working on to deobfuscate.
... Our smallest JAR is 20 KB, and our largest is 2 MB. We used the Opaque Predicates watermarking algorithm GA1 [25] in our experimentation. According to Myles and Collberg [26], the resulting watermarks are robust to most, but not all, of the obfuscating transformations which Bob might attempt if he had access to the SandMark codebase. ...
Article
With the rapid development of cloud computing, software applications are shifting onto cloud storage rather than remaining within local networks. Software distributions within the cloud are subject to security breaches, privacy abuses, and access control violations. In this paper, we identify an insider threat to access control which is not completely eliminated by the usual techniques of encryption, cryptographic hashes, and access-control labels. We address this threat using software watermarking. We evaluate our access-control scheme within the context of a Collaboration-oriented Architecture, as defined by The Jericho Forum. Copyright © 2011 John Wiley & Sons, Ltd.
... A lot of research has been done on software watermarking. The major software watermarking algorithms currently available are based on several techniques, among which the register allocation, spread-spectrum, opaque predicate, abstract interpretation, dynamic path techniques (see, [1], [5], [10], [11], [14], [15], [16]). In 1996, Davidson and Myhrvold [12] proposed the first static software watermarking algorithm which embeds the watermark into an application program by reordering the basic blocks of a control flow-graph. ...
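To make the block-reordering idea concrete, the sketch below maps a watermark integer to an ordering of n basic blocks and back. The factorial-number-system mapping used here is an illustrative assumption, not the encoding of Davidson and Myhrvold, and the function names are hypothetical.

```python
from math import factorial


def int_to_permutation(w, n):
    """Map an integer 0 <= w < n! to an ordering of block indices 0..n-1."""
    if not 0 <= w < factorial(n):
        raise ValueError("watermark does not fit in n! orderings")
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        # The next factorial-base digit selects one of the remaining blocks.
        idx, w = divmod(w, factorial(i - 1))
        perm.append(items.pop(idx))
    return perm


def permutation_to_int(perm):
    """Inverse mapping: recover the integer from a block ordering."""
    items = sorted(perm)
    w = 0
    for i, block in enumerate(perm):
        idx = items.index(block)
        w += idx * factorial(len(perm) - 1 - i)
        items.pop(idx)
    return w
```

With n blocks, such a scheme can carry at most log2(n!) bits, and any transformation that reorders the blocks again destroys the mark, which is one reason static schemes of this kind are considered fragile against distortive attacks.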
Conference Paper
Full-text available
In a software watermarking environment, several graph theoretic watermark methods encode the watermark values as graph structures and embed them in application programs. In this paper we first present an efficient codec system for encoding a watermark number w as a reducible permutation graph F[π*] through the use of the self-inverting permutation π* which encodes the number w and, then, we propose a method for embedding the watermark graph F[π*] into a program P. The main idea behind the proposed embedding method is a systematic use of appropriate calls of specific functions of the program P. That is, our method embeds the graph F[π*] into P using only real functions and thus the size of the watermarked program P* remains very small. Moreover, the proposed codec system has low time complexity, can be easily implemented, and incorporates properties which make it resilient to attacks.
... Arboit's [2] algorithm embeds a watermark by adding special opaque predicates to a program. Opaque predicates are logical expressions that have a constant value, but not obviously so [8]. ...
Conference Paper
Full-text available
This paper presents an implementation of the novel watermarking method proposed by Venkatesan, Vazirani, and Sinha in their recent paper A Graph Theoretic Approach to Software Watermarking. An executable program is marked by the addition of code for which the topology of the control-flow graph encodes a watermark. We discuss issues that were identified during construction of an actual implementation that operates on Java bytecode. We measure the size and time overhead of watermarking, and evaluate the algorithm against a variety of attacks.
... I. Arboit algorithm [21]: in this algorithm, a trace of the program is used to select the branches. The watermark can be encoded in the opaque predicates either by ranking the predicates in the library and assigning each predicate a value, or by using constants in the predicates to encode it. ...
Article
Full-text available
In the era of information technology, software piracy and security have become some of the most important issues in the world. A number of techniques have been proposed and implemented to prevent software piracy and illegal modification. Among these protection techniques, software watermarking attempts to protect software by embedding a copyright notice or unique identifiers into it to prove ownership. Software watermarking discourages piracy, serves as a proof of purchase or authorship, and helps in tracking the source of illegal redistribution of copies of software. We evaluate the existing dynamic watermarking algorithms by using them to watermark Java bytecode files and then applying distortive attacks to each watermarked program through obfuscation. Our study has shown that some watermarks were removed as a result of these transformations.
... Arboit [6] proposed a watermarking method where pieces of a watermark are encoded as constants within opaque predicates. The watermark is extracted by searching a program for the watermark opaque predicates and decoding them back into the watermark value. ...
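A toy sketch of the constant-based variant described above: each watermark byte is stored in the constant of an always-true predicate, and extraction searches the program text for predicates of that shape. The predicate template, the `index * 256 + byte` packing, and all names are assumptions made for illustration; they are not Arboit's actual predicates or encoding.

```python
import re

# Assumed template: the product of two consecutive integers is always even,
# so the predicate is true for every integer x and every constant c.
TEMPLATE = "(x + {c}) * (x + {c} + 1) % 2 == 0"
PATTERN = re.compile(r"\(x \+ (\d+)\) \* \(x \+ \1 \+ 1\) % 2 == 0")


def embed(watermark):
    """Return one predicate string per watermark byte; c packs (position, byte)."""
    return [TEMPLATE.format(c=i * 256 + b) for i, b in enumerate(watermark)]


def extract(source):
    """Find the watermark predicates in program text and decode their constants."""
    consts = sorted({int(m.group(1)) for m in PATTERN.finditer(source)})
    return bytes(c % 256 for c in consts)
```

Packing the position into the constant lets extraction succeed even if the predicates are scattered through the program in a different order. A fixed textual template like this one is easy to spot, which illustrates why the predicate library has to stay secret and varied in a real scheme.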
Conference Paper
Full-text available
Software watermarks, which can be used to identify the intellectual property owner of a piece of software, are broadly divided into two categories: static and dynamic. Static watermarks are embedded in the code and/or data of a computer program, whereas dynamic watermarking techniques store a watermark in a program's execution state. In this paper, we present a survey of the known static software watermarking techniques, including a brief explanation of each.
... Control flow obfuscation consists of applying transformations in order to hide the control flow of a program. Well-known examples of control flow transformations are control flow flattening [19] and the insertion of opaque predicates [2, 5]. For this paper, we inserted opaque predicates, flattened a program through a control flow flattening algorithm, and turned a reducible control flow graph into a nonreducible one. ...
Conference Paper
Full-text available
Obfuscation is gaining momentum as a protection mechanism for the intellectual property contained within or encapsulated by software. Usually, one of the following three directions is followed: source code obfuscation is achieved through source code transformations, Java bytecode obfuscation through transformations on the bytecode, and binary obfuscation through binary rewriting. In this paper, we study the effectiveness of source code transformations for binary obfuscation. The transformations applied by several existing source code obfuscators are empirically shown to have no impact on the stripped binary after compilation. Subsequently, we study which source code transformations are robust enough to percolate through the compiler into the binary.
... In the case of the binary opaque predicates that we implemented for this research, this property is the fact that they always evaluate to true ($\forall x \in \mathbb{Z},\ 2 \mid x + x$, for example), or always evaluate to false ($\forall x \in \mathbb{Z},\ x^2 < 0$, for example). The opaque predicates we inserted are taken from Arboit [2]. To avoid simple elimination, link-time liveness analysis and constant propagation are used to ensure that the inserted predicate computations do not change the original program behavior, and to ensure that the inputs of the predicate computations are not constant values. ...
Conference Paper
Full-text available
Despite the recent advances in the theory underlying obfuscation, there still is a need to evaluate the quality of practical obfuscating transformations more quickly and easily. This paper presents the first steps toward a comprehensive evaluation suite consisting of a number of deobfuscating transformations and complexity metrics that can be readily applied on existing and future transformations in the domain of binary obfuscation. In particular, a framework based on software complexity metrics measuring four program properties: code, control flow, data and data flow is suggested. A number of well-known obfuscating and deobfuscating transformations are evaluated based upon their impact on a set of complexity metrics. This enables us to quantitatively evaluate the potency of the (de)obfuscating transformations.
... The dynamic call-graph $G(\text{Shortest Path}^*, I_{key})$ of our watermarked program is presented in Figure 9(c). Observe that $G(\text{Shortest Path}^*, I_{key})$ is isomorphic to the watermark graph $F[(3,5,1,4,2)]$. ...
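The watermark graphs in this line of work are built from self-inverting permutations, meaning permutations π with π(π(i)) = i for every i. As a small check under that definition, the sketch below confirms that the permutation (3,5,1,4,2) quoted above has this property. The function name is hypothetical, and the sketch does not attempt to build the graph itself.

```python
def is_self_inverting(perm):
    """True if perm is a permutation of 1..n that equals its own inverse."""
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        return False
    # perm is 1-based: applying it twice must return every element to itself.
    return all(perm[perm[i - 1] - 1] == i for i in range(1, n + 1))
```

For (3,5,1,4,2), the elements 1 and 3 swap, 2 and 5 swap, and 4 is fixed, so the check passes.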
Article
Full-text available
Software watermarking involves embedding a unique identifier or, equivalently, a watermark value within a software to prove owner's authenticity and thus to prevent or discourage copyright infringement. Towards the embedding process, several graph theoretic watermarking algorithmic techniques encode the watermark values as graph structures and embed them in application programs. Recently, we presented an efficient codec system for encoding a watermark number $w$ as a reducible permutation graph $F[\pi^*]$ through the use of self-inverting permutations $\pi^*$. In this paper, we propose a dynamic watermarking model, which we call WaterRPG, for embedding the watermark graph $F[\pi^*]$ into an application program $P$. The main idea behind the proposed watermarking model is a systematic use of appropriate calls of specific functions of the program $P$. More precisely, for a specific input $I_{key}$ of the program $P$, our model takes the dynamic call-graph $G(P, I_{key})$ of $P$ and the watermark graph $F[\pi^*]$, and produces the watermarked program $P^*$ having the following key property: its dynamic call-graph $G(P^*, I_{key})$ is isomorphic to the watermark graph $F[\pi^*]$. Within this idea the program $P^*$ is produced by only altering appropriate calls of specific functions of the input application program $P$. We have implemented our watermarking model WaterRPG in real application programs and evaluated its functionality under various and broadly used watermarking assessment criteria. The evaluation results show that our model efficiently watermarks Java application programs with respect to several watermarking metrics like data-rate, bytecode instructions overhead, resiliency, time and space efficiency. Moreover, the embedded watermarks withstand several software obfuscation and optimization attacks.
Conference Paper
Software piracy is a direct threat to the revenue of software vendors, requiring the need to employ effective and efficient techniques for detecting and preventing software piracy. One of the most promising attempts to protect intellectual property rights includes software watermarking. In this paper, we present a new technique for software watermarking which we call semblance based disseminated software watermarking algorithm (SDSW). Our technique embeds imitative instructions in object code using a defined 'string to instruction' mapping. These statements do not affect the overall execution semantics of program, however are hard to identify and extract for having resemblance with actual program code. This characteristic leads to the identification of the legitimate buyer, responsible for distribution of pirated copies to penalize legally. The effectiveness of proposed technique has been evaluated against published techniques implemented in Sandmark a watermarking application.
Chapter
Software piracy and tampering is a well known threat the world is faced with. There have been a lot of attempts to protect software from reverse engineering and tampering. It appears as if there is an ongoing war between software developers and crackers, both parties want to get an upper hand over each other as the time passes. Some of the ample techniques of software protection are reviewed, including multi-block hashing scheme, hardware based solutions, checksums, obfuscation, guards, software aging, cryptographic techniques and watermarking. All of these techniques play their parts imparted on them to protect the software from malicious attacks.
Article
Full-text available
When deploying a WiFi hot-spot, an essential question is whether an 802.11b, 802.11g, and/or 802.11a system should be installed. This decision requires an efficiency analysis beyond economic and rational considerations. As IP phone systems are increasingly available in university environments, analyzing the practicability of WiFi phones during movement, both indoors and outdoors, is an obvious subject. In this paper we focus on the analysis of the properties of multimedia applications (video, streaming, IP phone) operated over IEEE 802.11b/g/a WiFi systems.
Article
In this paper we review the state of the art in content protection for video games by describing the capabilities and shortcomings of currently deployed solutions. In an attempt to address some of the open issues, we present two novel approaches. The first approach uses branch-based software watermarking to discourage and detect piracy through a registration-based system. In the second approach, based on the parallels between games and premium audio and video content, we propose the use of current physical-media copy-protection technologies for gaming content. In particular, we focus on broadcast encryption technology. The use of an open, standard-based architecture enables the development of a more restrictive protection system for games. Finally, we demonstrate how the proposed protection mechanisms can be applied to video-game copy protection through five scenarios.
Conference Paper
In this paper we introduce the concept of a dynamic data flow graph (DDFG), based on the dynamic data dependencies that arise during the execution of software. On this basis, we propose a dynamic software watermarking algorithm that embeds watermark information into the DDFG of the software, and we discuss the implementation of watermark embedding and extraction. Finally, we analyze the performance of this algorithm.
Article
We present a separation logic abstract domain for static analysis by abstract interpretation. We consider separation logic with fixpoint formulae. The domain embeds shape and alias information. The main originality compared to usual shape-graphs is that we treat all values (numerical, heap locations, nil,...) the same way, thus we can have numerical summary nodes. To keep the domain as general as possible, it is parameterized by a numerical abstract domain which can be instantiated as needed. We provide a semantics in terms of sets of memory (the usual model for separation logic) and sound functions on the domain, including a widening and a union whose precision/cost can be tuned to the specific needs of the context where the domain is used.
Conference Paper
Based on the technique of code obfuscation in software protection, a new obfuscation scheme for constructing opaque predicates is proposed. It increases complexity by employing a pseudo-random sequence and the solutions of a group of Diophantine equations to construct a family of parameterized opaque predicates. To protect the output of the opaque predicates, the data chain list is dynamically generated. The pseudo-random sequence is also used to disrupt attackers' debugger tracing. Decompilation is made more difficult through block cryptosystems that convert the output of the opaque predicates into the corresponding test conditions.
Article
This paper proposes a new method to construct a family of opaque predicates for Java programs by combining indeterminate equations and cryptography. In our construction, cryptography is first exploited for manufacturing opaque predicates. Our opaque predicates are more resilient, stealthier and cheaper. They are dynamic and secure against all usual types of static and dynamic attacks, and are also secure against cryptanalysis.
Article
A spread-spectrum-based fragile software watermarking scheme is proposed in this paper. The algorithm extracts a vector of instruction-group frequencies from all basic blocks of the original program and constructs a vector matrix. The scheme then generates a watermark using principal component analysis (PCA) and modifies the vector matrix to embed the watermark. Spreading the watermark over the code provides not only a high level of stealth but also global protection for the original program. The scheme can not only detect tampering effectively, but also clearly identify the type of tampering.
Article
In the Internet age, software is one of the core components for the operation of networks, and it penetrates almost all aspects of industry, commerce, and daily life. Since digital documents and objects can be duplicated and distributed easily and cheaply, and software is also a type of digital object, software security and piracy have become increasingly important issues. In order to protect software from piracy and unauthorized modification, various techniques have been developed. Among them is software watermarking, which protects software by embedding secret information into it as an identifier of the copyright ownership of that software. This paper gives a brief overview of software watermarking. It describes the taxonomy, attack models, and algorithms of software watermarking.
Conference Paper
We have recently presented an efficient codec system for encoding a watermark number w as a reducible permutation graph F [π*], through the use of self-inverting permutations π*, and proposed a dynamic watermarking model, which we named WaterRpg, for embedding the watermark graph F [π*] into an application program P. In this paper, we implement our watermarking model WaterRpg in real application programs, taken from a game database, and evaluate its functionality under various watermarking issues supported by our WaterRpg model. More precisely, we selected a number of Java application programs and watermarked them using two main approaches. First, we show in detail a straightforward or naive approach for watermarking a given program P which is based only on the well-defined call patterns of our model, and then we prove structural and programming properties of the call patterns based on which we can watermark the program P in a more stealthy way. The experimental results show the efficient functionality of all the programs P* watermarked under the naive case and all the stealthy cases. The size and time overheads of the proposed watermarking are very low.
Article
After introducing the current state of research in software protection, the paper proposes a multi-watermark embedding algorithm to protect the whole software. The watermark information is first split and encrypted with a hyper-chaotic sequence, and then each watermark is embedded into a core location of the program by a mapping function. Tamper-proofing is established between the watermarks: if tampering occurs, the watermarks detect it and terminate the program immediately to protect it. The experiments indicate that the robustness of the scheme is stronger.
Conference Paper
Opaque predicate obfuscation, a low-cost and stealthy control flow obfuscation method that introduces superfluous branches, has been demonstrated to be effective in impeding reverse engineering efforts and is broadly used in various areas of software security. Conventional opaque predicates typically rely on the invariant property of well-known number theoretic theorems, making them easy to detect by dynamic testing and formal semantics techniques. To address this limitation, previous work has introduced the idea of dynamic opaque predicates, whose values may vary in different runs. However, the systematic design and evaluation of dynamic opaque predicates are far from mature. In this paper, we generalize the concept and systematically develop a new control flow obfuscation scheme called generalized dynamic opaque predicates. Compared to the previous work, our approach has two distinct advantages: (1) We extend the application scope by automatically transforming more common program structures (e.g., straight-line code, branch, and loop) into dynamic opaque predicates; (2) Our system design does not require dynamic opaque predicates to be strictly adjacent, which makes it more resilient to deobfuscation techniques. We have developed a prototype tool based on LLVM IR and evaluated it by obfuscating the GNU core utilities. Our experimental results show the efficacy and generality of our method. In addition, the comparative evaluation demonstrates that our method is resilient to the latest formal program semantics-based opaque predicate detection method.
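To make the "conventional" style of opaque predicate mentioned in this abstract concrete, the following minimal Java sketch shows a predicate built on a simple number-theoretic invariant (the product of two consecutive integers is even). It is an illustration only and is not taken from any of the papers listed here; the class and method names are hypothetical.

```java
// Minimal sketch of a conventional (invariant) opaque predicate.
// All names are hypothetical and chosen only for illustration.
final class ParityOpaquePredicate {

    // Opaquely true: x and x + 1 have opposite parity, so their product is even.
    // This also holds under Java's wrap-around int arithmetic, because overflow
    // never changes the lowest bit of a sum or a product.
    static boolean alwaysTrue(int x) {
        return ((x * (x + 1)) & 1) == 0;
    }

    // The predicate guards the real computation; the other branch is a
    // superfluous branch that is never executed.
    static int guardedIncrement(int value, int runtimeInput) {
        if (alwaysTrue(runtimeInput)) {
            return value + 1;      // real computation
        }
        return value * 31 - 7;     // superfluous branch, never reached
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int s : samples) {
            System.out.println(s + " -> " + alwaysTrue(s) + ", " + guardedIncrement(10, s));
        }
    }
}
```

Because the predicate evaluates to the same value on every run, it is exactly the kind of invariant that the abstract describes as easy to detect by dynamic testing, which is the limitation dynamic opaque predicates aim to address.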
Article
Software obfuscation has been developed for over 30 years. A problem always confusing the communities is what security strength the technique can achieve. Nowadays, this problem becomes even harder as the software economy becomes more diversified. Inspired by the classic idea of layered security for risk management, we propose layered obfuscation as a promising way to realize reliable software obfuscation. Our concept is based on the fact that real-world software is usually complicated. Merely applying one or several obfuscation approaches in an ad-hoc way cannot achieve good obscurity. Layered obfuscation, on the other hand, aims to mitigate the risks of reverse software engineering by integrating different obfuscation techniques as a whole solution. In the paper, we conduct a systematic review of existing obfuscation techniques based on the idea of layered obfuscation and develop a novel taxonomy of obfuscation techniques. Following our taxonomy hierarchy, the obfuscation strategies under different branches are orthogonal to each other. In this way, it can assist developers in choosing obfuscation techniques and designing layered obfuscation solutions based on their specific requirements.
Chapter
This article details the idea of Crowdsourced Reverse Engineering (CSRE) by analysing three major challenges: (1) automatic task extraction, (2) source code anonymization and (3) results aggregation and quality control. We re-formulate the Reverse Engineering activity of concept assignment as a crowdsourced classification task to exemplify these challenges and describe suitable methods to address them. Our overview on existing research of crowdsourcing showcases examples of successful application in the field of Software Engineering and argues that Reverse Engineering activities like Concept Assignment are likely to also benefit from crowdsourcing by determining a high similarity in eight crowdsourcing dimensions to the microtasking model. Our experiments on the crowdsourcing platform microworkers.com support this, producing 187 results by 34 crowd workers which classified 10 code fragments with decent quality. We provide an extended analysis of the observed crowd workers’ behavior and report evidence of surprisingly high levels of engagement and efforts undertaken by the crowd. Concluding our experiences, this article indicates three open research challenges for future work.
Chapter
This paper proposes an obfuscation algorithm based on congruence equations and the knapsack problem to address the generation of opaque predicates in control flow obfuscation. We binarize the state of the solution of the congruence equation and then combine it with the knapsack problem to obtain the binary output of the opaque predicate. In an experimental comparison with the existing chaotic opaque predicate design algorithm, the probability of each binary result of the generated opaque predicates is nearly 50%, and the time overhead is kept within 1 s. The experimental results show that the proposed algorithm performs better in randomness, stability and time overhead, and can effectively resist reverse analysis of code.
Article
The concept of software birthmark is developed for the detection of theft and piracy in software applications. The originality of software can be evaluated by comparing software programs on the basis of their birthmarks. A number of birthmark designs have been proposed which are used to specify birthmark for source code and executable code related to particular programming languages. This study presents a systematic literature review on available software birthmark designs and related techniques for comparing birthmarks in order to identify pirated software. This research is focused on identifying different applications of software birthmark, especially the estimation of software birthmark to identify the extent of piracy performed in a software. The objective is to gain insight into complex details of software birthmark by accumulating and analyzing the knowledge provided in the literature in order to facilitate further research in software birthmark and its applications. The study is conducted by following the systematic literature review protocol. The data are collected from primary studies published from 1992 to April 2018 in specified journals and conference/workshop proceedings. A total of 143 primary studies are selected, based on predefined exclusion, inclusion, and quality criteria. The research identifies 22 software birthmark techniques frequently used and discussed by researchers and industry. The study also identifies a number of important applications of software birthmarks. These applications define the use of software birthmark in software theft and plagiarism detection, intellectual software asset management, detecting binary theft, malware detection, detecting the theft of natural language, and semantics-based repackaging detection for mobile apps. 
The results show that despite the large-scale research and development of different birthmark techniques, there is a lack of organized knowledge which is needed to facilitate the usage of software birthmark for critical applications like clone detection and malware detection. Furthermore, it is seen that the area of software birthmark estimation is not well researched which needs to be explored further. The study recommends that the area of software birthmark needs to be explored for developing a reliable and authentic mechanism which can accurately and easily detect software theft and ultimately prevent the piracy of software.
Conference Paper
Opaque predicates have been widely used to insert superfluous branches for control flow obfuscation. Opaque predicates can be seamlessly applied together with other obfuscation methods such as junk code to turn reverse engineering attempts into arduous work. Previous efforts in detecting opaque predicates are far from mature. They are either ad hoc, designed for a specific problem, or have a considerably high error rate. This paper introduces LOOP, a Logic Oriented Opaque Predicate detection tool for obfuscated binary code. Being different from previous work, we do not rely on any heuristics; instead we construct general logical formulas, which represent the intrinsic characteristics of opaque predicates, by symbolic execution along a trace. We then solve these formulas with a constraint solver. The result accurately answers whether the predicate under examination is opaque or not. In addition, LOOP is obfuscation resilient and able to detect previously unknown opaque predicates. We have developed a prototype of LOOP and evaluated it with a range of common utilities and obfuscated malicious programs. Our experimental results demonstrate the efficacy and generality of LOOP. By integrating LOOP with code normalization for matching metamorphic malware variants, we show that LOOP is an appealing complement to existing malware defenses.
Conference Paper
In this study we propose a robust watermarking algorithm to improve watermark hiding. The watermark information is first split using the Chinese Remainder Theorem, and each part is then encoded as a topological structure using Planted Plane Cubic Tree enumeration. Finally, the sub-watermarks are hashed across the entire program code by a chaotic system in order to protect the entire program code. In the extraction procedure, the user inputs a secret key to restore the graph structure into the watermark information and thereby prove copyright. The experiments indicate that the scheme's robustness and stealthiness are stronger and that it is more practicable.
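The splitting step named in this abstract can be sketched as follows. This is a minimal illustration of Chinese Remainder Theorem splitting and recombination in general, not the cited paper's implementation: the moduli, the watermark value, and all class and method names are assumptions chosen for the example, and the tree encoding and chaotic hashing steps are not shown.

```java
import java.math.BigInteger;

// Minimal sketch: split a watermark number into residues and recover it
// with the Chinese Remainder Theorem. All names and values are hypothetical.
final class CrtWatermarkSplit {

    // Each residue w mod m_i is one sub-watermark.
    static long[] split(long watermark, long[] moduli) {
        long[] residues = new long[moduli.length];
        for (int i = 0; i < moduli.length; i++) {
            residues[i] = watermark % moduli[i];
        }
        return residues;
    }

    // Recombine the residues. The moduli must be pairwise coprime and the
    // original watermark must be non-negative and smaller than their product.
    static long combine(long[] residues, long[] moduli) {
        BigInteger product = BigInteger.ONE;
        for (long m : moduli) {
            product = product.multiply(BigInteger.valueOf(m));
        }
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < moduli.length; i++) {
            BigInteger mi = BigInteger.valueOf(moduli[i]);
            BigInteger partial = product.divide(mi);   // product of the other moduli
            BigInteger inverse = partial.modInverse(mi); // exists because gcd is 1
            sum = sum.add(BigInteger.valueOf(residues[i]).multiply(partial).multiply(inverse));
        }
        return sum.mod(product).longValueExact();
    }

    public static void main(String[] args) {
        long[] moduli = {251, 253, 255, 256};  // pairwise coprime (assumed example values)
        long watermark = 123456789L;           // must be below 251 * 253 * 255 * 256
        long[] parts = split(watermark, moduli);
        System.out.println("recovered == original: " + (combine(parts, moduli) == watermark));
    }
}
```

In this sketch each residue can be embedded separately, and the full watermark is recovered only when all residues and their moduli are available.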
Conference Paper
Information hiding is widely used in almost all intelligence and security software systems as a standard technology to prevent piracy and copyright infringement. This technology mainly involves the idea of digital watermarking where a unique identifier (or, watermark number) is embedded into software, image, audio, or video data through the introduction of errors not detectable by human perception. In software watermarking, the proposed graph theoretic methods usually encode watermark numbers as graphs whose structure resembles that of real program graphs. In this paper, in light of our recently published algorithms which encode a watermark number w as a self-inverting permutation, we present an efficient encoding method, along with its corresponding decoding one, which embeds a self-inverting permutation π* into reducible permutation graphs F[π*]. More precisely, we present an encoding algorithm which embeds the permutation π* into F[π*] by first computing the heap-ordered tree of π* (i.e., a rooted binary tree having specific node-value and child-parent properties) using the lattice representation of π* and then, based on the heap node-value properties, producing a reducible permutation graph F[π*]. Moreover, we exploit the max-heap and min-heap representation tree of permutation π* and show that we can efficiently encode the same watermark w into two different reducible permutation graphs F1[π*] and F2[π*]. In general, such a property increases the safety performance of a watermarking system against attacks since it can embed multiple copies of the same watermark value w into a digital object.
Chapter
In this paper, we propose a persistent watermarking technique of information systems supported by relational databases at the back-end. The persistency is achieved by identifying an invariant part of the database which remains unchanged w.r.t. the operations in the associated applications. To achieve this, we apply static data-flow analysis technique to the applications. The watermark is then embedded into the invariant part of the database, leading to a persistent watermark. We also watermark the associated applications in the information system by using opaque predicates which are obtained from the variant part of the database.
Conference Paper
Unauthorized use of digital content has become a problem in recent years. In particular, the spread of illegally copied software is becoming serious. Conventional approaches have the drawback that they hamper the convenience of legitimate users. In this study, we pursue another idea, "identifying the person who illegally distributed a fraudulent software code," by embedding a digital fingerprint, so as not to limit the rights of innocent users. In our digital fingerprinting scheme, code blocks of the target executable code are fragmented and reordered at random. The fingerprint is embedded as the control that executes the fragmented code blocks in the original order. Dummy code is inserted so that attackers have difficulty tracking the original execution flow and removing the fingerprint. A preliminary evaluation shows that code fingerprinted with our scheme is sufficiently robust against an existing obfuscator and that our system is sufficiently efficient in execution time.
Conference Paper
Taint analysis has a wide variety of applications in software analysis, making the precision of taint analysis an important consideration. Current taint analysis algorithms, including previous work on bit-precise taint analyses, suffer from shortcomings that can lead to significant loss of precision (under/over tainting) in some situations. This paper discusses these limitations of existing taint analysis algorithms, shows how they can lead to imprecise taint propagation, and proposes a generalization of current bit-level taint analysis techniques to address these problems and improve their precision. Experiments using a deobfuscation tool indicate that our enhanced taint analysis algorithm leads to significant improvements in the quality of deobfuscation.
Article
Static disassembly is used to analyze program control flow, which is the key process of reverse analysis. To address the problem that attackers use static disassembly to analyze control transfer instructions and the control flow graph, a mixed obfuscation of overlapping instructions and self-modifying code based on hyper-chaotic opaque predicates is proposed. Jump offsets in overlapping instructions and data offsets in self-modifying code are constructed with opaque predicates. Control transfer instructions are transformed into instructions unrelated to control transfer by combining the characteristics of overlapping instructions and self-modifying code. Experiments and analysis show that the control flow graph can be obfuscated by this mixed obfuscation because hyper-chaotic opaque predicates are difficult for attackers to analyze.
Article
Program obfuscation is a semantic-preserving transformation aimed at bringing a program into a form that impedes understanding of its algorithm and data structures or prevents extracting certain valuable information from the text of the program. Since obfuscation may find wide use in computer security, information hiding and cryptography, security requirements to program obfuscators have become a major focus of interest in the theory of software obfuscation starting from the pioneering works in this field. In this paper we give a survey of various definitions of obfuscation security and basic results that establish possibility or impossibility of secure program obfuscation under certain cryptographic assumptions.
Conference Paper
Software piracy is one of the major concerns of programmers and software developers costing them huge amounts of financial losses every year. One of the programming languages used to develop software is Java. Although Java has some advantages over other languages, programs written using it are more vulnerable to software piracy than others. This is due to the fact that decompiling Java programs to their source codes is a relatively easy task. This paper proposed a novel method of watermarking Java programs. The suggested technique aimed to embed a watermark by means of appending a spurious If statement on to it. The proposed method was then implemented and tested in order to evaluate its security and resiliency against different types of attacks. The experimental results showed that the new method is more resilient to obfuscation and decompile-recompile attacks in comparison with the methods proposed by Genevieve Arboit in A Method for Watermarking Java Programs via Opaque Predicates [7] and Monden et al in A Practical Method for Watermarking Java Programs [5].
Conference Paper
In the Internet age, software security and piracy have become increasingly important issues. In order to protect software from piracy and unauthorized modification, various techniques have been developed. Among them is software watermarking, which protects software by embedding secret information into it as an identifier of the copyright ownership of that software. This paper gives a new algorithm based on the stack-state transition graph: watermarks are embedded by adding additional code to the executable file and extracted by recognizing the stack-state relationships produced at runtime. Analysis shows that our algorithm is more reliable.
J. Camenisch and M. Michels. Proving in zero-knowledge that a number is the product of two safe primes. Lecture Notes in Computer Science, 1592:107-122, 1999.
C. Collberg and C. Thomborson. Software watermarking: Models and dynamic embeddings. In POPL'99, 26th Annual SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 311-324, 1999.
C. Collberg, C. Thomborson, and D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. In Symposium on Principles of Programming Languages, pages 184-196, 1998.
J. Nagra, C. Thomborson, and C. Collberg. Software watermarking: Protective terminology. In Australasian Computer Science Conference, pages 177-186, 2002.
Article
We identify three types of attack on the intellectual property contained in software and three corresponding technical defenses. A defense against reverse engineering is obfuscation, a process that renders software unintelligible but still functional. A defense against software piracy is watermarking, a process that makes it possible to determine the origin of software. A defense against tampering is tamper-proofing, so that unauthorized modifications to software (for example, to remove a watermark) will result in nonfunctional code. We briefly survey the available technology for each type of defense.
Conference Paper
Java programs distributed through the Internet are now suffering from program theft. This is because Java programs can be easily decomposed into reusable class files and even decompiled into source code by program users. We propose a practical method that discourages program theft by embedding Java programs with a digital watermark. Embedding a program developer's copyright notation as a watermark in Java class files will ensure the legal ownership of class files. Our embedding method is indiscernible by program users, yet enables us to identify an illegal program that contains stolen class files. The result of the experiment to evaluate our method showed that most of the watermarks (20 out of 23) embedded in class files survived two kinds of attacks that attempt to erase watermarks: an obfuscator attack, and a decompile-recompile attack.
Article
Informally, an obfuscator \( \mathcal{O} \) is an (efficient, probabilistic) “compiler” that takes as input a program (or circuit) P and produces a new program \( \mathcal{O} \)(P) that has the same functionality as P yet is “unintelligible” in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic and complexity-theoretic applications, ranging from software protection to homomorphic encryption to complexity-theoretic analogues of Rice’s theorem. Most of these applications are based on an interpretation of the “unintelligibility” condition in obfuscation as meaning that \( \mathcal{O} \)(P) is a “virtual black box,” in the sense that anything one can efficiently compute given \( \mathcal{O} \)(P), one could also efficiently compute given oracle access to P. In this work, we initiate a theoretical investigation of obfuscation. Our main result is that, even under very weak formalizations of the above intuition, obfuscation is impossible. We prove this by constructing a family of functions \( \mathcal{F} \) that are inherently unobfuscatable in the following sense: there is a property π: \( \mathcal{F} \) → {0,1} such that (a) given any program that computes a function f ∈ \( \mathcal{F} \), the value π(f) can be efficiently computed, yet (b) given oracle access to a (randomly selected) function f ∈ \( \mathcal{F} \), no efficient algorithm can compute π(f) much better than random guessing. We extend our impossibility result in a number of ways, including even obfuscators that (a) are not necessarily computable in polynomial time, (b) only approximately preserve the functionality, and (c) only need to work for very restricted models of computation (\( TC^0 \)). We also rule out several potential applications of obfuscators, by constructing “unobfuscatable” signature schemes, encryption schemes, and pseudorandom function families.
Article
In this paper, we clarify what steganography is and what it can do. We contrast it with the related disciplines of cryptography and traffic security, present a unified terminology agreed at the first international workshop on the subject, and outline a number of approaches, many of them developed to hide encrypted copyright marks or serial numbers in digital audio or video. We then present a number of attacks, some new, on such information hiding schemes. This leads to a discussion of the formidable obstacles that lie in the way of a general theory of information hiding systems (in the sense that Shannon gave us a general theory of secrecy systems). However, theoretical considerations lead to ideas of practical value, such as the use of parity checks to amplify covertness and provide public key steganography. Finally, we show that public key information hiding systems exist, and are not necessarily constrained to the case where the warden is passive. Keywords: Cryptography, Copyright protection, Data compression, Image registration, Jitter, Motion pictures, Multimedia systems, Music, Observability, Pseudonoise coded communication, Redundancy, Spread spectrum communication, Software protection.
Chapter
A randomized algorithm is one that makes random choices during its execution. The behavior of such an algorithm may thus be random even on a fixed input. The design and analysis of a randomized algorithm focus on establishing that it is likely to behave well on every input; the likelihood in such a statement depends only on the probabilistic choices made by the algorithm during execution and not on any assumptions about the input. It is especially important to distinguish a randomized algorithm from the average-case analysis of algorithms, where one analyzes an algorithm assuming that its input is drawn from a fixed probability distribution. With a randomized algorithm, in contrast, no assumption is made about the input.
Conference Paper
The work studies the problem of copyright protection of software using cryptographic fingerprinting. The identity of software must be unique for each copy. There are two classes of identities: one is based on equivalent variants of the program and the other applies behaviour of the software as its identity. For these two types of identity, we introduce two different fingerprint schemes. The two schemes use digital signatures and can be easily combined and extended to be resilient against partial fingerprint destruction.
Article
Computer software is characterized by features that defy classification within established legal doctrines. In particular, areas of the law that rely on distinctions between the functional and the "expressive" have found software problematic. This article briefly describes areas of law in which this problem is most pressing, sketches how these problems have migrated across legal domains, and indicates the path that legal controversy attending software seems likely to take, especially in patent law. The picture that emerges is one of continuing legal controversy regarding software within the law of copyright, patent and free speech, as well as a blurring among these areas as they are stretched to accommodate this chimeric technology. For more than 20 years, copyright has been the primary mode of intellectual property protection for software in the U.S. and in much of the rest of the world. Copyright covers original works of authorship, including literary works, dramatic works, audiovisual works, musical compositions, audio recordings, pictorial, graphic and sculptural works, choreographic works and even architectural works. A copyright arises whenever original expression is fixed for some substantial duration in a tangible medium of expression. This act of fixing original expression gives the copyright owner the exclusive right to reproduce and distribute copies of the work.
Conference Paper
There are at least four US patents on software watermarking, and an idea for further advancing the state of the art was presented by C. Collberg and C. Thomborson (1999). The new idea is to embed a watermark in dynamic data structures, thereby protecting against many program-transformation attacks. Until now there have been no reports on practical experience with this technique. We have implemented and experimented with a watermarking system for Java based on the ideas of Collberg and Thomborson. Our experiments show that watermarking can be done efficiently with moderate increases in code size, execution times and heap-space usage, while making the watermarked code resilient to a variety of program-transformation attacks. For a particular representation of watermarks, the time to retrieve a watermark is on the order of one minute per megabyte of heap space. Our implementation is not designed to resist all possible attacks; to do that, it should be combined with other protection techniques, such as obfuscation and tamperproofing.
Conference Paper
We present a graph theoretic approach for watermarking software in a robust fashion. While watermarking software that is small in size (e.g. a few kilobytes) may be infeasible through this approach, it seems to be a viable scheme for large applications. Our approach works with control/data flow graphs and uses abstractions, approximate k-partitions, and a random walk method to embed the watermark, with the goal of minimizing and controlling the additions to be made for embedding, while keeping the estimated effort to undo the watermark (WM) as high as possible. The watermarks are embedded so that small changes to the software or flow graph are unlikely to disable detection by a probabilistic algorithm that has a secret. This is done by using some relatively robust graph properties and error correcting codes. Under some natural assumptions about the code added to embed the WM, locating the WM by an attacker is related to some graph approximation problems. Since little theoretical foundation exists for hardness of typical instances of graph approximation problems, we present heuristics to generate such hard instances and, in a limited case, present a heuristic analysis of how hard it is to separate the WM in an information theoretic model. We describe some related experimental work. The approach and methods described here are also suitable for solving the problem of software tamper resistance.