Hunting Trojan Horses.
Technical Report TR-01
Hunting Trojan Horses
Micha Moffie and David Kaeli
Computer Architecture Research Laboratory
Northeastern University, Boston, MA
Abstract
In this report we present HTH (Hunting Trojan Horses), a security framework for detecting
Trojan Horses and Backdoors. The framework is composed of two main parts: 1) Harrier –
an application security monitor that performs run-time monitoring to dynamically collect
execution-related data, and 2) Secpert – a security-specific Expert System based on CLIPS,
which analyzes the events collected by Harrier.
Our main contributions to security research are threefold. First, we identify common
malicious behaviors, patterns, and characteristics of Trojan Horses and Backdoors. Second,
we develop a security policy that can identify such malicious behavior and opens the door for
effectively using expert systems to implement complex security policies. Third, we construct
a prototype that successfully detects Trojan Horses and Backdoors.
1 Introduction
Computer attacks grew at an alarming rate in 2004, and this rate is expected to rise.
Additionally, zero-day attack exploits are already being sold on the black market. Cases in
which malicious code was used for financial gain were reported in the 2005 Symantec Internet
Security Threat Report (for the first half of 2005).
Symantec also reports a rise in the occurrence of malicious code that exposes confidential
information. They further report that this class of malicious code attack represents 74% of
the top 50 code samples reported to Symantec in 2005. A key characteristic reported was
that six out of the top ten spyware programs were bundled with other programs.
It is difficult to guard against malicious code that exposes confidential information or
tampers with information. These exploits may take the form of Trojan Horses or Backdoors
that are installed without the user’s consent. Moreover, freshly authored malicious code (i.e.,
zero-day attacks) can go undetected by even the most up-to-date anti-virus programs. Since
Trojan Horses and Backdoors may have very little immediate impact on the normal operation
of a system, they may go undetected for a significant period of time, allowing the attacker a
large window of opportunity.
We have developed a security framework that can uncover Trojan Horses and Backdoors,
and defend against harmful activity. In this report, we describe HTH, a run-time monitor that
comes coupled with new security policy tools.
In the following section, we present the motivation for our work. We review related work
in section 3. In section 4, we introduce our security policy. In section 5, we explore the design
space for HTH and discuss some of the design tradeoffs made in HTH. Sections 6 and 7 delve
into the design and implementation of Secpert and Harrier. We evaluate the effectiveness of HTH in
section 8. Initial assessments of HTH performance are described in section 9. We conclude in
the final section.
2.1 Security Exploits Examples
To establish the basis and motivation for our work we present some real world examples of
malicious code exploits.
1. PWSteal.Tarno.Q is a Trojan horse that logs passwords and information typed into web
forms. The downloader portion of the Trojan arrives as an email attachment. When
the attachment is executed, the main part of the Trojan is downloaded from a fixed
location. The Trojan creates a file and registers it as a browser helper for Internet
Explorer (IE). The helper object is executed every time IE runs. The Trojan monitors
a predefined set of web pages (such as those that contain strings like bank, cash, gold,
etc.) and captures keystrokes and submitted web forms. The Trojan stores the information
in several predefined files. It then sends a unique ID (of the compromised
computer) to the attacker (using a predefined HTTP address) and periodically sends the
collected information to a predefined URL.
2. The Trojan.Lodeight.A code tries to install malicious code on the compromised computer
and open a Backdoor. When this Trojan is executed, it connects to one of two predefined
websites and downloads a remote file and executes it (the remote file may be a Beagle
worm). Then this Trojan opens a Backdoor on TCP port 1084.
3. W32.Mytob.J@mm is a mass-mailing worm which includes a Backdoor. The worm sends
itself via email and uses a remote buffer overflow to spread through the network.
The worm copies itself to a system folder and modifies the registry such that the worm is
executed every time Windows starts. It collects email addresses and sends itself to some
of those addresses (according to predefined characteristics). The worm starts an FTP
server, connects to one of two predefined IRC channels, and listens for commands that
allow the attacker to download files, execute files, restart the system or run other IRC
commands.
4. As part of an adware program, Trojan.Vundo presents the user with pop-up
advertisements. One component of Trojan.Vundo (HTML code that exploits a Microsoft
internet vulnerability) tries to download and execute a downloader component. If
successful, this downloader component will create an executable file and save it in one or
more directories. In addition, the downloader component will download (from a specified
IP address) an adware component of the Trojan and cause it to execute (this component
is a dynamically linked library and is injected into different processes). It will also
modify the Windows Registry to execute itself upon startup. Once executed, the Trojan will
degrade Windows performance by decreasing the amount of virtual memory available, as
well as displaying advertisements on the infected machine.
5. Windows-update.com is a fake web site that exploits an Internet Explorer
vulnerability to download and install Trojan horses. Using a vulnerable version of IE to access
the fake Windows site may cause the following: 1) an executable is downloaded and
executed on the computer; 2) the executable downloads configuration
information from a predefined website (lol.ifud.cc/18.104.22.168); and 3) the executable connects to a third
web site and chooses one of many unknown custom Trojan Horse programs to download
(depending on the configuration downloaded).
6. The W32/MyDoom.B virus is an executable file that can infect a Windows system. When
executed, the virus attempts to generate files and add entries to the Windows registry.
The virus modifies the registry to execute itself (at login time) and to reference a
Backdoor component. In addition, the virus downloads and installs a Backdoor. The
Backdoor component (ctfmon.dll) opens a TCP port and can accept commands, execute
additional code, or act as a TCP proxy (US-CERT Alert TA04-028A).
7. The Phatbot Trojan can be controlled by an attacker on a remote site (using a P2P
protocol). The Trojan has a large set of commands which can be executed. A few
of these commands include: stealing CD keys, running a command using system(..),
displaying system information, executing a file from an FTP URL, and killing a process.
8. The Trojan Horse version of the Sendmail Distribution contains malicious code that is
executed during the process of building the software. The Trojan forks a process that
connects to a fixed remote server on port 6667. The forked process allows an intruder
to open a shell running as the user who built the Sendmail software (CERT Advisory
9. A Trojan horse version of TCP Wrappers can provide root access to intruders who are
initiating connections with a source port of 421. Also, upon compilation of the program,
this Trojan horse sends email to an external address. The email includes information
which can identify the site and the account that compiled the program. Specifically, the
Trojan sends the information obtained from running the commands whoami and uname
-a (CERT Advisory CA-1999-01).
The examples above include very recent examples of Trojans and Backdoors. Next, we
use these and other examples to characterize Trojan Horses and Backdoors and uncover their
common execution patterns.
2.2 Trojan Horses and Backdoors Characteristics
If we study the set of Trojan Horses and Backdoors just discussed (as well as other
malicious code examples), we can detect several distinct characteristics and behaviors:
1. Executables are downloaded and executed without user intervention.
2. The malicious code may create and/or update files in the file system (possibly the
Windows registry) with fixed (i.e., hard-coded) values.
3. The code initiates a connection to a fixed remote host. The code may then download
executables/data or upload private information.
4. The code allows a remote user to initiate commands or control the execution of the local
5. The code may degrade computer performance.
6. The malicious code may execute only under specific conditions (e.g., execute only on a
specific port number).
To characterize a Trojan Horse or a Backdoor, one must consider the environment in which
these exploits operate. From an attacker’s point of view, his malicious program - the Trojan
Horse - is operating in an unfriendly environment. First, a user cannot control the malicious
code, nor can the attacker until a connection is made. This means that the malicious code must
be self-contained, and it must execute without any guidance from the user or the attacker.
Second, the user or an anti-virus program may try to terminate the program as soon as it is
detected; it is therefore beneficial for the program to hide and disguise itself as well as conceal
The common execution patterns of Trojan Horses and Backdoors are summarized below:
1. The malicious code is executed without user intervention.
2. The malicious code may be directed by the remote attacker once a connection is
established.
3. Resources used by the malicious code, such as file names and network addresses, are
hard-coded in the binary.
4. OS resources (processes, memory) used by the malicious code may be consumed for the
purpose of degrading performance.
In Table 1, we summarize the execution patterns exhibited by the different malicious examples.
Table 1: Execution patterns exhibited by malicious code.
Table 1 shows how similarly Trojan Horses and Backdoors behave. Many of these
characteristics are unique to Trojan Horses and Backdoors and can be exploited to distinguish benign from
malicious behavior. These unique behavior patterns are used as a basis for our security policy.
Our main objective in this work is to complement anti-virus software by targeting unknown
and zero-day attacks. We are particularly focused on Trojan Horse and Backdoor attacks. We
aim to correctly identify and thwart these attacks before any harm comes to the system, as
well as to reduce the number of false positives that typically occur in many firewalling systems.
There are a number of approaches that can be followed to reduce a system’s security risk to
intrusion from malicious code. However, no prior work has specifically targeted Trojan Horses
and Backdoors.
In this section we review work in several related areas, including static and dynamic
information flow systems, intrusion detection systems, and isolation and confining systems. We
also discuss prior work in machine learning and data mining approaches for security.
3.1 Information Flow Systems
Information flow security systems have focused on language-based and static analysis
mechanisms [2, 20]. These systems only allow the programmer to specify the policy. This means
that the user puts his trust in the code developer and is not able to enforce his own security
policy.
In contrast, RIFLE is an architectural framework for user-centric information flow security.
This system can track information flow in all programs. This equips the user (in contrast to the
programmer) with a practical way of enforcing any information flow policy. Information
tracking can also be used to defeat malicious attacks by identifying spurious information flows
and restricting their usage.
Perl introduced a new taint mode. This mode enables Perl’s interpreter to track all user
input data (which is tainted) and restrict the actions the Perl program is allowed to perform
on that input.
Valgrind has been used to rewrite the binary during runtime to dynamically check
for overwrite attacks. It was also used to detect undefined value errors at the bit level.
The authors add a shadow bit for every data bit, indicating whether the bit is undefined, and
instrument the data during value creation. The MIT DOG project uses binary rewriting
to track user input (tainted data) very efficiently. DOG introduces an average slowdown of
5.5 times compared to native execution.
Run time information systems are becoming more prevalent in security systems. Many
of those runtime systems specialize in tracking one source of data such as user input, and
develop security policies for common exploits. In HTH we consider different sources of data
and dynamically track all of them to support our policy.
3.2 Intrusion Detection Systems
Program shepherding was introduced by Kiriansky et al. and is used to enforce a security
policy. Program shepherding thwarts attacks that change the control flow (such as buffer
overflow attacks) by monitoring the program’s dynamic control flow.
System call monitoring is often used to detect malicious code.
Monitoring can be used to differentiate between normal behavior, which was recorded beforehand,
and anomalous behavior. The history of access requests can also be used to dynamically
classify programs on-line and execute them with appropriate privileges.
Software wrappers can be used to detect and remedy system intrusions. These
wrappers are software layers that are dynamically inserted into the kernel and that can selectively
intercept and analyze system calls at runtime. Using software wrappers in the kernel can
significantly reduce the performance overhead associated with profiling, but offers less information
on the call compared to the detailed information available in user space.
Runtime monitoring of untrusted helper applications was proposed by Goldberg et al.
The authors proposed to create a secure environment for untrusted helper applications
by limiting program access to operating system resources.
Scott et al. developed a portable extensible framework for constructing a safe virtual
execution system. They demonstrated how to easily profile system calls and how a simple
policy can be constructed. They present several policies that can track specific malicious
Gao et al. perform an analysis of many host-based anomaly detection systems. These
systems monitor a process running a known program by tracking the system calls the process
makes. They organize previously proposed solutions across three dimensions:
• Runtime information that the detector uses to check for anomalies. This includes system
call number, as well as arguments and information extracted from the process address
space, such as the program counter and return addresses.
• The atomic unit that the detector monitors, i.e. a single system call or a variable-length
sequence of system calls.
• The history - the number of atomic units the detector remembers.
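The last two dimensions can be illustrated with a minimal Python sketch of a sequence-based detector. It treats a fixed-length window of system call names as the atomic unit and remembers every window seen in a training trace. The class name, the window length, and the traces are illustrative assumptions; they are not taken from any of the surveyed systems or from HTH.

```python
from collections import deque
from typing import Iterable, Iterator, List, Set, Tuple


class SyscallWindowDetector:
    """Flags system call windows that never appeared in training traces."""

    def __init__(self, window: int = 3) -> None:
        self.window = window                      # length of the atomic unit
        self.normal: Set[Tuple[str, ...]] = set() # windows seen during training

    def _windows(self, trace: Iterable[str]) -> Iterator[Tuple[str, ...]]:
        recent: deque = deque(maxlen=self.window)
        for call in trace:
            recent.append(call)
            if len(recent) == self.window:
                yield tuple(recent)

    def train(self, trace: Iterable[str]) -> None:
        """Record every window of a trace known to be normal."""
        self.normal.update(self._windows(trace))

    def anomalies(self, trace: Iterable[str]) -> List[Tuple[str, ...]]:
        """Return the windows of a trace that were not seen in training."""
        return [w for w in self._windows(trace) if w not in self.normal]


detector = SyscallWindowDetector(window=3)
detector.train(["open", "read", "write", "close"])
suspicious = detector.anomalies(["open", "read", "socket", "send"])
```

A detector of this kind uses only the system call name as runtime information; adding arguments or return addresses to each window element would move it along the first dimension.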
System call monitoring is so prevalent in Intrusion Detection Systems because it provides a
lot of insight into program behavior. In addition, system call tracing can be done very efficiently
and introduces limited overhead to program execution. As such, system call monitoring is a key
aspect of HTH.
3.3 Isolation and confining Systems
The Alcatraz system is an isolation system that implements the idea of logical
isolation. The actions of the program are invisible to the rest of the system until they are
committed by the user. The Alcatraz system intercepts all operating system calls, and all file
operations are redirected to a 'modification cache' that is invisible to the rest of the system.
Terra is an architecture for trusted computing. Terra builds on a TVMM (a trusted
virtual machine monitor), which allows Terra to partition the platform into multiple, isolated
VMs (virtual machines). Each VM can be tailored to provide a particular level of security
and compatibility. This allows each application to run in its own VM, either as an "open
box" VM with the semantics of a modern open platform, or as a "closed box" VM with
dedicated, tamper-resistant hardware accompanied by a tailored software stack that can protect the privacy
and integrity of its content.
Isolation and confining systems have the advantage of separating the execution effects of
malicious code from the rest of the system. Such approaches also have
disadvantages. Terra separates the execution into different virtual machines, and thus the sharing of
data between several VMs may become more difficult and less intuitive. This problem does not
exist in the Alcatraz system, where the user can decide whether to commit the changes.
However, if the user is to successfully identify malicious behavior he will need to be knowledgeable
about the program's behavior. In other words, the user will need to be an expert.
3.4 Expert Systems
Expert systems such as CLIPS are designed to model human expertise and knowledge.
They can be used to develop systems for diagnostics or consultation, and can eliminate the
need for a human expert. Automatic intrusion detection systems can also benefit from an
expert system tool. A case for using expert systems for emulating human security experts is
presented by A. Chesla.
Human security experts are very adaptive when analyzing new attacks and intrusions. An
expert system will therefore need to be adaptive and learn new security exploits.
coonkasem et al. show how a neural network can be used to allow the expert system to
learn from experience, and effectively improve the expert system.
Enhancing Intrusion Detection Systems with Expert Systems can improve the quality of the
intrusions detected, i.e. it will be possible to detect more complex patterns which are currently
detected by human security experts. In addition, the ability to model human knowledge may
reduce the number of false positives as well as give advice to the non-expert user.
In this section we introduce our security policy. Based on the unique behavior of Trojan Horses
and Backdoors described in section 2.2 we develop rules that determine which patterns are
malicious. Our policy is composed of a set of rules, where each rule is designed to detect a
different type of malicious behavior. Our rules are designed to oversee different types of
program behavior. We classify our rules into three categories where each category groups
similar malicious execution patterns. Our categories include:
• Execution flow.
• Resource abuse.
• Information flow.
Each rule has an associated severity label which is assigned according to our confidence
that the behavior detected is actually malicious. The label is intended as an additional guide to
the user when he decides whether to continue or kill the application. We distinguish three
severity levels: Low, Medium and High. Low severity is used where our confidence that this
code is malicious is low, Medium severity when our confidence is higher and High when we are
most confident the code is malicious.
In the next sections, we give examples of our rules in each of the different categories. We
do not present all the rules implemented but rather a representative set of rules.
4.1 Execution Flow
In our policy we monitor the execution flow of the program, which includes the invocation and
execution of new processes. Our goal is to detect malicious code being executed. A program is
more likely to be malicious if it executes processes whose names are hardcoded
in the binary or originate from a socket (potentially a
remote attacker). Our policy implements the following rules:
1. A rule that verifies the name of a newly created process is not hardcoded:
   if (new process) and (process name is hardcoded) then
      Warn user (Low);
2. A rule that verifies that the name of a new process does not originate from a socket (see footnote 1):
   if (new process) and (process name originated from a socket) then
      Warn user (High);
3. A rule that verifies the name of a newly created process is not hardcoded, and that this
code is infrequent:
   if (new process) and (process name is hardcoded) and
      (code frequency is low) and (program started a while ago) then
      Warn user (Medium);
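The three rules above can be sketched in Python as follows. This is only an illustration of the rule logic, not the CLIPS rules used by Secpert: the event fields, the "BINARY" and "SOCKET" source labels as plain strings, and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class NewProcessEvent:
    process_name: str
    name_source: str            # data source of the name; "BINARY" means hardcoded
    code_run_count: int         # times this code location has executed so far
    seconds_since_start: float  # program uptime when the process was created


# Illustrative thresholds (assumptions, not values from the report).
RARE_RUN_COUNT = 1
STARTED_A_WHILE_AGO = 60.0


def execution_flow_warnings(event: NewProcessEvent) -> List[Severity]:
    """Return the severity of every execution-flow rule that fires."""
    fired: List[Severity] = []
    hardcoded = event.name_source == "BINARY"
    if hardcoded:                                   # rule 1
        fired.append(Severity.LOW)
    if event.name_source == "SOCKET":               # rule 2
        fired.append(Severity.HIGH)
    if (hardcoded
            and event.code_run_count <= RARE_RUN_COUNT
            and event.seconds_since_start >= STARTED_A_WHILE_AGO):  # rule 3
        fired.append(Severity.MEDIUM)
    return fired


event = NewProcessEvent("/bin/sh", "BINARY", code_run_count=1,
                        seconds_since_start=120.0)
worst = max(execution_flow_warnings(event))  # rules 1 and 3 both fire
```

Taking the maximum of the fired severities is one simple way to report a single label to the user when several rules match the same event.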
Our policy takes into account how often code is executed. If a code segment is rarely
executed, this may reinforce the suspicion that malicious code is present. "Rarely" can be defined
as once during an execution or once across multiple executions. For example, malicious code
such as the CIH/Chernobyl Virus executes on predefined dates in the year (CERT IN-99-03).
In our policy, we increase the severity level from Low to Medium when a program name
is hardcoded and this program is executed rarely.
Our policy takes into account the origin of the program name. When the program name
is hardcoded, we have less confidence in our warning since this may also occur in trusted
programs; therefore the severity level is Low. On the other hand, when the program name
originates from a socket, we have more confidence that the program is malicious and label the
severity High.
In the next section we present rules to counter Resource abuse.
Footnote 1: In our policy we assign a high warning to process names that originate from a socket. Future implementations
may check if the socket name was hardcoded or provided by the user. This distinction is already made by other rules
in our policy.
4.2 Resource Abuse
Resource abuse includes allocating and using different resources from the operating system with
the purpose of draining the OS resources and impacting performance. Examples of resource
abuse include:
• Executing numerous new processes
• Allocating a large amount of memory (as the malicious code Trojan.Vundo does)
In our policy we monitor the number of new processes created, as well as the rate of creation
of new processes. Our policy implements the following rules:
1. A rule that tracks the number of newly created processes:
   if (new process was created) and (number of new processes created is high) then
      Warn user (Low);
2. A rule that tracks the rate of newly created processes:
   if (new process was created) and (the rate of newly created processes is high) then
      Warn user (Medium);
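A minimal Python sketch of these two rules is shown below. The report does not state what counts as a "high" number or rate, so the three constants are placeholder assumptions, and the monitor class is a hypothetical stand-in rather than the implementation in HTH.

```python
from collections import deque
from typing import Deque, List

# Placeholder thresholds (assumptions, not values from the report).
MAX_TOTAL_PROCESSES = 20   # "number of new processes created is high"
MAX_PER_WINDOW = 5         # "rate of newly created processes is high"
WINDOW_SECONDS = 1.0


class ProcessCreationMonitor:
    """Tracks the total count and the recent rate of process creation."""

    def __init__(self) -> None:
        self.total = 0
        self.recent: Deque[float] = deque()  # creation times inside the window

    def on_new_process(self, timestamp: float) -> List[str]:
        """Record one process creation and return the warnings it triggers."""
        self.total += 1
        self.recent.append(timestamp)
        # Drop creations that have fallen out of the sliding window.
        while self.recent and timestamp - self.recent[0] > WINDOW_SECONDS:
            self.recent.popleft()

        warnings: List[str] = []
        if self.total > MAX_TOTAL_PROCESSES:       # rule 1
            warnings.append("Low")
        if len(self.recent) > MAX_PER_WINDOW:      # rule 2
            warnings.append("Medium")
        return warnings


monitor = ProcessCreationMonitor()
burst = [monitor.on_new_process(0.1 * i) for i in range(6)]
```

A sliding window over creation timestamps is one straightforward way to estimate a rate; a fixed-interval counter would serve equally well for the purpose of the rule.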
In our policy, we monitor the creation and execution of new processes. We are more
confident when the rate of the creation of new processes is high and therefore assign it a higher
severity label. We leave other types of resource management or resource abuse (e.g., memory
allocation) to future implementations of our system.
In the next section we present rules that monitor the information flow.
4.3 Information Flow
Information flow includes the flow of information between the following different sources and
targets:
• The user input (information source only).
• OS files and sockets (information source and target).
• The program binary (information source).
• The hardware (information source; see footnote 2).
We elaborate on the different data sources in section 5.1.
Next, we present several rules implemented in our policy:
Footnote 2: In the current implementation, hardware is only a source of information; future implementations may include
instructions that can write information to the hardware.
1. A set of rules that alerts the user when information is flowing from a file to a socket
(note that both the file name and the socket address may be hardcoded or given by
the user):
   if (information source is a file) and (information target is a socket) then
      if (user gave file name) and (hardcoded socket address) then
         Warn user (Low);
      if (hardcoded file name) and (user gave socket address) then
         Warn user (Low);
      if (hardcoded file name) and (hardcoded socket address) then
         Warn user (High);
2. A rule that notifies the user when information is flowing from hardware to a hardcoded
file:
   if (information source is hardware) and (information target is a file) and
      (file name is hardcoded) then
      Warn user (High);
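The information flow rules above can be written as a small decision function. The sketch below is in Python and only mirrors the rule logic; the `Endpoint` type, its field names, and the use of the strings "BINARY" and "USER INPUT" to mark where a file name or socket address came from are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

HARDCODED = "BINARY"        # the ID string came from the program binary
USER_GIVEN = "USER INPUT"   # the ID string was supplied by the user


@dataclass(frozen=True)
class Endpoint:
    kind: str        # "FILE", "SOCKET", or "HARDWARE"
    id_source: str   # data source of the file name or socket address ("" for hardware)


def information_flow_warning(source: Endpoint, target: Endpoint) -> Optional[str]:
    """Return the severity label for a flow from source to target, or None."""
    if source.kind == "FILE" and target.kind == "SOCKET":
        if source.id_source == USER_GIVEN and target.id_source == HARDCODED:
            return "Low"
        if source.id_source == HARDCODED and target.id_source == USER_GIVEN:
            return "Low"
        if source.id_source == HARDCODED and target.id_source == HARDCODED:
            return "High"
    if (source.kind == "HARDWARE" and target.kind == "FILE"
            and target.id_source == HARDCODED):
        return "High"
    return None


label = information_flow_warning(Endpoint("FILE", HARDCODED),
                                 Endpoint("SOCKET", HARDCODED))
```

A flow in which the user supplied both the file name and the socket address produces no warning here, since none of the listed rules covers that case.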
Several more rules are implemented in our policy. They are very similar to the ones
presented above, differing only in their sources and/or targets, and are therefore not presented
here.
To be able to implement the policy described above and identify the different types of malicious
behavior, we need to track and analyze dynamic program behavior. We separate the analysis
and policy implementation from the tracking mechanism to allow for a flexible and independent
development of each component. Figure 1 shows the HTH high-level software architecture. In
sections 6 and 7 we describe the design and implementation of each component.
Next, we elaborate on the different data sources that need to be tracked for information
flow. Then we discuss the design space for our implementation of the tracking mechanism.
Figure 1: HTH software architecture
Our policy rules take into account the source of the data. We maintain enough information
about each data source to enable our policy to make fine-grained distinctions. For example,
we maintain more than just one 'Taint' bit. A single Taint bit only allows us to distinguish
between two different data sources. For example, a single bit can indicate whether the data
was input to the program or not [24, 36, 23], or whether the data was defined or undefined.
We are interested in maintaining additional information. In particular, we would like to
know the type and name of each resource. The following resource types (for data sources) are
defined to support our policy:
• USER INPUT
• FILE
• SOCKET
• BINARY
• HARDWARE
The USER INPUT, FILE and SOCKET data source types are self-explanatory. The BINARY
data source type is used to find hardcoded values. When the program itself or shared libraries
are being loaded, the corresponding memory addresses are tagged BINARY.
The HARDWARE data source is used to tag data that originated from hardware. An
example of this is the x86 cpuid instruction, which stores processor identification information
in the %EAX, %EBX, %ECX and %EDX registers. Although this is a simple example, future
processors may hold more information in hardware. This information may include user secrets,
hardware secrets, and information used for auditing.
We have several reasons for tracing the resource name:
1. it allows us to specify trusted resources (for example, trusted libraries such as libc.so),
2. we are able to give the user more information about the source or target of the information
3. we are able to use the name during debugging (see footnote 3).
The data sources are used for identifying the source of data. They are also used to identify
the source of function arguments, such as strings or numbers. In particular, they can be used
to identify the source of a resource name or address. For example, if a file is opened and the
file name was hardcoded, the data source of the file name (the string itself) will be BINARY. If
the file name was given by the user, the data source would be USER INPUT. In the rest of this
report we will use resource ID (identifier) or origin to denote the resource file name or socket
address, and use resource ID (origin) data source to denote the data source corresponding to
the resource ID (origin).
3 Note, for the security policy, it is only necessary to track the data source type and a minimal set of trusted
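The distinction between a resource ID and the data source of that resource ID can be illustrated with a small Python sketch. The `Tagged` and `OpenEvent` records and the `on_open` function are hypothetical names chosen for this example, and the file names are made up; the sketch only assumes that each string argument carries the type of the data source it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value paired with the type of data source it came from."""
    value: str
    source_type: str  # "USER INPUT", "FILE", "SOCKET", "BINARY" or "HARDWARE"


@dataclass(frozen=True)
class OpenEvent:
    """Hypothetical record for a file-open event."""
    resource_id: str         # the file name (the origin)
    resource_id_source: str  # data source of the file name string itself


def on_open(file_name: Tagged) -> OpenEvent:
    # The event keeps the name and where the name came from, so a policy
    # can tell a hardcoded path from one supplied by the user.
    return OpenEvent(file_name.value, file_name.source_type)


# A file name embedded in the program image versus one typed by the user.
hardcoded = on_open(Tagged("/etc/passwd", "BINARY"))
from_user = on_open(Tagged("notes.txt", "USER INPUT"))
```

Both events refer to a FILE resource, but they differ in the resource ID data source, which is the property the policy rules can use.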
Table 2 shows all the possible combinations of data sources and the resource ID (origin)
Resource ID (Origin) Data Source
SOCKET    Socket name (address)
Table 2: Data source combinations.
5.2 Design consideration and tradeoffs
In this section we motivate our design choices. We first describe several design alternatives and
explain the tradeoffs associated with them. Next we examine which events we need to track to
collect all the relevant information needed to support our policy and clarify the implications of
tracking all those events. We end with presenting the design choices we made and the reasons
for those choices.
5.2.1 Static vs. runtime behavior tracking
A program can be analyzed statically or dynamically. Static analysis is performed at compile
time, link time or post link time. Static analysis does not introduce any runtime overhead. Runtime
tracking, on the other hand, may impose significant overhead, but furnishes the monitor
with all runtime information not available statically.
Accurate runtime information can provide more information to the analysis and may lead
to a more accurate policy. A static tool may not be able to discover all the code being executed,
and thus may limit the effectiveness of the policy (see footnote 5). For example, dynamically linked libraries are
only loaded at runtime and may not be available prior to execution. Another example is
self-modifying code, which cannot be analyzed statically.
4 Our prototype, as well as any incomplete implementation, may need to consider an UNKNOWN data source as
5 If all dynamically linked libraries are trusted, this may be less of an issue.
Code execution profiles, as well as real time data from the user or the network (which can
be monitored and analyzed), are only available at runtime.
A runtime system, although slower, has the significant advantage of having all the runtime
data available to it. Having such a rich data set may lead to a more accurate analysis and reduce
the number of false positives.
5.2.2 Source code vs. binary
Analyzing a program for vulnerabilities can be done using several different methods. We can
choose to analyze source code or binary code. One key difference is that source code analysis
has the advantage of maintaining the high-level semantics of the program behavior. Binary
analysis introduces a semantic gap between the low-level behavior that can be observed (e.g.,
assembly instructions) and the high-level behavior the program exhibits (e.g., method calls).
Since the source code maintains high level semantics, analyzing it to discover the program
behavior may be more accurate. The main drawback, however, is that the source code needs
to be available. This is usually not the case.
5.2.3 Events monitored at different abstraction levels
There are numerous events that need to be monitored to accommodate the information we
need for our policy. We divide the events into 3 categories:
• Architectural (ISA) events - instructions executed,
• OS (API) events - system calls, and
• Library (API) events - library routines
Events such as OS and Library calls allow us to collect information related to the program
semantics and program information flow. Architectural events allow us to collect information
related to program information flow and program frequency.
Dividing these events into categories emphasizes the need to accommodate different levels
of abstraction in our system.
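The three event categories can be sketched as a small Python dispatcher. This is only an illustration under stated assumptions: the `Level`, `Event`, and `Monitor` names, the example instruction and call names, and the choice to merely count architectural events (information flow tracking is omitted) are all hypothetical and do not describe the actual monitor.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    ARCHITECTURAL = "ISA"        # instructions executed
    OS = "system call"
    LIBRARY = "library routine"


@dataclass
class Event:
    level: Level
    name: str                    # e.g. "mov", "open", "fopen"
    args: dict = field(default_factory=dict)


class Monitor:
    """Hypothetical dispatcher that handles each level differently."""

    def __init__(self):
        self.instruction_counts = Counter()  # frequency data from ISA events
        self.semantic_log = []               # OS and library calls with arguments

    def handle(self, event):
        if event.level is Level.ARCHITECTURAL:
            # Low-level events are counted here; they carry little meaning alone.
            self.instruction_counts[event.name] += 1
        else:
            # API-level events carry the semantics (which call, which arguments).
            self.semantic_log.append((event.level, event.name, event.args))


monitor = Monitor()
for ev in [
    Event(Level.ARCHITECTURAL, "mov"),
    Event(Level.ARCHITECTURAL, "mov"),
    Event(Level.OS, "open", {"path": "/tmp/example"}),
    Event(Level.LIBRARY, "fopen", {"path": "/tmp/example", "mode": "r"}),
]:
    monitor.handle(ev)
```

Keeping the level explicit on every event lets one handler serve all three categories while still treating high-volume architectural events differently from the less frequent API calls.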
HTH is a runtime monitoring system. This will allow us to maintain runtime information and
enable us to perform detailed and accurate behavior analysis. Future research will look into
developing hybrid approaches in which static analysis may be used to accelerate the runtime
In this initial implementation, our goal was to keep the monitor as lean and
general as possible. Analyzing source code would limit our analysis to one particular language
or would require a specialized front-end for each language. Moreover, the source code may
frequently be unavailable. Therefore HTH analyzes the program binary. This alternative does
tie us to a specific architecture and OS (an executable format may bind us to a specific OS),
though we have started to think about how to design this for portability.
Even though we could potentially track all events at every abstraction level (architectural,
OS and library), we will benefit if we can reduce the number of events monitored to a generic
and preferably small set. Tracking the APIs of all libraries may very well be
intractable. Since HTH analyzes the program binary and has no access to the source code,
we are only able to monitor calls that are made to shared objects. We do not assume that
debug information is available in the binary and thus restrict ourselves to shared objects with
a defined API.
HTH will monitor architectural and OS events, as well as track selected library calls. The
main reason for tracking a subset of library calls is to overcome the semantic gap introduced
by working with only architectural and OS events.
5.3.1 HTH classification
HTH falls into the class of program monitors called white box detectors, as defined by Gao et
al. A white box detector is a system that uses all information available to it, including:
• system calls and system call arguments,
• program memory usage, and
• source code and binary code.
HTH is designed as a runtime security monitor. We believe that this alternative is the
most flexible and applicable for most users. In addition we believe that run time analysis
can provide the best accuracy and minimize the number of false positives. Future work will
consider the run time implications of such a system.
In the next sections we present Secpert, the expert system responsible for analyzing
the program behavior, and Harrier, the runtime monitor that tracks all the events.
6 Secpert Design and Implementation
Secpert (Security expert) is the HTH component responsible for analyzing program behavior
and implementing the policy. It is implemented as an expert system. Figure 2 shows a
high-level view of the Secpert expert system architecture.
Secpert is driven by program events. The events are used to analyze a program’s behavior.
Based on the policy adopted, the runtime behavior is monitored and a warning will be issued
to the user if an exploit is detected.
We first describe the events Secpert will address and then describe our implementation
using the CLIPS expert system.
6.1 Secpert Events
To implement the policy we have described earlier, Secpert is notified whenever an event occurs.
To keep Secpert running efficiently, we only notify Secpert on predefined events and attach all
relevant information to those events.