SAINT: A Security Analysis Integration Tool
Diego M. Zamboni
Computer Security Area
Dirección General de Servicios de Cómputo Académico
Universidad Nacional Autónoma de México
Apdo. Postal 20-059, 01000 México D.F., México
of México that will allow integrated analysis of information gathered from various sources, such as security
tools and system logs. By simulating the events that occur in the systems, as collected from the different
sources, SAINT will allow the detection, or even the prevention, of problems that might otherwise go undetected
because the information about them is not available in any single place. SAINT’s modular and extensible
architecture makes it feasible to add new modules for processing new data types, detecting new kinds of
problems, or presenting the results in different formats.
1 Introduction: The Problem
The use of various security tools has been promoted as one of many ways of increasing Unix system security.
Until now, only freely available tools have been used, mainly because they cover most of the needs in this
particular academic and research environment.
The main set of tools used consists of COPS [FS90], TCP-Wrappers [Ven92], Passwd+ [Bis95], Crack
[Muf], TripWire [KS93, KS94a, KS94b] and SATAN [FV], although other tools (like Tiger [SSH93], S/Key
[Hal94, HA94] and the logdaemon suite [Ven]) are also used.
Experience has shown that, when the need arises to diagnose a problem, the solution often comes after collecting
information from more than one source, including, but not restricted to, the tools mentioned above.
For example, to trace a suspicious su access to root, it may be necessary to match a wtmp record with a
sulog entry. To trace it further back to its origins, it may be necessary to match the wtmp record with a
TCP-Wrappers log entry, then go to other systems and repeat the log analysis and matching until all the needed
data has been collected. In many cases the information is available, but it is scattered across several systems
and in different formats, and it has to be collected and analyzed manually to turn it into something more
useful than a mere collection of facts.
Therefore, the problem can be summarized in the following points:
To achieve acceptable levels of security it is necessary, among many other things, of course, to use
several different tools, each one of them focused on something specific (and, many times, even duplicating
the work of other tools).
Each one of these tools generates its own data, in its own format.
To have a more complete view of what is happening, the system and/or security administrator has to
read several reports and logs generated by the tools, often over a period of time.
The correlation and matching of related items in the different logs has to be done manually by the administrator.
*Originally published in the Proceedings of the 1996 SANS (System Administration, Networking and Security) Conference,
Washington, D.C., May 12–18, 1996.
In Mexico (and, surely, in other non-English-speaking countries), the fact that all the generated information
is in English poses yet another problem. Although English is the lingua franca of computing, it is
still a barrier for many people using computers in México, including many Unix system administrators.
This can, and does, lead to underutilization of the tools, which just sit there collecting mountains of data
that nobody ever uses. Recently, some tools have been released that make the generated data easier to view
(most notably CIAC’s Merlin [CIA]), but the problem remains of making an understandable whole out of the
seemingly chaotic set of reports and log files.
That is how the idea of SAINT was born: to build a system that allows integrated analysis of data collected
from various sources, and that tries to extract interesting information to present to the administrator in an
easy-to-read format.
This paper presents the design of SAINT, which is still under development at UNAM’s Computer Security Area.
Log file analysis is not new; in fact, it has been practiced for many years. At the simpler end, there are tools like
searching for certain patterns and doing something when they are found. These tools are useful for finding
very specific things. However, the search they perform is essentially stateless, because each line is examined in
isolation. As a result, they can only spot individual entries that may indicate problems, not sequences of related events.
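For illustration, the stateless approach can be sketched in Python as follows. Python is used here only for brevity, since SAINT itself is being written in Perl5. The pattern names and the log lines they match are hypothetical examples and are not taken from any particular tool:

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical watch list. Every pattern is tried against every line on
# its own: nothing is remembered from one line to the next.
PATTERNS = {
    "su_failure": re.compile(r"^SU \S+ \S+ - "),  # assumed sulog failure line
    "root_login": re.compile(r"ROOT LOGIN"),
}

def scan(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (pattern name, line) for each line that matches a pattern."""
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                yield name, line

# Because scan() keeps no state, it can report each failed su separately,
# but it cannot recognize "several failures followed by a success".
```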
HCMM92], which uses a rule-based language (called RUSSEL) to process audit trails generated by a number
of systems. In a distributed environment, ASAX runs local “evaluator” processes on each monitored host.
These processes submit their local results to a master server, which in turn processes the consolidated data.
Although the model is general enough to be ported to any type of system, the current implementation is
oriented towards SunOS 4.1 with C2 security features. It uses PVM [GBD+94] as the communication mechanism
between the local evaluators and the master server.
ASAX is a very powerful package, and its rule-based analysis enables it to detect complex event sequences
that may indicate problems. However, that same complexity makes it difficult to use in a very heterogeneous
environment like ours. The audit mechanisms it relies on (C2 auditing) are not in place on most of our
systems, and compiling ASAX on widely differing versions of Unix proved difficult.
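The following minimal Python sketch shows the kind of stateful rule that a stateless search cannot express. It is a generic illustration, not RUSSEL or ASAX code. The event tuple layout, the threshold and the time window are all assumptions:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

def detect_su_guessing(
    events: Iterable[Tuple[int, str, bool]],
    threshold: int = 3,
    window: int = 300,
) -> List[Tuple[int, str, int]]:
    """Flag a successful su preceded by `threshold` or more recent failures.

    Each event is (time in seconds, user, succeeded), in chronological
    order. Unlike a stateless search, the rule remembers recent failures
    per user and only fires once the whole sequence has been seen.
    """
    failures: Dict[str, Deque[int]] = defaultdict(deque)
    alerts: List[Tuple[int, str, int]] = []
    for ts, user, succeeded in events:
        recent = failures[user]
        # Forget failures older than the time window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if not succeeded:
            recent.append(ts)
            continue
        if len(recent) >= threshold:
            alerts.append((ts, user, len(recent)))
        recent.clear()
    return alerts
```

The per-user state lives in `failures`. That state is the ingredient a purely stateless pattern search lacks.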
3 What is SAINT?
SAINT provides the framework for performing the following functions:
1. Cross-analysis of reports and logs generated by various security tools, as well as system logs, on several
Unix systems. The goal is to detect things (or sequences or patterns of things) that may indicate
problems of any kind.
2. When possible, obtain information about the likely causes of detected problems.
3. Warning generation when appropriate (the clearest case is when a flagrant security problem
is detected, but there are many other situations where timely notifications are very useful).
4. If possible, suggest available solutions to detected problems.
5. In its first version, presentation of all the results in Spanish.
The main goals when designing SAINT were:
Make it extensible. It should be easy to add new modules to the system, to make it aware of new kinds of
available information (for example, a new tool), or to modify or improve its analysis capabilities.
More information about UNAM’s Computer Security Area can be found at http://www.super.unam.mx/asc/
Make it configurable. Security is not the same for everyone, and the user should be able to specify, in more
or less detail, what is important to him or her and what is not.
Make it easy to use. One of the curses of security tools that SAINT is trying to avoid is “you need to be an expert
to use this”. Once SAINT is in place (and installing it should not be too difficult either), it should be easy to use
and to review the generated results.
Make it portable. To succeed, SAINT must be usable in heterogeneous environments with as
few changes as possible. Building on the experience of other tools like COPS, SATAN and Merlin,
as well as on the experience of the people working on it, most (if not all) of SAINT is being written in
Perl5 [WS92, Wal] (and, why not admit it, SATAN also inspired the name).
4 What is SAINT not?
SAINT does not try to be a full-featured Intrusion Detection System (IDS), although SAINT reports can be
used to detect intrusions. IDS technology is by now far ahead of SAINT’s design. Advanced work on this topic
is being done by, among others, Crosbie and Spafford [CS95a, CS95b], Kumar and Spafford [KS94c, KS95]
and Kumar [Kum95].
SAINT is intended only as an information analysis tool. With this point made, let us proceed with SAINT’s description.
5 What does SAINT do?
SAINT’s operation can be divided into four main phases:
1. Data collection and homogenization.
2. Event sorting.
3. Event analysis.
4. Results presentation.
Each of these phases is described in detail below.
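As a rough illustration, the four phases can be chained as in the following Python sketch. Several details are hypothetical and not part of SAINT's design: the function names, the callable-based interfaces, and the assumption that time stamps are strings that sort chronologically (for example, 1996-05-12 14:32:00):

```python
from typing import Callable, Iterable, List, Sequence

Record = Sequence[str]  # one event, already split into its fields

def run_saint(
    collectors: Iterable[Callable[[], Iterable[Record]]],
    analyze: Callable[[List[Record]], List[str]],
    present: Callable[[List[str]], None],
) -> None:
    # 1. Data collection and homogenization: every collector returns
    #    events already converted to the common format.
    events: List[Record] = []
    for collect in collectors:
        events.extend(collect())
    # 2. Event sorting by time stamp (field 1). Empty stamps go first and,
    #    since list.sort() is stable, keep their collection order.
    events.sort(key=lambda e: (e[1] != "", e[1]))
    # 3. Event analysis.
    findings = analyze(events)
    # 4. Results presentation.
    present(findings)
```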
6 Data collection and homogenization
SAINT needs to process data produced by various tools and, possibly, on various computer systems. To facilitate
further processing, the first stage in SAINT’s execution consists of getting the data from all the sources that
will be used and converting it to a common data format that makes it easier for the next stages to process.
This task presents the following problems:
1. The information produced by each tool comes from different places, and it is in different formats. The
process of converting to the common data format is different for each data type.
2. Data formats used by some tools may change in future versions of the same tool.
3. It must be possible to add support for new tools (or new versions of existing tools) without modifying SAINT itself.
4. The common data format must contain all the information that may be useful for processing.
The solution adopted for the first three points is to use a separate program module for each tool. These
modules are independent programs that SAINT runs (according to its configuration file) to get and
process each tool’s data.
This gives the following advantages:
Each module is specialized in only one data type, thus it may be very simple and easy to test.
Each module can be written in the language that is most appropriate for the kind of processing to do, as
long as it complies with SAINT’s interface specifications.
Each module can get the data from wherever is appropriate: a file, the network
or another program. This process is transparent to SAINT, since it is performed internally by each module.
Modules can be easily added, replaced or modified to deal with new tools or new versions of existing
tools, to correct problems or to improve processing efficiency.
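The module mechanism could look roughly like the following Python sketch. It makes two hypothetical assumptions: the configuration file has already been read into a mapping from module names to command lines, and each module writes one common-format record per line to its standard output:

```python
import subprocess
from typing import Dict, List

def collect_events(modules: Dict[str, List[str]]) -> List[str]:
    """Run each data-collection module and gather its output records.

    `modules` maps a module name to the command line that runs it. SAINT
    only sees what a module prints. Where the module gets its data from
    (a file, the network, another program) is up to the module.
    """
    records: List[str] = []
    for name, command in modules.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"module {name!r} failed: {result.stderr.strip()}")
        # One common-format record per non-empty output line.
        records.extend(line for line in result.stdout.splitlines() if line)
    return records
```

Each module is a separate process, so it can be written in whatever language suits its data, as the list above notes.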
The design of the common data format had to meet the following goals:
Completeness: It must preserve all the information it receives, only in a different format. Its output must be in the
form of “events”, that is, discrete pieces of information, as concise and simple as possible, to simplify
in the format.
Simplicity: It must be easy for a program to process. This implies that the fields must be easily distinguishable,
and that the contents of each field must belong, as far as possible, to a finite and predictable set of values.
The common data format is composed of lines of text representing events, and each line includes the following fields:
Event type: A keyword that allows classification of the event. These are assigned arbitrarily, according to
the events the system must recognize. Some of the event types currently recognized are:
Most of these types are self-explanatory and need no further discussion; perhaps only
su_root, su_user and port_connect are worth explaining. The first refers to an su command executed to get root access, and
the second refers to the same command used to get access to a non-root account. These were made
separate event types because su commands to root normally require much stricter analysis.
The port_connect type represents a connection to any TCP/IP port on the system. This can be useful, for example,
to detect port scans, failed login attempts, etc.
new event types, as long as they deal correctly with them.
Event date and time: Since the analysis performed will be chronological, it is important that each event has
a time stamp wherever possible. If the original data does not include the date and time, one of two solutions
may be applied:
1. The corresponding module may estimate the time when the event happened (for example,
by interpolating from other known data). This is the preferred solution.
2. Leave the field empty, in which case the event will be considered to have happened before all other events.
Event source system: When the event involves an access through the network, this field identifies the originating system.
Event destination system: The destination system for network events.
Event source user: For some events it is possible to find out which user generated them, and this information can
be useful in the analysis.
Event destination user: It may also be useful to know which user “received” the event, for example, a telnet
session or an su.
General purpose field: It was considered appropriate to have a field whose meaning depends on the event
that is being registered. Some examples of what this field may contain are:
Process ID for the daemon providing a service.
Flag indicating success or failure of an operation.
Relevant environment variables.
File name in an FTP transfer.
Command executed by rsh.
Terminal name for an interactive session.
Reason for reboot.
Original message: It is impossible to foresee and classify all the information that, at any given moment, may
be needed, so the common data format also includes the original message from which the event was detected.
This allows any processing that the other fields do not cover to be performed on this field. Non-printable characters
in the message should be substituted by their common escape-character sequences (for example, \n for a newline or
\t for a tab) or by their ASCII code representation in octal (for example, \014).
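To show how a module fills in these fields, here is a hypothetical Python sketch of a conversion module for su logs. None of its choices come from SAINT itself. It assumes an SVR4-style sulog line such as SU 05/12 14:32 + pts/1 diego-root, and a year supplied by the caller, because sulog records omit it. It also assumes an ISO-like time stamp and places the local host name in the destination system field:

```python
import re
from typing import Optional, Tuple

# Assumed SVR4-style sulog record: "SU mm/dd hh:mm +|- tty from-to".
SULOG_LINE = re.compile(r"^SU (\d\d)/(\d\d) (\d\d):(\d\d) ([+-]) (\S+) (\S+)-(\S+)$")

def sulog_to_event(line: str, host: str, year: int) -> Optional[Tuple[str, ...]]:
    """Convert one sulog line into the eight fields described above.

    Returns None for lines that do not look like su records.
    """
    text = line.strip()
    m = SULOG_LINE.match(text)
    if m is None:
        return None
    month, day, hour, minute, flag, _tty, from_user, to_user = m.groups()
    return (
        "su_root" if to_user == "root" else "su_user",  # event type
        f"{year:04d}-{month}-{day} {hour}:{minute}:00",  # date and time (year from caller)
        "",                                              # source system: not a network event
        host,                                            # destination system (assumption: local host)
        from_user,                                       # source user
        to_user,                                         # destination user
        "success" if flag == "+" else "failure",         # general purpose field
        text,                                            # original message, kept verbatim
    )
```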
In the common data format, each record is a single line containing the fields described above, in the order in
which they were described, separated by the vertical bar character (|). This character was chosen
because it is rare in most of the reports we have seen. If it appears inside a field, it should be escaped
(\|) before the field is stored.
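The escaping rules above can be sketched in Python as follows. Escaping the backslash itself (as \\) is an added assumption: without it, an escaped separator could not be told apart from a literal backslash followed by a separator. The function names are hypothetical:

```python
from typing import List, Sequence

# Named escapes from the text (\n, \t, \|) plus an escaped backslash (assumption).
_NAMED = {"\n": "\\n", "\t": "\\t", "|": "\\|", "\\": "\\\\"}

def escape_field(text: str) -> str:
    """Escape one field; other control characters become octal codes."""
    out: List[str] = []
    for ch in text:
        if ch in _NAMED:
            out.append(_NAMED[ch])
        elif ord(ch) < 32 or ord(ch) == 127:
            out.append("\\%03o" % ord(ch))  # e.g. form feed -> \014
        else:
            out.append(ch)
    return "".join(out)

def format_record(fields: Sequence[str]) -> str:
    """Join escaped fields with '|' into one common-format line."""
    return "|".join(escape_field(f) for f in fields)

def split_record(line: str) -> List[str]:
    """Split a line on unescaped '|'; escape sequences are kept as-is."""
    fields: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            current.append(line[i:i + 2])  # copy the escape pair untouched
            i += 2
            continue
        if line[i] == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(line[i])
        i += 1
    fields.append("".join(current))
    return fields
```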
7 Event sorting
The analysis performed on the events will necessarily be chronological, since this facilitates the detection of
related sequences of events. Therefore, once the events have been converted to the common data format, they will be
sorted according to their time stamps.
Events whose second field (date and time) is empty should be left at the beginning of the list, in the order
they appeared in the event collection stage.
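A minimal Python sketch of this sorting rule follows. It assumes records that have already been split into fields, and time stamps written so that string order matches time order. Both assumptions are illustrative only:

```python
from typing import List

def sort_events(records: List[List[str]]) -> List[List[str]]:
    """Order records chronologically by their second field (time stamp).

    Records with an empty time stamp sort first. Python's sort is stable,
    so those records, like records with equal time stamps, keep the order
    in which they were collected.
    """
    return sorted(records, key=lambda r: (r[1] != "", r[1]))
```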
It is important to mention a possible problem here: if the clocks of the systems being analyzed are not correctly
synchronized (a difference of more than a few seconds), this stage could leave events in a different order
from the one in which they originated. This could affect the analysis results.
8 Event analysis
This stage does SAINT’s central work. Based on the data collected and processed in the previous
stages, the events must be analyzed to detect relationships among them and to decide whether they are
part of the systems’ normal operation or represent possible problems.
Two approaches were considered for this analysis in the design: parsing and event simulation.