Cyber Defense Laboratory



TIAA: A Toolkit for Intrusion Alert Analysis
(Version 0.4)

What's New

Version 0.4 adds three new utilities to TIAA:
  • Association Analysis (Extracting frequent co-occurrences of attribute values from a set of alerts)
  • Attack Strategy Extraction (Extracting attack strategies from a correlation graph)
  • Missed Attack Hypotheses (Hypothesizing possibly missed attacks)

Supported Platforms

  • This tool has been tested on Windows XP with MS SQL Server 2000.


Test Data and Execution Procedure

  1. Download this SQL statement and execute it in the database to create the target table "events".
  2. Download any of these alert datasets [scenario 1 (dmz, inside), scenario 2 (dmz, inside)]. The alerts were generated by RealSecure from the DARPA 2000 evaluation dataset. You can import a dataset into MS SQL Server with the command "bulk insert events from 'file path' with (FIELDTERMINATOR=',');".
  3. Download the sample knowledge base XML file and its schema.
  4. Download the sample property file.
  5. For aggregation analysis and attack strategy extraction, download a sample abstraction hierarchy file here.
  6. For missed attack hypotheses, you can use Ethereal to analyze the tcpdump file and save the analysis result (packet summary) to a text file. This information can be used to prune incorrectly hypothesized attacks. For your convenience, you can also download sample analysis results here. (You need to unzip the file.)
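
When several dataset files are imported, the command in step 2 has to be repeated with a different path each time. The helper below is a minimal sketch of that substitution. The function name and the example path are hypothetical, and only the statement text itself comes from step 2.

```python
def bulk_insert_statement(file_path, table="events"):
    """Build the BULK INSERT statement from step 2 for one alert file."""
    # Double any single quote so the path stays inside the SQL string literal.
    escaped = file_path.replace("'", "''")
    return f"bulk insert {table} from '{escaped}' with (FIELDTERMINATOR=',');"


# Hypothetical local path; substitute the location of the downloaded file.
print(bulk_insert_statement(r"C:\tiaa\data\alerts.csv"))
```

The generated statement can then be pasted into whichever SQL Server query tool you normally use.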

Checklist before You Run This Tool

  • Java 1.4 or above
  • MS SQL Server 2000 and JDBC driver
  • Xerces Java Parser v1.4.4 (You can get it from Apache's website)
  • GraphViz (You can get it from AT&T's website)
  • Ethereal (not necessary if you have downloaded our sample analysis results; you can get it from the Ethereal website)

Main Contributor for Ver 0.4


TIAA Ver 0.4 is based on TIAA Ver 0.3. Yun Cui, Yiquan Hu, and Pai Peng contributed to the earlier versions of TIAA.

Three New Utilities

Association Analysis

Association analysis finds frequent co-occurrences of values across the different attributes that describe alerts. For example, it may reveal that many attacks go from a particular source IP address to a particular destination IP address at destination port 80.

Association analysis takes two input parameters: the set of alerts (identified by the alert collection ID) and the support threshold. Given a set S of alerts and a support threshold t%, a frequent attribute set A1=a1 ^ A2=a2 ^ ... ^ An=an (where A1, A2, ..., An are attribute names and a1, a2, ..., an are attribute values) denotes that at least t% of the alerts in S have attribute values satisfying A1=a1 ^ A2=a2 ^ ... ^ An=an. A sample result of association analysis is as follows.

Frequent Attribute Sets | Support
HyperAlertType=Rsh ^ DestPort=514 | 38.63636363636363%
HyperAlertType=Sadmind_Amslverify_Overflow ^ SrcIPAddress=
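
The definition of a frequent attribute set can be illustrated with a small brute-force sketch. It is not TIAA's implementation, since this page does not describe the algorithm TIAA uses. It simply counts every combination of attribute values in every alert and keeps those whose support reaches the threshold. The function name and the alert data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(alerts, threshold_percent):
    """Return {attribute set: support %} for sets meeting the threshold.

    Each alert is a dict mapping attribute name to attribute value. An
    attribute set is a tuple of (name, value) pairs sorted by name.
    """
    if not alerts:
        return {}
    counts = Counter()
    for alert in alerts:
        items = sorted(alert.items())
        # Count every non-empty combination of this alert's attribute values.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    total = len(alerts)
    return {
        combo: 100.0 * n / total
        for combo, n in counts.items()
        if 100.0 * n / total >= threshold_percent
    }


# Made-up alerts for illustration only; not taken from the sample datasets.
alerts = [
    {"HyperAlertType": "Rsh", "DestPort": 514},
    {"HyperAlertType": "Rsh", "DestPort": 514},
    {"HyperAlertType": "Rsh", "DestPort": 513},
    {"HyperAlertType": "Other", "DestPort": 514},
]
for combo, support in sorted(frequent_attribute_sets(alerts, 50).items()):
    print(" ^ ".join(f"{name}={value}" for name, value in combo), f"{support}%")
```

Enumerating all combinations grows exponentially with the number of attributes per alert, so this approach is only suitable for explaining the definition on small inputs.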

Attack Strategy Extraction

An attack strategy is represented by a directed graph in which each node is a hyper-alert type and each directed edge represents the equality constraints that the connected nodes must satisfy. Two sample attack strategy graphs are as follows.

An attack strategy graph without abstraction hierarchy

An attack strategy graph with abstraction hierarchy
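
Because GraphViz is on the checklist above, one way to picture this representation is as GraphViz DOT text, with hyper-alert types as nodes and equality constraints as edge labels. The sketch below is an illustration under assumptions, not TIAA's own output format. The function name, the edge direction, and the constraint are all hypothetical.

```python
def strategy_graph_to_dot(edges):
    """Render (source type, target type, constraint) edges as GraphViz DOT text."""
    lines = ["digraph attack_strategy {"]
    for source, target, constraint in edges:
        # Each directed edge carries its equality constraint as the label.
        lines.append(f'  "{source}" -> "{target}" [label="{constraint}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edge: both hyper-alert types occur in the sample alerts, but the
# ordering and the constraint shown here are invented for illustration.
edges = [
    ("Sadmind_Amslverify_Overflow", "Rsh", "DestIPAddress = DestIPAddress"),
]
print(strategy_graph_to_dot(edges))
```

Saving the printed text to a file and running it through GraphViz's `dot` program would draw the graph.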

Missed Attack Hypotheses

The missed attack hypotheses utility integrates a set of alert collections and hypothesizes attacks that may have been missed.

The GUI panel for attack hypotheses sets the important parameters for this utility. When integrating a set of alert collections, separate the collection IDs with ",". The packet summary information must be imported into the database before it can be used to filter out incorrect hypotheses. Because importing all the packet information can take a long time, the imported information can be saved for later analysis instead of being imported every time. For replayed traffic, the replay time must also be entered. With our sample datasets, the replay time for inside1 is "2001-11-10 03:59:55", and the replay time for dmz1 is "2001-11-09 23:06:18". For detailed information on how to use this utility, please refer to the Installation and Operation Manual. A sample integrated correlation graph produced by missed attack hypotheses is as follows (the gray nodes are hypothesized alerts).

An integrated correlation graph
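
The two text inputs for this utility, the comma-separated collection IDs and the replay time, can be checked before they are typed into the GUI. The sketch below assumes the replay time always follows the "YYYY-MM-DD HH:MM:SS" layout of the sample values. The function names and the example collection IDs are hypothetical, and how TIAA itself parses these fields is not described here.

```python
from datetime import datetime


def parse_collection_ids(text):
    """Split a comma-separated list of alert collection IDs, ignoring blanks."""
    return [part.strip() for part in text.split(",") if part.strip()]


def parse_replay_time(text):
    """Parse a replay time written like "2001-11-10 03:59:55"."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Replay times of the sample datasets; the collection IDs are hypothetical.
print(parse_collection_ids("1, 2,3"))
print(parse_replay_time("2001-11-10 03:59:55"))  # inside1
print(parse_replay_time("2001-11-09 23:06:18"))  # dmz1
```

A malformed replay time raises `ValueError`, which makes typing mistakes visible before a long packet import is started.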


Copyright 2004 North Carolina State University. All rights reserved. Last Updated March 16, 2006.