Intrusion detection and traffic classification using application-aware traffic profiles

Bibliographic Details
Main Author: Alizadeh, Hassan (author)
Format: doctoralThesis
Language: eng
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10773/23545
Country: Portugal
Oai: oai:ria.ua.pt:10773/23545
Description
Summary: Along with the ever-growing number of applications and end-users, online network attacks and advanced generations of malware have continuously proliferated. Many studies have addressed the issue of intrusion detection by inspecting aggregated network traffic with no knowledge of the responsible applications/services. Such systems may detect abnormal traffic, but fail to detect intrusions in applications whenever their abnormal traffic fits into the network normality profiles. Moreover, they cannot identify the intrusion-infected applications responsible for the abnormal traffic. This work addresses the detection of intrusions in applications when their traffic exhibits anomalies. To do so, we need to: (1) bind traffic to applications; (2) have per-application traffic profiles; and (3) detect deviations from profiles given a set of traffic samples. The first requirement has been addressed in our previous works. Assuming that such binding is available, this thesis addresses the last two topics in order to detect abnormal traffic and thereby identify its source (possibly malware-infected) application. Applications' traffic profiles are not a new concept, since researchers in the field of Traffic Identification and Classification (TIC) use them as a baseline of their systems to identify and categorize traffic samples by application (types-of-interest). But they do not seem to have received much attention in the scope of intrusion detection systems (IDS). We first provide a survey of TIC strategies, within a taxonomy framework, focusing on how the surveyed TIC techniques could help us build applications' traffic profiles. As a result of this study, we found that most TIC methodologies are based on some well-known statistical assumptions extracted from different traffic sources and make use of machine learning techniques to build models (profiles) for recognizing either application types-of-interest or application-layer protocols.
Moreover, the traffic classification literature has examined some traffic sources (e.g. the first few packets of flows and multiple sub-flows) that do not seem to have received much attention in the scope of IDS research. An IDS can take advantage of such traffic sources in order to provide timely detection of intrusions before they propagate their infected traffic. First, we utilize conventional Gaussian Mixture Models (GMMs) to build per-application profiles. No prior information on the data distribution of each application is available. Despite the improvement in performance, stability in high-dimensional data and the calibration of a proper threshold for intrusion detection remain the main concerns. Therefore, we improve the framework by resorting to a universal background model (UBM) to robustly learn application-specific models. The proposed anomaly detection systems are based on class-specific and global thresholding mechanisms, where a threshold is set at the Equal Error Rate (EER) operating point to determine whether a flow claimed by an application is genuine. Our proposed modelling approaches can also be used in a traffic classification scenario, where the aim is to assign each specific flow to an application (type-of-interest). We also investigate the suitability of the proposed approaches with just the first few packets of a traffic flow, in order to provide a more efficient and timely detection system. Several tests are conducted on multiple public datasets collected from real networks. The numerous experiments reported provide evidence of the effectiveness of the proposed approaches.
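
The modelling idea in the summary (a per-application GMM profile, a threshold at the EER operating point, and classification by best-matching profile) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes a single synthetic one-dimensional flow feature, fits the mixtures with plain EM, omits the UBM step entirely, and scores the training flows themselves rather than held-out data. The application labels "web" and "p2p", the helper names and all numeric values are hypothetical.

```python
import math
import random


def component_logs(x, model):
    """Log of weight * Gaussian density for every mixture component at x."""
    weights, means, variances = model
    return [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(x, model):
    """Log-likelihood of x under the mixture (log-sum-exp for stability)."""
    logs = component_logs(x, model)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def fit_gmm_1d(samples, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with plain EM; returns (weights, means, variances)."""
    xs = sorted(samples)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-6)
    # Deterministic start: initial means spread over the sorted sample.
    model = ([1.0 / k] * k,
             [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)],
             [var_all] * k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            logs = component_logs(x, model)
            top = max(logs)
            p = [math.exp(l - top) for l in logs]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        weights, means, variances = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            weights.append(nj / n)
            means.append(mu)
            variances.append(max(var, 1e-6))
        model = (weights, means, variances)
    return model


def eer_threshold(genuine, impostor):
    """Pick the score threshold where false accept and false reject rates are closest.

    A flow is accepted as genuine when its score is >= the threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    return best[1], best[2], best[3]


def classify(x, profiles):
    """Assign a flow feature to the application whose profile scores it highest."""
    return max(profiles, key=lambda app: log_likelihood(x, profiles[app]))


rng = random.Random(0)
# Synthetic 1-D flow feature per application (purely illustrative values).
web_flows = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(100)]
p2p_flows = [rng.gauss(20, 1) for _ in range(200)]
profiles = {"web": fit_gmm_1d(web_flows), "p2p": fit_gmm_1d(p2p_flows)}

# Verification: flows claiming to be "web" are scored against the "web" profile.
genuine = [log_likelihood(x, profiles["web"]) for x in web_flows]
impostor = [log_likelihood(x, profiles["web"]) for x in p2p_flows]
threshold, far, frr = eer_threshold(genuine, impostor)
print(f"threshold={threshold:.2f} FAR={far:.3f} FRR={frr:.3f}")
print(classify(4.8, profiles))
```

In this sketch a flow claimed by an application is treated as anomalous when its log-likelihood under that application's profile falls below the threshold. A realistic setup would use multi-dimensional flow features and calibrate the threshold on data not used for training.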