Using Stacked Generalization for Anomaly Detection

Bibliographic Details
Main Author: Miguel Oliveira Sandim (author)
Format: masterThesis
Language: eng
Published: 2017
Subjects:
Online Access: https://repositorio-aberto.up.pt/handle/10216/112506
Country: Portugal
Oai: oai:repositorio-aberto.up.pt:10216/112506
Description
Summary: Anomaly Detection is an important research topic whose goal is to find patterns in data that do not conform to expected behavior. The concept applies to a large number of domains and contexts, such as intrusion detection, fraud detection, medical research and social network analysis. The techniques that have been proposed are diverse: they rest on different assumptions about how anomalies manifest themselves within the data and can produce different outputs (i.e. a numeric score or a labeled classification). Because of this heterogeneity, each technique is specialized in particular characteristics of the data and may provide only limited insight into which anomalies exist in a given dataset. Ensemble Learning is a process that incorporates the opinions of different learners in order to reach a better-considered decision. It has been applied successfully in the past to supervised learning problems, where improvements in performance have been observed empirically. Stacked Generalization is one such method, in which a learning algorithm is used to combine the different learners. The intention of this thesis is to research the application of Stacked Generalization to current state-of-the-art Anomaly Detection techniques and to determine whether this method can lead to better overall performance. These methods will then be evaluated on well-known, publicly available datasets used for benchmarking throughout the Anomaly Detection literature.
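
The idea of combining detectors through a learning algorithm can be illustrated with a minimal sketch. It is not the thesis's method: the two base detectors (a z-score and a k-nearest-neighbour distance), the toy one-dimensional data, the logistic-regression meta-learner and all function names are assumptions chosen for illustration. It also assumes a small labeled hold-out set is available to train the combiner, and it uses a single hold-out split rather than the cross-validated base-level predictions usually associated with Stacked Generalization.

```python
import math

def fit_zscore(train):
    """Base detector 1 (assumed): absolute z-score relative to the training data."""
    mean = sum(train) / len(train)
    std = math.sqrt(sum((x - mean) ** 2 for x in train) / len(train)) or 1.0
    return lambda x: abs(x - mean) / std

def fit_knn(train, k=3):
    """Base detector 2 (assumed): mean distance to the k nearest training points."""
    return lambda x: sum(sorted(abs(x - t) for t in train)[:k]) / k

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_meta(features, labels, lr=0.1, epochs=5000):
    """Meta-learner (assumed): logistic regression fitted by batch gradient descent."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy data (hypothetical): unlabeled "normal" points used to fit the base detectors.
train = [i / 10 for i in range(-20, 21)]

# Labeled hold-out set used only to train the combiner (1 = anomaly).
valid_x = [-1.5, -0.7, -0.05, 0.45, 1.25, 1.85, -6.0, 5.0, 7.5, -8.0]
valid_y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

detectors = [fit_zscore(train), fit_knn(train)]

def base_scores(x):
    # Level-0 output: one anomaly score per base detector.
    return [detector(x) for detector in detectors]

# Level-1 model: learns how to weigh the base detectors' scores.
meta = fit_meta([base_scores(x) for x in valid_x], valid_y)

def stacked_score(x):
    """Estimated probability that x is an anomaly, according to the stacked model."""
    return meta(base_scores(x))
```

Calling stacked_score on a new point returns a value between 0 and 1 that blends both detectors' scores; a threshold such as 0.5 turns it into a label. With real data, the base detectors would typically be established Anomaly Detection techniques, and the meta-level training set would be built from predictions on data the base detectors did not see.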