An empirical study on anomaly detection algorithms for extremely imbalanced datasets

Bibliographic Details
Main Author: Fontes, Gonçalo (author)
Other Authors: Matos, Luís Miguel (author), Matta, Arthur (author), Pilastri, André Luiz (author), Cortez, Paulo (author)
Format: Conference paper
Language: English
Published: 2022
Online Access: https://hdl.handle.net/1822/81441
Country: Portugal
OAI: oai:repositorium.sdum.uminho.pt:1822/81441
Description
Summary: Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data in this domain is usually imbalanced. In this paper, we compare diverse state-of-the-art preprocessing and Machine Learning (ML) algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted using seven extremely imbalanced public domain datasets. Overall, the unsupervised IF and AE methods obtained competitive anomaly detection results and have the added advantage of not requiring labeled data.