An empirical study on anomaly detection algorithms for extremely imbalanced datasets

Bibliographic details
Main author: Fontes, Gonçalo (author)
Other authors: Matos, Luís Miguel (author), Matta, Arthur (author), Pilastri, André Luiz (author), Cortez, Paulo (author)
Format: conferencePaper
Language: eng
Published: 2022
Subjects:
Full text: https://hdl.handle.net/1822/81441
Country: Portugal
OAI: oai:repositorium.sdum.uminho.pt:1822/81441
Description
Abstract: Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data related to this domain is usually imbalanced. In this paper, we compare diverse preprocessing and Machine Learning (ML) state-of-the-art algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted by adopting seven extremely imbalanced public domain datasets. Overall, the IF and AE unsupervised methods obtained competitive anomaly detection results, which also have the advantage of not requiring labeled data.
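
Illustrative note: the abstract names Isolation Forests as one of the unsupervised methods that need no labels. The sketch below is not the authors' code or experimental setup; it is a minimal from-scratch illustration of the general isolation idea (random splits isolate rare points in fewer steps), written with the Python standard library only. The synthetic data, the function names (`make_data`, `fit_forest`, `anomaly_score`) and all parameter values are assumptions chosen for the example.

```python
import math
import random

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree: random feature, random threshold, until isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split further on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for points left unsplit there."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Fit trees on random subsamples; no labels are used anywhere."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(forest, x):
    """Score in (0, 1); shorter average paths give scores closer to 1."""
    trees, sample_size = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

def make_data(n_normal=1000, n_anomalies=10, seed=0):
    """Hypothetical imbalanced data (about 1% anomalies), for illustration only."""
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_normal)]
    anomalies = [(rng.uniform(6, 8), rng.uniform(6, 8)) for _ in range(n_anomalies)]
    return normal, anomalies

if __name__ == "__main__":
    normal, anomalies = make_data()
    forest = fit_forest(normal + anomalies)
    mean = lambda pts: sum(anomaly_score(forest, p) for p in pts) / len(pts)
    print("mean score, normal points:   ", round(mean(normal), 3))
    print("mean score, anomalous points:", round(mean(anomalies), 3))
```

In practice a maintained library implementation would normally be preferred over a hand-written one; the sketch only shows why label-free scoring is possible on heavily imbalanced data.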