Using a genetic algorithm to optimize a stacking ensemble in data streaming scenarios

Bibliographic Details
Main Author: Ramos, Diogo (author)
Other Authors: Carneiro, Davide (author), Novais, Paulo (author)
Format: article
Language: eng
Published: 2020
Subjects:
Online Access: http://hdl.handle.net/1822/68102
Country: Portugal
Oai: oai:repositorium.sdum.uminho.pt:1822/68102
Description
Summary: The requirements of Machine Learning applications are changing rapidly. Machine Learning models need to deal with increasing volumes of data, and need to do so faster, as responses are increasingly expected in real time. In addition, sources of data are becoming more dynamic, with patterns that change more frequently. This calls for new approaches and algorithms that can deal efficiently with these challenges. In this paper we propose the use of a Genetic Algorithm to optimize a Stacking Ensemble specifically developed for streaming scenarios. A pool of solutions is maintained in which each solution represents a distribution of weights in the ensemble. The Genetic Algorithm continuously optimizes these weights to minimize the cost function. Moreover, new models are added at regular intervals, trained on more recent data. These models eventually replace older and less accurate ones, making the ensemble adapt continuously to changes in the distribution of the data.
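
The summary describes the mechanism only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a regression setting, mean squared error as the cost function, averaging crossover with Gaussian mutation, and elitist selection of the better half of the pool. None of these choices is stated in the record. All function names (`normalize`, `ensemble_cost`, `evolve`, `replace_weakest_model`) and the toy data are hypothetical.

```python
import random


def normalize(weights):
    """Rescale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def ensemble_cost(weights, base_preds, targets):
    """Mean squared error of the weighted combination of base-model predictions.

    base_preds[m][i] is the prediction of base model m for sample i.
    MSE is an assumption; the record only says 'the cost function'.
    """
    error = 0.0
    for i, target in enumerate(targets):
        combined = sum(w * preds[i] for w, preds in zip(weights, base_preds))
        error += (combined - target) ** 2
    return error / len(targets)


def evolve(population, base_preds, targets, rng, mutation_scale=0.1):
    """One generation: keep the better half, refill with mutated offspring."""
    ranked = sorted(population, key=lambda w: ensemble_cost(w, base_preds, targets))
    survivors = ranked[: len(ranked) // 2]
    children = []
    while len(survivors) + len(children) < len(population):
        parent_a, parent_b = rng.sample(survivors, 2)
        # Averaging crossover followed by Gaussian mutation, clipped at zero.
        child = [(a + b) / 2 for a, b in zip(parent_a, parent_b)]
        child = [max(0.0, c + rng.gauss(0, mutation_scale)) for c in child]
        children.append(normalize(child))
    return survivors + children


def replace_weakest_model(base_preds, targets, new_preds):
    """Swap the base model with the highest individual error for a newer one."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(preds, targets)) for preds in base_preds
    ]
    weakest = errors.index(max(errors))
    updated = list(base_preds)
    updated[weakest] = new_preds
    return updated, weakest


# Toy batch from the stream: three base models of differing quality.
targets = [1.0, 2.0, 3.0, 4.0]
base_preds = [
    [1.1, 1.9, 3.2, 3.9],  # accurate model
    [2.0, 3.0, 4.0, 5.0],  # biased model
    [0.0, 0.0, 0.0, 0.0],  # stale model
]

rng = random.Random(0)
population = [normalize([rng.random() for _ in base_preds]) for _ in range(20)]
for _ in range(30):
    population = evolve(population, base_preds, targets, rng)

best = min(population, key=lambda w: ensemble_cost(w, base_preds, targets))
print("best weights:", best)
print("cost:", ensemble_cost(best, base_preds, targets))
```

In a streaming setting, `evolve` would be called on each incoming batch so that the pool of weight vectors tracks the current data, and `replace_weakest_model` would be called at regular intervals with predictions from a model trained on recent data. Because the surviving half of the pool is carried over unchanged, the best cost on a fixed batch cannot get worse from one generation to the next.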