On the computation of maximal-correlated cuboids cells

Bibliographic details
Main author: Alves, Ronnie Cley Oliveira (author)
Other authors: Belo, Orlando (author)
Format: conferencePaper
Language: eng
Published in: 2006
Subjects:
Full text: http://hdl.handle.net/1822/71932
Country: Portugal
OAI: oai:repositorium.sdum.uminho.pt:1822/71932
Description
Abstract: The main idea of iceberg data cubing methods relies on optimization techniques for computing only the cuboid cells above a certain minimum support threshold. Even with such an approach the curse of dimensionality remains: the number of cuboids to compute is large, and the output is correspondingly huge. More recently, some efforts have been made to compute only closed cuboids. Nevertheless, for some of the dense databases considered in this paper, even the set of all closed cuboids will be too large. An alternative would be to compute only the maximal cuboids. However, a purely maximal approach implies losing some information: one can generate the complete set of cuboid cells from the maximal ones, but without their respective aggregation values. To control this "loss of information" we need to add an interestingness measure, which we call the correlated value of a cuboid cell. In this paper, we propose a new notion for reducing cuboid aggregation by computing only the maximal-correlated cuboid cells, and present the M3C-Cubing algorithm, which brings out those cuboids. Our evaluation study shows that the method is a promising candidate for scalable data cubing, reducing the number of cuboids by at least an order of magnitude in comparison with the number of closed ones.
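
To make the distinction between iceberg, closed, and maximal cells concrete, the following is a minimal brute-force sketch on a toy table. It is not the M3C-Cubing algorithm, and it does not define the paper's correlated value. It assumes the usual definitions by analogy with frequent-itemset mining: a cell is closed if no more specific cell has the same count, and maximal if no more specific cell reaches the minimum support. The table, the `STAR` marker, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

STAR = "*"  # hypothetical marker for an aggregated ("all") dimension


def iceberg_cells(rows, min_sup):
    """Count every cell by brute force and keep those with count >= min_sup."""
    counts = {}
    for row in rows:
        # Each row contributes to 2^d cells: every way of replacing values by '*'.
        for mask in product((False, True), repeat=len(row)):
            cell = tuple(STAR if hide else v for v, hide in zip(row, mask))
            counts[cell] = counts.get(cell, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_sup}


def specializes(a, b):
    """True if cell a is strictly more specific than cell b."""
    return a != b and all(y == STAR or x == y for x, y in zip(a, b))


def closed_cells(cells):
    """Cells with no more specific cell of the same count (assumed definition)."""
    return {c: n for c, n in cells.items()
            if not any(specializes(d, c) and m == n for d, m in cells.items())}


def maximal_cells(cells):
    """Cells with no more specific cell above the threshold (assumed definition)."""
    return {c: n for c, n in cells.items()
            if not any(specializes(d, c) for d in cells)}


def generalizations(cell):
    """All cells obtained by replacing any subset of values with '*'."""
    return {tuple(STAR if hide else v for v, hide in zip(cell, mask))
            for mask in product((False, True), repeat=len(cell))}


# Toy fact table with three dimensions (made-up data).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
]
iceberg = iceberg_cells(rows, min_sup=2)
closed = closed_cells(iceberg)
maximal = maximal_cells(iceberg)

# The maximal cells regenerate the set of iceberg cells, but not their counts.
regenerated = set().union(*(generalizations(c) for c in maximal))

print(len(iceberg), len(closed), len(maximal))
print(regenerated == set(iceberg))
```

On this toy table the single maximal cell is enough to enumerate every iceberg cell, yet the counts of the generalized cells (for example 4 for the fully aggregated cell) cannot be recovered from it. This is the information loss that the abstract proposes to manage with an additional measure.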