An unsupervised approach to feature discretization and selection

Bibliographic details
Main author: J. Ferreira, Artur (author)
Other authors: Figueiredo, Mário A. T. (author)
Format: article
Language: eng
Published: 2018
Subjects:
Full text: http://hdl.handle.net/10400.21/8569
Country: Portugal
OAI: oai:repositorio.ipl.pt:10400.21/8569
Description
Abstract: Many learning problems require handling high-dimensional data sets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional data sets. The experimental results on several standard data sets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.