Using IR techniques to improve Automated Text Classification

Bibliographic details
Main author:	Gonçalves, Teresa (author)
Other authors:	Quaresma, Paulo (author)
Format:	article
Language:	eng
Published:	2011
Subjects:	machine learning; Text classification
Full text:	http://hdl.handle.net/10174/2557
Country:	Portugal
OAI:	oai:dspace.uevora.pt:10174/2557

Description
Abstract:	This paper studies the pre-processing phase of the automated text classification problem. We apply linear Support Vector Machines to datasets written in English and in European Portuguese: the Reuters and the Portuguese Attorney General’s Office datasets, respectively. The study can be seen as a search for the best document representation along three axes: feature reduction (using linguistic information), feature selection (using word frequencies), and term weighting (using information retrieval measures).
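
The three axes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the stop-word list, the `min_count` threshold, and the choice of tf-idf as the information retrieval measure are all assumptions made for illustration, since this record does not state the paper's actual settings. Stop-word removal stands in for the linguistic feature reduction, a corpus frequency threshold for the feature selection, and tf-idf for the term weighting.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stop-word list; neither comes from the paper.
DOCS = [
    "the court ruled on the appeal",
    "the market price of oil rose",
    "oil exports and the oil price",
]
STOP_WORDS = {"the", "on", "of", "and"}


def tokenize(text):
    """Feature reduction stand-in: lowercase, split, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def select_features(token_docs, min_count=2):
    """Feature selection: keep words seen at least min_count times in the corpus."""
    counts = Counter(w for doc in token_docs for w in doc)
    return sorted(w for w, c in counts.items() if c >= min_count)


def tfidf_vectors(token_docs, vocab):
    """Term weighting: raw term frequency times log(N / document frequency)."""
    n_docs = len(token_docs)
    df = {w: sum(1 for doc in token_docs if w in doc) for w in vocab}
    vectors = []
    for doc in token_docs:
        tf = Counter(doc)
        vectors.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vectors


token_docs = [tokenize(d) for d in DOCS]
vocab = select_features(token_docs)
vectors = tfidf_vectors(token_docs, vocab)
```

Each resulting vector is the kind of document representation that would then be passed to a linear Support Vector Machine; varying the stop-word list, the threshold, or the weighting formula corresponds to moving along one of the three axes.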
