Analysing part-of-speech for Portuguese text classification


Bibliographic details
Main author:	Gonçalves, Teresa (author)
Other authors:	Quaresma, Paulo (author)
Format:	article
Language:	eng
Published:	2011
Subjects:	machine learning; text classification
Full text:	http://hdl.handle.net/10174/2565
Country:	Portugal
OAI:	oai:dspace.uevora.pt:10174/2565

Description
Abstract:	This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two different datasets written in the Portuguese language: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show the relevance of part-of-speech information for the pre-processing phase of text classification, allowing for a strong reduction in the number of features needed.
