Analysing part-of-speech for Portuguese text classification
Main author: | |
---|---|
Other authors: | |
Format: | article |
Language: | eng |
Published in: | 2011 |
Subjects: | |
Full text: | http://hdl.handle.net/10174/2565 |
Country: | Portugal |
OAI: | oai:dspace.uevora.pt:10174/2565 |
Abstract: | This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two different datasets written in Portuguese: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show the relevance of part-of-speech information for the pre-processing phase of text classification, allowing for a strong reduction in the number of features needed. |
---|
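
The abstract describes selecting terms by part-of-speech before training a classifier. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the documents are already tokenised and tagged, and the tag names, the `KEPT_TAGS` set, the helper functions, and the two toy sentences are all hypothetical choices made for illustration. The SVM training step is omitted; the sketch only shows how a part-of-speech filter shrinks the feature space.

```python
from collections import Counter

# Assumption: which tags to keep is an illustrative choice, not taken from the paper.
KEPT_TAGS = {"NOUN", "PROPN", "VERB"}


def select_terms(tagged_doc, kept_tags=KEPT_TAGS):
    """Return the lower-cased tokens whose POS tag is in kept_tags."""
    return [token.lower() for token, tag in tagged_doc if tag in kept_tags]


def build_vocabulary(docs):
    """Map each distinct term to a column index, in sorted order."""
    terms = sorted({term for doc in docs for term in doc})
    return {term: index for index, term in enumerate(terms)}


def to_count_vector(doc, vocabulary):
    """Bag-of-words count vector for one document over the vocabulary."""
    counts = Counter(doc)
    return [counts.get(term, 0) for term in vocabulary]


# Toy, hand-tagged Portuguese sentences (invented for this example).
tagged_docs = [
    [("O", "DET"), ("tribunal", "NOUN"), ("julgou", "VERB"),
     ("o", "DET"), ("recurso", "NOUN")],
    [("A", "DET"), ("equipa", "NOUN"), ("venceu", "VERB"),
     ("o", "DET"), ("jogo", "NOUN"), ("ontem", "ADV")],
]

# Baseline: every token is a candidate feature.
all_terms = [[token.lower() for token, _ in doc] for doc in tagged_docs]
# POS-filtered: only tokens with a kept tag are candidate features.
selected = [select_terms(doc) for doc in tagged_docs]

full_vocab = build_vocabulary(all_terms)
reduced_vocab = build_vocabulary(selected)

print(len(full_vocab), len(reduced_vocab))
```

The count vectors produced by `to_count_vector` over `reduced_vocab` could then be passed to any SVM implementation; which tags to keep and how to weight the terms are the kind of choices the paper's experiments evaluate.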