Is linguistic information relevant for the classification of legal texts?

Bibliographic details
Main author: Gonçalves, Teresa (author)
Other authors: Quaresma, Paulo (author)
Format: article
Language: eng
Published in: 2011
Subjects:
Full text: http://hdl.handle.net/10174/2561
Country: Portugal
OAI: oai:dspace.uevora.pt:10174/2561
Description
Abstract: Text classification is an important task in the legal domain. Most legal information is stored as text in a largely unstructured format, so it is important to be able to classify these texts automatically into a predefined set of concepts. Support Vector Machines (SVMs), a machine learning algorithm, have been shown to be a good classifier for text bases [Joachims, 2002]. In this paper, SVMs are applied to the classification of European Portuguese legal texts (the Portuguese Attorney General's Office Decisions), and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated. The results show that this linguistic information can be used successfully to improve the classification results and, at the same time, to decrease the number of features needed by the learning algorithm.
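
The abstract's point about feature reduction can be illustrated with a minimal sketch. It is not the authors' pipeline: the lemma table `TOY_LEMMAS`, the helper functions, and the English sample sentences are all hypothetical, chosen only to show how mapping inflected forms to lemmas shrinks a bag-of-words vocabulary before the vectors are passed to a classifier such as an SVM.

```python
from collections import Counter

# Hypothetical toy lemma table, for illustration only (not from the paper).
# A real system would use a lemmatiser for European Portuguese.
TOY_LEMMAS = {
    "decisions": "decision",
    "decided": "decide",
    "decides": "decide",
    "courts": "court",
    "appealed": "appeal",
    "appeals": "appeal",
}

PUNCTUATION = ".,;:()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in stripped if token]


def lemmatize(tokens, lemmas=TOY_LEMMAS):
    """Replace each token with its lemma when the table knows one."""
    return [lemmas.get(token, token) for token in tokens]


def bag_of_words(docs, use_lemmas=False):
    """Return (sorted vocabulary, per-document term counts)."""
    counted = []
    for doc in docs:
        tokens = tokenize(doc)
        if use_lemmas:
            tokens = lemmatize(tokens)
        counted.append(Counter(tokens))
    vocabulary = sorted(set().union(*counted))
    return vocabulary, counted


# Made-up sample documents (assumption, not data from the paper).
docs = [
    "The court decides the appeal.",
    "The courts decided the appeals.",
    "Decisions appealed to the court.",
]

raw_vocab, _ = bag_of_words(docs)
lemma_vocab, _ = bag_of_words(docs, use_lemmas=True)

print("features without lemmatisation:", len(raw_vocab))
print("features with lemmatisation:", len(lemma_vocab))
```

On this toy input the vocabulary drops from 10 word forms to 6 lemmas. Whether such a reduction also improves classification is an empirical question, which is what the paper evaluates on the Attorney General's Office Decisions.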