Open Resources and Tools for the Shallow Processing of Portuguese: the TagShare project

Bibliographic details
Main author: Barreto, Florbela (author)
Other authors: Branco, António (author), Ferreira, Eduardo (author), Mendes, Amália (author), Bacelar do Nascimento, Maria Fernanda (author), Nunes, Filipe (author), Silva, João Ricardo (author)
Format: conferenceObject
Language: English
Published: 2019
Full text: http://hdl.handle.net/10451/37489
Country: Portugal
OAI identifier: oai:repositorio.ul.pt:10451/37489
Description
Abstract: This paper presents the TagShare project and the linguistic resources and tools for the shallow processing of Portuguese developed within its scope. These resources include a 1-million-token corpus that has been accurately hand-annotated with a variety of linguistic information, as well as several state-of-the-art shallow processing tools capable of automatically producing that type of annotation. At present, the linguistic annotations in the corpus are sentence and paragraph boundaries, token boundaries, morphosyntactic POS categories, values of inflection features, lemmas and named entities. Accordingly, the set of tools comprises a sentence chunker, a tokenizer, a POS tagger, nominal and verbal analyzers and lemmatizers, a verbal conjugator, a nominal “inflector”, and a named-entity recognizer, some of which underlie several online services.