Tokenization of Portuguese: resolving the hard cases

This research note addresses the issue of ambiguous strings: strings of non-whitespace characters whose tokenization, depending on the specific occurrence, yields either one token or more than one. Such strings, which typically coincide with orthographically contracted forms, are shown to raise the prob...
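
As an illustration of the definition above (not the method of the report, whose full text is not reproduced in this record), the following minimal Python sketch shows a string that must be split in one occurrence and kept whole in another. The example strings "deste" and "desse", the table `EXPANSIONS`, and the functions `tokenize` and `toy_is_contraction` are assumptions introduced here for illustration only. The disambiguation rule is a deliberately naive stand-in, and punctuation and capitalisation are not handled.

```python
# Hypothetical lookup table: ambiguous string -> its two-token expansion.
# The entries are illustrative and are not taken from the report.
EXPANSIONS = {
    "deste": ["de", "este"],  # contraction "de" + "este", or a form of the verb "dar"
    "desse": ["de", "esse"],  # contraction "de" + "esse", or a form of the verb "dar"
}


def tokenize(text, is_contraction):
    """Split on whitespace, expanding an ambiguous string only when the
    caller-supplied predicate judges this occurrence to be a contraction."""
    words = text.split()
    tokens = []
    for i, word in enumerate(words):
        if word.lower() in EXPANSIONS and is_contraction(words, i):
            tokens.extend(EXPANSIONS[word.lower()])  # two tokens
        else:
            tokens.append(word)  # one token
    return tokens


def toy_is_contraction(words, i):
    # Naive assumption: right after the pronoun "tu" the string is a verb form.
    return not (i > 0 and words[i - 1].lower() == "tu")


print(tokenize("gosto deste livro", toy_is_contraction))
# ['gosto', 'de', 'este', 'livro']
print(tokenize("tu deste o livro", toy_is_contraction))
# ['tu', 'deste', 'o', 'livro']
```

The decision is passed in as a predicate because the string alone does not determine the tokenization; any realistic decision procedure would need information about the surrounding context.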


Bibliographic details
Main author: Branco, António Horta (author)
Other authors: Silva, João (author)
Format: report
Language: Portuguese
Published: 2009
Subjects:
Full text: http://hdl.handle.net/10451/14199
Country: Portugal
OAI: oai:repositorio.ul.pt:10451/14199