Tokenization of Portuguese: resolving the hard cases
This research note addresses the issue of ambiguous strings: strings of non-whitespace characters whose tokenization, depending on the specific occurrence, yields one or more than one token. Strings of this sort, which typically coincide with orthographically contracted forms, are shown to raise the prob...
Main author: | |
---|---|
Other authors: | |
Format: | report |
Language: | por |
Published in: | 2009 |
Subjects: | |
Full text: | http://hdl.handle.net/10451/14199 |
Country: | Portugal |
Oai: | oai:repositorio.ul.pt:10451/14199 |
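
The notion of an ambiguous string in the abstract can be illustrated with a minimal Python sketch. It is not taken from the note: the tiny lexicon, the `AMBIGUOUS` set, and the function name `candidate_tokenizations` are hypothetical and chosen only for illustration. The sketch lists the candidate tokenizations of a string, showing that a form such as "deste" can be either a single token (a form of the verb "dar") or the contraction of "de" and "este", while "nele" only has the expanded reading.

```python
# Illustrative sketch only: the lexicon below is a hand-picked assumption,
# not the resource or the method used in the research note.

# Contracted forms and their expansion into two tokens.
CONTRACTIONS = {
    "deste": ("de", "este"),
    "desse": ("de", "esse"),
    "nele": ("em", "ele"),
}

# Contracted forms that also exist as single-token words
# (here, forms of the verb "dar"), which makes the string ambiguous.
AMBIGUOUS = {"deste", "desse"}


def candidate_tokenizations(string):
    """Return every candidate tokenization of a whitespace-free string."""
    s = string.lower()
    candidates = []
    # Single-token reading: any non-contraction, or an ambiguous contraction.
    if s not in CONTRACTIONS or s in AMBIGUOUS:
        candidates.append((s,))
    # Two-token reading: the expansion of the contracted form.
    if s in CONTRACTIONS:
        candidates.append(CONTRACTIONS[s])
    return candidates


if __name__ == "__main__":
    for word in ("deste", "nele", "casa"):
        print(word, candidate_tokenizations(word))
```

A string with more than one candidate is ambiguous in the sense of the abstract; picking the right candidate for a given occurrence requires information about the context, which this sketch deliberately leaves out.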