A theoretical model for n-gram distribution in big data corpora
There is a wide diversity of applications relying on the identification of the sequences of n consecutive words (n-grams) occurring in corpora. Many studies follow an empirical approach for determining the statistical distribution of the n-grams but are usually constrained by the corpora sizes, whic...
Main author: | |
---|---|
Other authors: | |
Format: | conferenceObject |
Language: | eng |
Published in: | 2017 |
Subjects: | |
Full text: | http://hdl.handle.net/10400.21/6829 |
Country: | Portugal |
OAI: | oai:repositorio.ipl.pt:10400.21/6829 |