A theoretical model for n-gram distribution in big data corpora

There is a wide diversity of applications relying on the identification of the sequences of n consecutive words (n-grams) occurring in corpora. Many studies follow an empirical approach for determining the statistical distribution of the n-grams but are usually constrained by the corpora sizes, whic...

Bibliographic details
Main author:	Silva, Joaquim F. (author)
Other authors:	Gonçalves, Carlos Jorge de Sousa (author), Cunha, José C. (author)
Format:	conferenceObject
Language:	eng
Published:	2017
Subjects:	n-gram Models; Big Data; Zipf-Mandelbrot Law; Poisson Distribution; Extraction of Relevant Expressions
Full text:	http://hdl.handle.net/10400.21/6829
Country:	Portugal
OAI:	oai:repositorio.ipl.pt:10400.21/6829
