A theoretical model for n-gram distribution in big data corpora

There is a wide diversity of applications relying on the identification of the sequences of n consecutive words (n-grams) occurring in corpora. Many studies follow an empirical approach for determining the statistical distribution of the n-grams but are usually constrained by the corpora sizes, whic...

Bibliographic Details
Main Author:	Silva, Joaquim F. (author)
Other Authors:	Gonçalves, Carlos Jorge de Sousa (author), Cunha, José C. (author)
Format:	conferenceObject
Language:	eng
Published:	2017
Subjects:	n-gram Models; Big Data; Zipf-Mandelbrot Law; Poisson Distribution; Extraction of Relevant Expressions
Online Access:	http://hdl.handle.net/10400.21/6829
Country:	Portugal
Oai:	oai:repositorio.ipl.pt:10400.21/6829
