Collecting Statistics about the Portuguese Web


Bibliographic details
Main author:	Gomes, Daniel (author)
Other authors:	Silva, Mário J. (author)
Format:	report
Language:	por
Published in:	2009
Subjects:	Web characterization Portuguese Portugal tumba! statistics crawling
Full text:	http://hdl.handle.net/10451/14211
Country:	Portugal
OAI:	oai:repositorio.ul.pt:10451/14211

Description
Abstract:	This report presents a characterization of text documents from the Portuguese Web. The characterization was produced from a 2003 crawl of over 4 million URLs and 131 thousand sites. We describe the rules that we established for defining the boundaries of the Portuguese Web and the methodology used to gather statistics. We also show how crawling constraints and abnormal situations on the Web can influence the results.
