Collecting Statistics about the Portuguese Web
This report presents a characterization of text documents from the Portuguese Web, produced from a 2003 crawl of over 4 million URLs and 131 thousand sites. We describe the rules we established for defining the boundaries of the Portuguese Web and the methodology used to gather statistics....
Main author: | |
---|---|
Other authors: | |
Format: | report |
Language: | por |
Published in: | 2009 |
Subjects: | |
Full text: | http://hdl.handle.net/10451/14211 |
Country: | Portugal |
Oai: | oai:repositorio.ul.pt:10451/14211 |