The Viuva Negra crawler
Main Author: | |
---|---|
Other Authors: | |
Format: | report |
Language: | por |
Published: | 2009 |
Subjects: | |
Online Access: | http://hdl.handle.net/10451/14117 |
Country: | Portugal |
Oai: | oai:repositorio.ul.pt:10451/14117 |
Summary: | This report discusses architectural aspects of web crawlers and details the design, implementation and evaluation of the Viuva Negra (VN) crawler. VN has been used for 4 years, feeding a search engine and an archive of the Portuguese web. In our experiments it crawled over 2 million documents per day, corresponding to 63 GB of data. We describe situations found on the web that are hazardous to crawling and the solutions adopted to mitigate their effects. The gathered information was integrated into a web warehouse that supports its automatic processing by text mining applications. |