On URL and content persistence

This report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics...

ver descrição completa

Detalhes bibliográficos
Autor principal: Gomes, Daniel (author)
Outros Autores: Silva, Mário J. (author)
Formato: report
Idioma:por
Publicado em: 2009
Assuntos:
Texto completo:http://hdl.handle.net/10451/14153
País:Portugal
Oai:oai:repositorio.ul.pt:10451/14153
Descrição
Resumo:This report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic distribution. We characterized persistent URLs and contents, and identified reasons for URL death. We found that lasting contents tend to be referenced by different URLs during their lifetime. On the other hand, half of the contents referenced by persistent URLs did not change