On URL and content persistence
Main Author: | |
---|---|
Other Authors: | |
Format: | report |
Language: | por |
Published: | 2009 |
Subjects: | |
Online Access: | http://hdl.handle.net/10451/14153 |
Country: | Portugal |
Oai: | oai:repositorio.ul.pt:10451/14153 |
Summary: | This report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic distribution. We characterized persistent URLs and contents, and identified reasons for URL death. We found that lasting contents tend to be referenced by different URLs during their lifetime. On the other hand, half of the contents referenced by persistent URLs did not change |