Versus: a Web Data Repository with Time Support

Bibliographic Details
Main Author: Campos, João P. (author)
Format: masterThesis
Language: por
Published: 2009
Subjects:
Online Access: http://hdl.handle.net/10451/14009
Country: Portugal
OAI: oai:repositorio.ul.pt:10451/14009
Description
Summary: Web repositories are large-scale warehouses of data downloaded from the Web. They serve applications that summarize this data to produce results that help people use information. Time is a central dimension of Web data: the Web changes continuously, and it is impossible to take an instantaneous snapshot of a large portion of it. Developers of applications that manage Web data usually distribute their operations over several processing nodes to scale up to the amount of data to be processed. Versus is a model for a repository that provides time-oriented, distributed management of Web data. Time is managed by versioning the objects saved in the repository. Distribution is managed through a hierarchy of workspaces: distributed threads work on data stored in the lower-level workspaces and save it by checking it in to the workspace at the next level up. Versus applications can specify the granularity of the distribution and the conflict resolution policies they want to implement. This gives applications considerable control over the repository and widens the range of applications it can support. The Versus model was embodied in a prototype that is being used to build applications for managing data collected from the Web.
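
Illustration: the summary describes versioned objects, a hierarchy of workspaces, check-in to the parent workspace, and application-chosen conflict resolution. The Python sketch below shows one way these ideas could fit together. It is not the Versus API: the names Workspace, Version, keep_newest, put, as_of and check_in are hypothetical, timestamps are plain integers supplied by the caller, and the "keep the newest version" policy is only one assumed example of a conflict resolution policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One stored version of an object (hypothetical structure)."""
    value: str
    timestamp: int  # logical time supplied by the caller


# A policy receives the parent's latest version and the incoming one,
# and returns the version that should be the latest afterwards.
ConflictPolicy = Callable[[Version, Version], Version]


def keep_newest(existing: Version, incoming: Version) -> Version:
    """Assumed example policy: the version with the later timestamp wins."""
    return incoming if incoming.timestamp >= existing.timestamp else existing


class Workspace:
    """A node in a workspace hierarchy holding versioned objects."""

    def __init__(self, name: str, parent: Optional["Workspace"] = None,
                 policy: ConflictPolicy = keep_newest) -> None:
        self.name = name
        self.parent = parent
        self.policy = policy  # applied when children check in to this workspace
        self.objects: Dict[str, List[Version]] = {}

    def put(self, key: str, value: str, timestamp: int) -> None:
        """Append a new version of `key` in this workspace."""
        self.objects.setdefault(key, []).append(Version(value, timestamp))

    def latest(self, key: str) -> Optional[Version]:
        """Return the most recently stored version of `key`, if any."""
        history = self.objects.get(key)
        return history[-1] if history else None

    def as_of(self, key: str, timestamp: int) -> Optional[Version]:
        """Return the newest version whose timestamp is <= `timestamp`."""
        candidates = [v for v in self.objects.get(key, [])
                      if v.timestamp <= timestamp]
        return max(candidates, key=lambda v: v.timestamp) if candidates else None

    def check_in(self) -> None:
        """Move the latest local versions to the parent workspace."""
        if self.parent is None:
            raise ValueError("the root workspace has no parent to check in to")
        for key, history in self.objects.items():
            incoming = history[-1]
            existing = self.parent.latest(key)
            chosen = incoming if existing is None else self.parent.policy(existing, incoming)
            if chosen is not existing:  # store only when the policy picks the incoming version
                self.parent.objects.setdefault(key, []).append(chosen)
        self.objects.clear()
```

In this sketch, two lower-level workspaces sharing the same parent could each store a copy of the same URL and then call check_in(); the parent's policy decides which copy becomes its latest version, and as_of() answers time-based queries against the stored history.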