Summary: | Web repositories are large-scale warehouses of data downloaded from the Web. They serve applications that summarize that data to produce results that help people use information. Time is a central dimension of Web data, because the Web changes continuously and it is impossible to take an instantaneous snapshot of a large portion of it. Developers of applications that manage Web data usually distribute their operations over several processing nodes, to scale up to the amount of data to be processed. Versus is a model for a repository that provides time-oriented, distributed Web data management. Time is managed by versioning the objects saved in the repository. Distribution is managed through a hierarchy of workspaces: distributed threads work on data stored in the lower-level workspaces and save it by checking it in to the workspace at the next level up. Versus applications can specify the granularity of the distribution and the conflict resolution policies they want to implement. This gives applications fine control over the repository and broadens the number and types of applications it can support. The Versus model was embodied in a prototype that is being used to build applications for managing data collected from the Web.
|
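The workspace hierarchy described in the summary can be illustrated with a minimal sketch. It is not the Versus API: the names `Workspace`, `check_out`, `check_in`, and `ConflictPolicy` are hypothetical, and the sketch assumes that each object keeps a simple list of versions and that a conflict means the parent workspace gained a version after the child last synchronized with it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical policy signature: (parent_value, child_value) -> value to keep.
ConflictPolicy = Callable[[str, str], str]


def child_wins(parent_value: str, child_value: str) -> str:
    """Illustrative default policy: the checked-in value replaces the parent's."""
    return child_value


@dataclass
class Workspace:
    """A node in the workspace hierarchy (illustrative, not the Versus API)."""
    name: str
    parent: Optional["Workspace"] = None
    policy: ConflictPolicy = child_wins
    # Every object keeps its full version history, oldest first.
    versions: Dict[str, List[str]] = field(default_factory=dict)
    # Number of parent versions this workspace had seen at its last sync.
    base: Dict[str, int] = field(default_factory=dict)

    def latest(self, key: str) -> Optional[str]:
        history = self.versions.get(key)
        return history[-1] if history else None

    def put(self, key: str, value: str) -> None:
        """Save a new version locally; older versions are kept."""
        self.versions.setdefault(key, []).append(value)

    def check_out(self, key: str) -> None:
        """Copy the parent's latest version into this workspace."""
        if self.parent is None:
            raise ValueError("the root workspace has no parent")
        parent_history = self.parent.versions.get(key, [])
        self.base[key] = len(parent_history)
        if parent_history:
            self.put(key, parent_history[-1])

    def check_in(self, key: str) -> None:
        """Save the local latest version as a new version in the parent."""
        if self.parent is None:
            raise ValueError("the root workspace has no parent")
        value = self.latest(key)
        if value is None:
            raise KeyError(key)
        parent_history = self.parent.versions.setdefault(key, [])
        # Conflict: the parent changed since this workspace last synchronized.
        if len(parent_history) != self.base.get(key, 0):
            value = self.policy(parent_history[-1], value)
        parent_history.append(value)
        self.base[key] = len(parent_history)
```

In this sketch, two worker workspaces under the same parent that both check in the same key would trigger the second worker's policy, and the parent would keep both the earlier version and the resolved one. Letting each workspace carry its own policy is one way to reflect the application-specified conflict resolution mentioned in the summary.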