A commodity platform for Distributed Data Mining - the HARVARD System

Bibliographic details
Main author: Ruy Ramos (author)
Other authors: Rui Camacho (author), Pedro Souto (author)
Format: book
Language: eng
Published in: 2006
Subjects:
Full text: https://repositorio-aberto.up.pt/handle/10216/73310
Country: Portugal
OAI: oai:repositorio-aberto.up.pt:10216/73310
Description
Abstract: Systems performing Data Mining analysis are usually dedicated and expensive. They often require special-purpose machines to run the data analysis tool. In this paper we propose an architecture for distributed Data Mining running on general-purpose desktop computers. The proposed architecture was deployed in the HARVesting Architecture of idle machines foR Data mining (HARVARD) system. The HARVARD system has the following features. It does not require special-purpose or expensive machines, as it runs on general-purpose PCs. It is based on distributed computing using a set of PCs connected in a network. In a Condor fashion, it takes advantage of a distributed setting of available and idle computational resources and is adequate for problems that may be decomposed into coarse-grain subtasks. The system includes dynamic updating of the computational resources. It is written in Java and therefore runs on several different platforms, including Linux and Windows. It has fault-tolerant features that make it quite reliable. It may use a wide variety of data analysis tools without modification, since it is independent of the data analysis tool. It uses an easy but powerful task specification and control language. The HARVARD system was deployed using two data analysis tools: a decision tree tool called C4.5 and an Inductive Logic Programming (ILP) tool.