SmartClean: an incremental data cleaning tool

Bibliographic Details
Main Author: Oliveira, Paulo (author)
Other Authors: Rodrigues, Fátima (author), Henriques, Pedro (author)
Format: conferenceObject
Language: eng
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10400.22/1583
Country: Portugal
Oai: oai:recipp.ipp.pt:10400.22/1583
Description
Summary: This paper presents the SmartClean tool, whose purpose is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has one main advantage: the user does not need to specify the execution sequence of the data cleaning operations. To that end, an execution sequence was developed, and the problems are manipulated (i.e., detected and corrected) following that sequence. The sequence also supports the incremental execution of the operations. The paper presents the underlying architecture of the tool and describes its components in detail. The validity of the tool and, consequently, of its architecture is demonstrated through a case study. Although SmartClean also has cleaning capabilities at all other levels, this paper describes only those related to the attribute value level.
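
The idea of a tool-defined execution sequence with incremental re-execution can be illustrated with a minimal Python sketch. It is not SmartClean's implementation: the names Operation, SEQUENCE and clean, the three attribute-value operations, their order, and the reading of "incremental" as "re-examine only the rows marked as changed" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical type: none of these names come from the paper.
@dataclass
class Operation:
    name: str
    detect: Callable[[Optional[str]], bool]            # True if the value has the problem
    correct: Callable[[Optional[str]], Optional[str]]  # returns a repaired value


# The order is fixed by the tool, not chosen by the user (illustrative order only).
SEQUENCE = [
    Operation("missing value",
              lambda v: v is None or v.strip() == "",
              lambda v: "UNKNOWN"),
    Operation("surrounding whitespace",
              lambda v: v != v.strip(),
              lambda v: v.strip()),
    Operation("inconsistent case",
              lambda v: v != v.upper(),
              lambda v: v.upper()),
]


def clean(values, dirty=None):
    """Apply every operation in SEQUENCE to the values of one attribute.

    `dirty` is the set of row indices to examine; None means all rows.
    Returns the cleaned values and the set of rows that were corrected.
    """
    result = list(values)
    rows = range(len(result)) if dirty is None else sorted(dirty)
    changed = set()
    for op in SEQUENCE:              # operations always run in the fixed order
        for i in rows:
            if op.detect(result[i]):
                result[i] = op.correct(result[i])
                changed.add(i)
    return result, changed


# Full run over all rows.
cleaned, changed = clean([" lisbon ", None, "PORTO"])

# Incremental run: only row 2 was modified afterwards, so only it is re-examined.
cleaned[2] = "porto "
recleaned, rechanged = clean(cleaned, dirty={2})
```

In this sketch the caller never orders the operations; it only supplies the data and, on later runs, the rows that changed.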