Modeling ETL data quality enforcement tasks using relational algebra operators

Bibliographic Details
Main Author: Santos, Vasco (author)
Other Authors: Belo, O. (author)
Format: conferencePaper
Language: eng
Published: 2013
Online Access: http://hdl.handle.net/1822/37418
Country: Portugal
Oai: oai:repositorium.sdum.uminho.pt:1822/37418
Description
Summary: A data warehouse is usually refreshed periodically with data gathered from disparate source systems. Nevertheless, this data might not be fully accurate and may contain serious data quality problems, such as uniqueness violations, misrepresentations, null values, multi-purpose fields, or inconsistent values, in one or more attributes. Such problems contribute significantly to users' falling expectations of the data analyzed from data warehouses. Data quality enforcement is a complex, time-consuming task that parses data from source tables, corrects it, normalizes it, and integrates it into a data warehouse for a better representation of real businesses. In this paper, we analyze some of the common tasks associated with data quality enforcement, representing and modeling them using Relational Algebra as a specification tool.
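
As a rough illustration of the idea in the summary, two of the listed problems (null values and uniqueness) can be expressed with the basic relational operators selection and projection. The sketch below is not taken from the paper: the relation `customers`, its attributes, and the helper names `select`, `project`, `split_on_nulls`, and `duplicate_keys` are all hypothetical, and relations are modeled simply as lists of dictionaries.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Row = Dict[str, Any]
Relation = List[Row]


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection (sigma): keep only the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel: Relation, attrs: List[str]) -> Relation:
    """Projection (pi) with set semantics: keep the given attributes
    and drop rows that become exact duplicates."""
    seen: Set[Tuple[Any, ...]] = set()
    out: Relation = []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: row[a] for a in attrs})
    return out


def split_on_nulls(rel: Relation, attr: str) -> Tuple[Relation, Relation]:
    """Null-value check: two complementary selections route rows either
    to the clean flow or to a quarantine relation."""
    clean = select(rel, lambda r: r[attr] is not None)
    rejected = select(rel, lambda r: r[attr] is None)
    return clean, rejected


def duplicate_keys(rel: Relation, key: str) -> Set[Any]:
    """Uniqueness check: group by the key and report values seen more than once."""
    counts: Dict[Any, int] = {}
    for row in rel:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return {k for k, n in counts.items() if n > 1}


# Hypothetical source relation with one null value and one duplicated row.
customers: Relation = [
    {"id": 1, "name": "Ana", "city": "Braga"},
    {"id": 2, "name": "Rui", "city": None},
    {"id": 1, "name": "Ana", "city": "Braga"},
    {"id": 3, "name": "Eva", "city": "Porto"},
]

clean, quarantined = split_on_nulls(customers, "city")
deduplicated = project(clean, ["id", "name", "city"])
```

Here `quarantined` holds the row with the missing city, `duplicate_keys(customers, "id")` flags the repeated key, and `deduplicated` is the relation that would continue toward the warehouse. Routing rejected rows to a separate relation instead of discarding them is a design assumption of this sketch, chosen so that rejected data stays available for later correction.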