Modeling ETL data quality enforcement tasks using relational algebra operators

Bibliographic details
Main author: Santos, Vasco (author)
Other authors: Belo, O. (author)
Format: conferencePaper
Language: eng
Published in: 2013
Subjects:
Full text: http://hdl.handle.net/1822/37418
Country: Portugal
OAI: oai:repositorium.sdum.uminho.pt:1822/37418
Description
Abstract: Usually, a data warehouse is refreshed periodically with data gathered from disparate source systems. Nevertheless, this data might not be fully accurate and may contain serious data quality problems, such as uniqueness violations, misrepresentations, null values, multi-purpose fields, or inconsistent values, for one or more attributes. This contributes significantly to the falling expectations users have of data analyzed from data warehouses. Data quality enforcement is a complex, time-consuming task that parses data from source tables, corrects it, normalizes it, and integrates it into a data warehouse for a better representation of real businesses. In this paper, we analyze some of the common tasks associated with data quality enforcement, representing and modeling them using Relational Algebra as a specification tool.