Measuring similarity of complex and heterogeneous data in clustering of large data sets

Bibliographic Details
Main Author: Nicolau, Helena Bacelar (author)
Other Authors: Nicolau, Fernando (author), Sousa, Áurea (author), Nicolau, Leonor Bacelar (author)
Format: article
Language: eng
Published: 2012
Subjects:
Online Access: http://hdl.handle.net/10451/5659
Country: Portugal
Oai: oai:repositorio.ul.pt:10451/5659
Description
Summary: Cluster analysis, or classification, usually refers to a set of exploratory multivariate data analysis methods and techniques for finding a clustering structure in a dataset. The clusters may be groups of statistical data units or groups of variables. In this work we deal with a generalization of this paradigm: the clustering of complex data described by three different types of variables, frequently present in a three-way context. We obtain compatible versions of the same affinity coefficient for measuring similarity between statistical data units described by those three types of variables. A global generalized similarity coefficient is analyzed for this kind of mixed data, which often arises in data mining or knowledge mining.
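
The summary does not give the formulas, so the following is only a minimal sketch of the general idea. It assumes the basic affinity coefficient takes the Bhattacharyya-type form, a sum of square roots of products of relative frequencies, and that a global coefficient can be formed as a convex combination of per-variable affinities. The function names `affinity` and `global_affinity`, the data layout, and the weighting scheme are illustrative assumptions, not the authors' definitions.

```python
import math


def affinity(u, v):
    """Assumed basic affinity between two non-negative profiles.

    Each profile is normalized to relative frequencies, then the
    square roots of the element-wise products are summed. The result
    lies in [0, 1]: 1 for proportional profiles, 0 for disjoint ones.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    su, sv = sum(u), sum(v)
    if su <= 0 or sv <= 0:
        raise ValueError("each profile needs a positive total")
    return sum(math.sqrt((a / su) * (b / sv)) for a, b in zip(u, v))


def global_affinity(unit_a, unit_b, weights):
    """Hypothetical global coefficient: weighted mean of per-variable affinities.

    unit_a and unit_b are lists of profiles, one profile per variable.
    weights are non-negative and assumed to sum to 1.
    """
    if not (len(unit_a) == len(unit_b) == len(weights)):
        raise ValueError("one profile and one weight per variable are required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * affinity(p, q) for w, p, q in zip(weights, unit_a, unit_b))


# Two units described by two variables, each variable given as a count profile.
unit_a = [[1, 2, 3], [1, 0]]
unit_b = [[1, 2, 3], [0, 1]]
similarity = global_affinity(unit_a, unit_b, [0.5, 0.5])
```

In this sketch the first variable contributes an affinity of 1 (identical profiles) and the second contributes 0 (disjoint profiles), so equal weights give a global similarity of one half. How the paper actually handles its three variable types is not stated in this record.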