Interdependence analysis on heterogeneous data via behavior interior dimensions

Can Wang, Chi Hung Chi, Lina Yao, Alan Wee Chung Liew, Hong Shen

Research output: Article, peer-reviewed

Abstract

Interdependent dimensions, including categorical and continuous variables, commonly appear as heterogeneous behavioral data in the real world. Mixed-type objects are associated, to a greater or lesser degree, through certain coupling relationships. The usual representation of such behavioral data is an information table with explicit behavior exterior dimensions (i.e., the original attributes that describe data heterogeneity), which assumes that dimensions are independent and that objects are independent. However, variables and objects are very often interdependent, either explicitly or implicitly, in both functional and semantic ways. Limited research has analyzed these interactions among dimensions and relationships among objects, so learning results tend to be more local than global. This paper proposes an interdependence analysis that captures the multifarious functional relationships among attributes and among objects in heterogeneous data by addressing the coupling context and coupling weights in unsupervised learning. These global couplings consider the interactions within discrete dimensions, within numerical attributes, and across the two, as well as the relationships within an individual object and between multiple objects. They form attribute-based and object-based coupled data representation schemes built on feature conversion and neighborhood calculation. In addition, we interpret both representation models via implicit behavior interior dimensions (i.e., the newly defined attributes that model data interdependence) to explain the intrinsic rationale for the superiority of our proposed methods. This work explicitly models the coupling of multiple attributes and the coupling of multiple objects for heterogeneous data sets, as demonstrated by various data mining and machine learning applications, such as cluster structure analysis, data clustering evaluation, and data density comparison. Moreover, a sensitivity study is carried out to tune the neighborhood parameter and the weight parameter, and a scalability analysis tests the robustness of both models. Extensive experiments on a series of synthetic data sets and multiple UCI data sets show that our proposed framework effectively captures the global couplings of both heterogeneous variables and mixed-type objects, and that it outperforms both the traditional approach and state-of-the-art approaches, a result that is also verified by statistical analysis.

Original language: English
Article number: 110893
Journal: Knowledge-Based Systems
Volume: 279
Publication status: Published - 4 Nov 2023
