TY  - JOUR
T1  - Interdependence analysis on heterogeneous data via behavior interior dimensions
AU  - Wang, Can
AU  - Chi, Chi Hung
AU  - Yao, Lina
AU  - Liew, Alan Wee Chung
AU  - Shen, Hong
N1  - Publisher Copyright:
© 2023 The Author(s)
PY  - 2023/11/4
Y1  - 2023/11/4
N2  - Interdependent dimensions including categorical and continuous variables can be seen commonly as heterogeneous behavioral data in the real world. Mixed-type objects are more or less associated in terms of certain coupling relationships. The usual representation of such behavioral data is an information table with explicit behavior exterior dimensions (i.e. the original attributes to describe data heterogeneity), assuming the independence of dimensions and the independence of objects. However, both variables and objects are actually very often interdependent on one another either explicitly or implicitly in functional and semantic manners. Limited research has been done in analyzing such interactions among dimensions and those relationships among objects, leading to the learning results to be more local than global. This paper proposes the interdependence analysis to capture the functional multifarious relationships among attributes and among objects in heterogeneous data by addressing the coupling context and coupling weights in unsupervised learning. Such global couplings consider the interactions within discrete dimensions, within numerical attributes and across them, as well as the relationships within an individual object and between multiple objects, to form the attribute-based and object-based coupled data representation schemes based on feature conversion and neighborhood calculation. In addition, we interpret both the representation models via implicit behavior interior dimensions (i.e. the newly defined attributes to model data interdependence) to explain the intrinsic rationales for the superiority of our proposed methods. This work explicitly models the coupling of multiple attributes and the coupling of multiple objects for heterogeneous data sets, demonstrated by various data mining and machine learning applications, such as cluster structure analysis, data clustering evaluation, and data density comparison. 
Moreover, the sensitivity study is carried out to tune the neighborhood parameter and weight parameter, and the scalability analysis is explored to test the robustness of both models. Extensive experiments on a series of synthetic data sets and multiple UCI data sets show that our proposed framework can effectively capture the global couplings of both heterogeneous variables and mixed-type objects, and is superior to the traditional way as well as the state-of-the-art approaches, which is also verified by statistical analysis.
AB  - Interdependent dimensions including categorical and continuous variables can be seen commonly as heterogeneous behavioral data in the real world. Mixed-type objects are more or less associated in terms of certain coupling relationships. The usual representation of such behavioral data is an information table with explicit behavior exterior dimensions (i.e. the original attributes to describe data heterogeneity), assuming the independence of dimensions and the independence of objects. However, both variables and objects are actually very often interdependent on one another either explicitly or implicitly in functional and semantic manners. Limited research has been done in analyzing such interactions among dimensions and those relationships among objects, leading to the learning results to be more local than global. This paper proposes the interdependence analysis to capture the functional multifarious relationships among attributes and among objects in heterogeneous data by addressing the coupling context and coupling weights in unsupervised learning. Such global couplings consider the interactions within discrete dimensions, within numerical attributes and across them, as well as the relationships within an individual object and between multiple objects, to form the attribute-based and object-based coupled data representation schemes based on feature conversion and neighborhood calculation. In addition, we interpret both the representation models via implicit behavior interior dimensions (i.e. the newly defined attributes to model data interdependence) to explain the intrinsic rationales for the superiority of our proposed methods. This work explicitly models the coupling of multiple attributes and the coupling of multiple objects for heterogeneous data sets, demonstrated by various data mining and machine learning applications, such as cluster structure analysis, data clustering evaluation, and data density comparison. 
Moreover, the sensitivity study is carried out to tune the neighborhood parameter and weight parameter, and the scalability analysis is explored to test the robustness of both models. Extensive experiments on a series of synthetic data sets and multiple UCI data sets show that our proposed framework can effectively capture the global couplings of both heterogeneous variables and mixed-type objects, and is superior to the traditional way as well as the state-of-the-art approaches, which is also verified by statistical analysis.
KW  - Behavior
KW  - Coupling
KW  - Dimensions
KW  - Heterogeneity
KW  - Interdependence
UR  - http://www.scopus.com/inward/record.url?scp=85170651145&partnerID=8YFLogxK
U2  - 10.1016/j.knosys.2023.110893
DO  - 10.1016/j.knosys.2023.110893
M3  - Article
AN  - SCOPUS:85170651145
SN  - 0950-7051
VL  - 279
JO  - Knowledge-Based Systems
JF  - Knowledge-Based Systems
M1  - 110893
ER  - 