TY - GEN
T1 - A Novel Software Tool for Fast Multiview Visualization of High-Dimensional Datasets
AU - Zhang, Luying
AU - Tian, Hui
AU - Shen, Hong
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Scatterplots are a popular technique for visualizing high-dimensional datasets using linear and nonlinear dimension reduction methods. These methods map the original high-dimensional dataset directly onto scatterplot points through dimension reduction, and hence incur a high computation cost. Despite many improvements in scatterplot visual effects, when the data volume is large, the data mapped onto scatterplot points overlap, resulting in low-quality visualization. In this paper, we propose a novel software tool that integrates five components for fast multiview visualization of high-dimensional datasets: sampling, dimension reduction, clustering, multiview collaborative analysis, and dimension re-arrangement. In our tool, the sampling component reduces the size of the datasets by random sampling to gain high visualization efficiency, while the dimension reduction component reduces the number of dimensions by principal component analysis to improve visualization quality. Next, the clustering component applies fuzzy c-means clustering to the reduced dataset to reveal hidden patterns of the original datasets. Finally, multiview collaborative analysis enables users to analyse multidimensional datasets from different aspects at the same time by combining scatterplots and scatterplot matrices. To optimize the visualization effects in the scatterplot matrices, we re-arrange their dimensions and adjust the positions of the scatterplots so that similar scatterplot points are placed adjacent to each other. As a result, compared with existing visualization tools that apply some of these techniques, our tool not only improves the efficiency of dimension reduction but also enhances the quality of visualization and enables more comprehensive analysis. We test our tool on different real datasets to demonstrate its effectiveness. The experimental results validate that our method is effective in terms of both the efficiency and the quality of visualization.
KW - Data sampling
KW - data clustering
KW - dimension reduction
KW - multiview visualization
KW - software tool
UR - http://www.scopus.com/inward/record.url?scp=85174529220&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-42430-4_25
DO - 10.1007/978-3-031-42430-4_25
M3 - Conference contribution
AN - SCOPUS:85174529220
SN - 9783031424298
T3 - Communications in Computer and Information Science
SP - 303
EP - 316
BT - Recent Challenges in Intelligent Information and Database Systems - 15th Asian Conference, ACIIDS 2023, Proceedings
A2 - Nguyen, Ngoc Thanh
A2 - Boonsang, Siridech
A2 - Pasupa, Kitsuchart
A2 - Fujita, Hamido
A2 - Hnatkowska, Bogumiła
A2 - Hong, Tzung-Pei
A2 - Selamat, Ali
PB - Springer Science and Business Media Deutschland GmbH
T2 - 15th International Scientific Conference on Research and Applications in the Field of Intelligent Information and Database Systems, ACIIDS 2023
Y2 - 24 July 2023 through 26 July 2023
ER -