Multi-criteria feature selection on cost-sensitive data with missing values

Wenhao Shu, Hong Shen

Research output: Article, peer-reviewed

28 citations (Scopus)

Abstract

Feature selection plays an important role in pattern recognition and machine learning. For the high-dimensional data encountered in many data analysis tasks, feature selection techniques are designed to find a relevant subset of the original features that facilitates classification. However, in many real-world applications, missing feature values, which contribute to test and misclassification costs, are an issue of increasing concern for most data sets, particularly when dealing with big data. Existing feature selection approaches do not address this issue effectively. In this paper, we address the problem of feature selection for cost-sensitive data with missing values on the basis of rough set theory. We first propose a multi-criteria evaluation function to characterize the significance of candidate features, taking into consideration not only their power in the positive region and boundary region but also their associated costs. On this basis, we develop a forward greedy feature selection algorithm that selects a feature subset of minimized cost while preserving the same information as the whole feature set. In addition, to improve the efficiency of this algorithm, we carry out the selection of candidate features in a dwindling object set. Finally, experimental results on different data sets demonstrate that the proposed algorithm outperforms existing feature selection algorithms.

Original language: English
Pages (from-to): 268-280
Number of pages: 13
Journal: Pattern Recognition
Volume: 51
Publication status: Published - 1 Mar 2016
Externally published
