Multi-criteria feature selection on cost-sensitive data with missing values

Wenhao Shu, Hong Shen

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)

Abstract

Feature selection plays an important role in pattern recognition and machine learning. For the high-dimensional data encountered in many data analysis tasks, feature selection techniques are designed to find a relevant subset of the original features that facilitates classification. However, in many real-world applications, missing feature values, which contribute to test and misclassification costs, are an issue of increasing concern for most data sets, particularly when dealing with big data. Existing feature selection approaches do not address this issue effectively. In this paper, we address the problem of feature selection for cost-sensitive data with missing values on the basis of rough set theory. We first propose a multi-criteria evaluation function to characterize the significance of candidate features, taking into consideration not only their power in the positive region and boundary region but also their associated costs. On this basis, we develop a forward greedy feature selection algorithm that selects a feature subset of minimized cost preserving the same information as the whole feature set. In addition, to improve the efficiency of this algorithm, we select candidate features over a dwindling object set. Finally, experimental results on different data sets demonstrate that the proposed algorithm outperforms existing feature selection algorithms.
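The abstract gives no formulas, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes a tolerance relation for incomplete data (two objects are compatible when they agree on every feature where both values are known), scores a candidate feature by the number of objects it newly places in the positive region divided by its test cost, and ignores misclassification costs and the boundary region entirely. The names `tolerant`, `is_consistent`, and `greedy_select` are hypothetical. The "dwindling object set" is approximated by re-checking only the objects that are not yet consistent, which is safe here because adding features can only shrink an object's tolerance class.

```python
from typing import List, Optional, Sequence, Set

Row = Sequence[Optional[str]]  # None marks a missing value (assumed encoding)


def tolerant(x: Row, y: Row, features: Set[int]) -> bool:
    """True if x and y agree on every selected feature where both values are known."""
    return all(x[f] is None or y[f] is None or x[f] == y[f] for f in features)


def is_consistent(i: int, rows: Sequence[Row], labels: Sequence[int],
                  features: Set[int]) -> bool:
    """Object i is in the positive region if every object tolerant to it shares its label."""
    return all(labels[j] == labels[i]
               for j in range(len(rows))
               if tolerant(rows[i], rows[j], features))


def greedy_select(rows: Sequence[Row], labels: Sequence[int],
                  costs: Sequence[float]) -> List[int]:
    """Forward greedy selection; costs must be positive (illustrative scoring only)."""
    all_features = set(range(len(costs)))
    # Objects that the full feature set can classify with certainty.
    remaining = {i for i in range(len(rows))
                 if is_consistent(i, rows, labels, all_features)}
    selected: List[int] = []
    while remaining:
        candidates = sorted(all_features - set(selected))
        # Only objects still unresolved are re-examined (the shrinking set).
        gains = {f: {i for i in remaining
                     if is_consistent(i, rows, labels, set(selected) | {f})}
                 for f in candidates}
        # Highest gain per unit cost wins; ties and zero gains go to the cheaper feature.
        best = max(candidates, key=lambda f: (len(gains[f]) / costs[f], -costs[f]))
        selected.append(best)
        remaining -= gains[best]
    return selected


rows = [("a", "x", "p"), ("a", "y", None), ("b", "x", "p"), (None, "y", "q")]
labels = [0, 1, 0, 1]
print(greedy_select(rows, labels, costs=[5.0, 1.0, 2.0]))
```

On this toy table, the second feature alone separates the two classes and is also the cheapest, so the sketch selects it and stops. Like any greedy heuristic, it does not guarantee a minimum-cost subset in general.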

Original language: English
Pages (from-to): 268-280
Number of pages: 13
Journal: Pattern Recognition
Volume: 51
Publication status: Published - 1 Mar 2016
Externally published: Yes

Keywords

  • Cost-sensitive data
  • Feature selection
  • Incomplete data
  • Multi-criteria
  • Rough sets
