Clustering Algorithms based Noise Identification from Air Pollution Monitoring Data

Xinyi Fang, Chak Fong Chong, Xu Yang, Yapeng Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Noise detection has been widely discussed as data science has developed, yet no single method is universally best. In this paper, we propose a clustering-based solution to identify and remove noise from air pollution data collected with mobile portable sensors. The test dataset consists of air pollution data collected by portable sensors over three seasons on campus in Macao. To find the most appropriate algorithm for this task, we apply and compare six clustering algorithms: Simple K-means, Hierarchical Clustering, Cascading K-means, X-means, Expectation Maximization, and Self-Organizing Map. Performance is evaluated by accuracy and by the best number of clusters as determined by the Silhouette Coefficient. Additionally, a J48 decision tree classifier is used to extract the key attributes and to identify the noise cluster in future unlabeled data that may contain noise. The experimental results indicate that Expectation Maximization and Cascading K-means perform best. Moreover, temperature and carbon dioxide are vital attributes for identifying the noise cluster.
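
The pipeline outlined in the abstract (cluster the readings, choose the number of clusters by Silhouette Coefficient, then train a decision tree to recognise the noise cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the feature names are hypothetical, scikit-learn's `GaussianMixture` stands in for Expectation Maximization clustering, and `DecisionTreeClassifier` (a CART tree) stands in for J48, which is Weka's C4.5 implementation. Treating the smallest cluster as the noise cluster is also an assumption made only for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for portable-sensor readings (NOT the paper's dataset).
# Hypothetical columns: temperature (deg C), CO2 (ppm), PM2.5 (ug/m3).
FEATURES = ["temperature", "co2", "pm25"]
clean = rng.normal(loc=[25.0, 420.0, 20.0], scale=[1.5, 15.0, 4.0], size=(300, 3))
noise = rng.normal(loc=[45.0, 1500.0, 22.0], scale=[2.0, 60.0, 4.0], size=(30, 3))
X = np.vstack([clean, noise])

# Standardise so that no single attribute dominates the distance computations.
X_scaled = StandardScaler().fit_transform(X)


def best_k_by_silhouette(data, k_values=range(2, 7)):
    """Return (k, score) with the highest mean Silhouette Coefficient for K-means."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        scores[k] = silhouette_score(data, labels)
    best = max(scores, key=scores.get)
    return best, scores[best]


best_k, best_score = best_k_by_silhouette(X_scaled)

# EM clustering, approximated here by fitting a Gaussian mixture model.
gmm_labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X_scaled)

# Assumption for this sketch only: the smallest cluster is the noise cluster.
noise_cluster = int(np.argmin(np.bincount(gmm_labels, minlength=best_k)))
is_noise = gmm_labels == noise_cluster

# A shallow tree trained on the cluster-derived labels shows which attributes
# separate the noise cluster and can flag noise in future unlabeled readings.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, is_noise)
importances = dict(zip(FEATURES, tree.feature_importances_))

# Noise removal: keep only the readings outside the noise cluster.
X_denoised = X[~is_noise]
```

Here `importances` plays the role of the key attributes mentioned in the abstract, and `tree.predict(new_readings)` would flag suspected noise in new data laid out with the same columns.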

Original language: English
Title of host publication: Proceedings of IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665453059
Publication status: Published - 2022
Event: 2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2022 - Gold Coast, Australia
Duration: 18 Dec 2022 to 20 Dec 2022

Publication series

Name: Proceedings of IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2022

Conference

Conference: 2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2022
Country/Territory: Australia
City: Gold Coast
Period: 18/12/22 to 20/12/22

Keywords

  • air pollution data
  • data clustering
  • noise identification
  • noise removal
  • portable sensor
