Scalable flight cancellation prediction with ensemble distributed KNN and feature selection

Ho Yin Kan, Keith Chau, Patrick Cheong-iao Pang

Research output: Contribution to journal (Article, peer-reviewed)

1 Citation (Scopus)

Abstract

Accurate flight cancellation prediction is essential for airlines because it helps them limit financial losses and declines in passenger satisfaction. The sheer volume of aviation data makes traditional prediction methods difficult to apply in practice. This study proposes a distributed ensemble learning approach for predicting flight cancellations at scale. The method first applies the Artificial Bee Colony (ABC) algorithm for feature selection, identifying the most important predictors in a large dataset. The features selected in that preceding stage are then processed by a Distributed K-Nearest Neighbor (DKNN) model implemented on the MapReduce framework. Distributing the KNN models lets the architecture process large datasets efficiently and improves accuracy through voting across the models. Evaluated on flight data from three New York City airports (JFK, LGA, and EWR), the system shows a minimum computational advantage of more than 25% over non-distributed KNN models. The ensemble strategy improves prediction accuracy by 3.42%, reaching an average accuracy of 95.79%, which is a 2.2% improvement over previous methods. These experimental results demonstrate that the distributed ensemble methodology predicts flight cancellations accurately in big data environments.
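The pipeline described in the abstract (select features, run KNN on separate data partitions, combine the partition-level predictions by voting) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the ABC search is not reproduced, so the feature mask is simply assumed to be given; the MapReduce stages are imitated with ordinary list operations; and the function names (`knn_predict`, `select`, `ensemble_predict`), the round-robin partitioning, and the toy rows are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Return the majority label among the k rows of `train` nearest to `query`."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def select(features, mask):
    """Keep only the feature positions listed in `mask`.

    Assumption: `mask` stands in for the output of a feature-selection
    step such as ABC, which this sketch does not implement.
    """
    return tuple(features[i] for i in mask)


def ensemble_predict(rows, query, mask, n_partitions=3, k=3):
    """Vote across KNN models, each fitted on one partition of `rows`."""
    projected = [(select(x, mask), y) for x, y in rows]
    q = select(query, mask)
    # "Map" stand-in: split the rows round-robin and classify within each partition.
    partitions = [projected[i::n_partitions] for i in range(n_partitions)]
    votes = [knn_predict(part, q, k) for part in partitions if part]
    # "Reduce" stand-in: majority vote over the per-partition predictions.
    return Counter(votes).most_common(1)[0][0]


# Toy rows: (feature_0, feature_1, noise_feature), label (1 = cancelled).
rows = [
    ((1.0, 1.0, 5.0), 0), ((1.2, 0.8, 9.0), 0), ((0.9, 1.1, 1.0), 0),
    ((9.0, 9.0, 5.0), 1), ((9.2, 8.8, 0.0), 1), ((8.9, 9.1, 7.0), 1),
    ((1.1, 1.3, 2.0), 0), ((0.8, 1.0, 8.0), 0), ((1.3, 0.9, 3.0), 0),
    ((9.1, 9.3, 6.0), 1), ((8.8, 9.0, 4.0), 1), ((9.3, 8.9, 2.0), 1),
]
mask = (0, 1)  # hypothetical selection: drop the noise feature

print(ensemble_predict(rows, (9.0, 8.5, 0.0), mask))
print(ensemble_predict(rows, (1.0, 1.2, 9.0), mask))
```

In a real MapReduce deployment the partitions would live on different workers and only the per-partition votes would be shuffled to the reducer, which is what keeps the communication cost low relative to a single KNN model over the full dataset.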

Original language: English
Article number: 34936
Journal: Scientific Reports
Volume: 15
Issue number: 1
Publication status: Published - Dec 2025

Keywords

  • Big data
  • Distributed k nearest neighbors (DKNN)
  • Ensemble learning
  • Flight cancellation prediction
  • MapReduce
