Abstract
Accurate flight cancellation prediction is essential for airlines because it helps them limit financial losses and declines in passenger satisfaction. The sheer volume of aviation data makes traditional prediction methods difficult to apply in practice. This study proposes a distributed ensemble learning approach for predicting flight cancellations at scale. The Artificial Bee Colony (ABC) algorithm first performs feature selection, identifying the most informative predictors in a large dataset. A distributed K-Nearest Neighbor (DKNN) model, implemented on the MapReduce framework, then processes the features selected in the preceding stage. Distributing the KNN models in this architecture allows large datasets to be processed efficiently and improves accuracy through collective voting among the models. Evaluated on flight data from three New York City area airports (JFK, LGA, and EWR), the system achieves a computational advantage of more than 25% over non-distributed KNN models. The ensemble strategy improves prediction accuracy by 3.42%, reaching an average accuracy of 95.79%, a 2.2% improvement over previous methods. These experimental results demonstrate that the distributed ensemble methodology predicts flight cancellations accurately in big data environments.
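The map-and-vote idea behind the DKNN ensemble can be illustrated with a minimal sketch. It is not the authors' implementation: it simulates the MapReduce stages in a single Python process, omits the ABC feature selection step, and uses invented toy data. The function names (`knn_predict`, `split_into_shards`, `ensemble_predict`), the round-robin partitioning, and the choice of `k=3` are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(shard, query, k=3):
    """Map step (hypothetical): classify `query` using only one shard's rows."""
    nearest = sorted(shard, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def split_into_shards(rows, n_shards):
    """Round-robin partition, standing in for a framework's input splits."""
    return [rows[i::n_shards] for i in range(n_shards)]


def ensemble_predict(shards, query, k=3):
    """Reduce step (hypothetical): majority vote over per-shard predictions."""
    shard_votes = [knn_predict(shard, query, k) for shard in shards if shard]
    return Counter(shard_votes).most_common(1)[0][0]


# Toy rows of (features, label); the values are invented, not from the paper.
train = [
    ((0.1, 0.2), "on_time"), ((0.2, 0.1), "on_time"), ((0.0, 0.3), "on_time"),
    ((0.3, 0.2), "on_time"), ((0.2, 0.3), "on_time"), ((0.1, 0.0), "on_time"),
    ((0.9, 0.8), "cancelled"), ((0.8, 0.9), "cancelled"), ((1.0, 0.7), "cancelled"),
    ((0.7, 1.0), "cancelled"), ((0.9, 1.0), "cancelled"), ((1.0, 0.9), "cancelled"),
]

shards = split_into_shards(train, 3)
prediction_low = ensemble_predict(shards, (0.15, 0.15))
prediction_high = ensemble_predict(shards, (0.95, 0.85))
```

Each shard sees only a fraction of the training rows, so the per-shard neighbor search is cheaper than a search over the full dataset, and the final vote combines the shard-level predictions. In a real deployment the map and reduce functions would run on separate workers under a framework such as Hadoop MapReduce rather than in a single loop.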
| Original language | English |
|---|---|
| Article number | 34936 |
| Journal | Scientific Reports |
| Volume | 15 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Dec 2025 |
Keywords
- Big data
- Distributed k-nearest neighbors (DKNN)
- Ensemble learning
- Flight cancellation prediction
- MapReduce