Comparison of Data Imputation Performance in Deep Generative Models for Educational Tabular Missing Data

Research output: Conference contribution (peer-reviewed)

Abstract

Missing data presents a significant challenge in Educational Data Mining (EDM). Imputation techniques aim to reconstruct missing data while preserving critical information in datasets for more accurate analysis. Although imputation techniques have gained attention in various fields in recent years, their use for addressing missing data in education remains limited. This study helps fill that research gap by evaluating three state-of-the-art deep generative models, Tabular Variational Autoencoder (TVAE), Conditional Tabular Generative Adversarial Networks (CTGAN), and Tabular Denoising Diffusion Probabilistic Models (TabDDPM), for imputing missing values in the Open University Learning Analytics Dataset (OULAD) at varying levels of missing data. These deep generative models identify relationships among demographic, behavioral, and partial assessment data to impute absent numerical assessment scores. TabDDPM showed the best imputation performance and maintained closer alignment with the original data, as demonstrated by the KL divergence and KDE plots. To further enhance predictive modeling performance with imputed data, this study proposes TabDDPM-SMOTE, which combines TabDDPM with the Synthetic Minority Over-sampling Technique (SMOTE) to tackle the class imbalance often encountered in educational datasets. The TabDDPM-SMOTE model consistently achieves the highest F1-score when the imputed data is used in XGBoost classification tasks, showcasing its strong efficiency and its potential to enhance the effectiveness of predictive modeling.
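
The abstract uses KL divergence to judge how closely imputed scores follow the original distribution. The record does not say how the authors computed it, so the following is only a minimal sketch under stated assumptions: scores lie in [0, 100], both samples are binned into a shared histogram with add-one smoothing, and the data is synthetic rather than OULAD. The helper names `histogram` and `kl_divergence` are hypothetical.

```python
import math
import random


def histogram(values, bins, lo, hi):
    """Return add-one-smoothed bin probabilities for values in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the top edge (v == hi) into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    # Smoothing keeps every probability positive so log() is defined.
    total = len(values) + bins
    return [(c + 1) / total for c in counts]


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def clipped_scores(mean, sd, n, rng):
    """Synthetic assessment scores (an assumption, not OULAD data)."""
    return [min(100.0, max(0.0, rng.gauss(mean, sd))) for _ in range(n)]


rng = random.Random(0)
original = clipped_scores(70, 15, 2000, rng)

# Two hypothetical imputation outcomes for comparison:
close_imputed = clipped_scores(69, 15, 2000, rng)  # tracks the original shape
mean_imputed = [70.0] * 2000                       # collapses to a single value

p = histogram(original, 20, 0.0, 100.0)
kl_close = kl_divergence(p, histogram(close_imputed, 20, 0.0, 100.0))
kl_mean = kl_divergence(p, histogram(mean_imputed, 20, 0.0, 100.0))

print(f"KL(original || close imputation) = {kl_close:.4f}")
print(f"KL(original || mean imputation)  = {kl_mean:.4f}")
```

A lower value indicates that the imputed scores stay closer to the original distribution. In this toy setup the distribution-preserving imputation scores far lower than the constant mean imputation.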

Original language: English
Title of host publication: Proceedings of the 18th International Conference on Educational Data Mining, EDM 2025
Editors: Caitlin Mills, Giora Alexandron, Davide Taibi, Giosuè Lo Bosco, Luc Paquette
Publisher: International Educational Data Mining Society
Pages: 133-142
Number of pages: 10
ISBN (Print): 9781733673662
Publication status: Published - 2025
Event: 18th International Conference on Educational Data Mining, EDM 2025 - Palermo, Italy
Duration: 20 Jul 2025 to 23 Jul 2025

Publication series

Name: Proceedings of the International Conference on Educational Data Mining
ISSN (Electronic): 2960-2866

Conference

Conference: 18th International Conference on Educational Data Mining, EDM 2025
Country/Territory: Italy
City: Palermo
Period: 20/07/25 to 23/07/25
