Comparison of Data Imputation Performance in Deep Generative Models for Educational Tabular Missing Data

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Missing data presents a significant challenge in Educational Data Mining (EDM). Imputation techniques aim to reconstruct missing values while preserving the information in a dataset that matters for accurate analysis. Although imputation has gained attention in many fields in recent years, its use for addressing missing data in education remains limited. This study helps fill that gap by evaluating three state-of-the-art deep generative models, Tabular Variational Autoencoder (TVAE), Conditional Tabular Generative Adversarial Network (CTGAN), and Tabular Denoising Diffusion Probabilistic Model (TabDDPM), for imputing missing values in the Open University Learning Analytics Dataset (OULAD) at varying levels of missingness. The models learn relationships among demographic, behavioral, and partial assessment data in order to impute missing numerical assessment scores. TabDDPM showed the best imputation performance and stayed closest to the original data distribution, as indicated by KL divergence and kernel density estimate (KDE) plots. To further improve predictive modeling on imputed data, this study proposes TabDDPM-SMOTE, which combines TabDDPM with the Synthetic Minority Over-sampling Technique (SMOTE) to address the class imbalance often encountered in educational datasets. TabDDPM-SMOTE consistently achieved the highest F1-score when the imputed data were used in XGBoost classification tasks, demonstrating its potential to improve the effectiveness of predictive modeling.
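The evaluation idea described in the abstract (mask values at several missing rates, impute them, then compare the imputed distribution with the original using KL divergence) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic score column, the missing-completely-at-random masking, the histogram-based KL estimate, and above all the mean imputer, which is only a placeholder standing in for TVAE, CTGAN, or TabDDPM. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def mask_at_random(values, missing_rate, rng):
    """Return a copy with a fraction of entries set to NaN (MCAR assumption)."""
    masked = values.astype(float).copy()
    n_missing = int(round(missing_rate * masked.size))
    idx = rng.choice(masked.size, size=n_missing, replace=False)
    masked[idx] = np.nan
    return masked

def mean_impute(masked):
    """Placeholder imputer (NOT a deep generative model): fill NaNs with the observed mean."""
    filled = masked.copy()
    filled[np.isnan(filled)] = np.nanmean(masked)
    return filled

def kl_divergence(original, imputed, bins=20, eps=1e-9):
    """Histogram-based estimate of KL(P_original || P_imputed) on shared bin edges."""
    lo = min(original.min(), imputed.min())
    hi = max(original.max(), imputed.max())
    p, _ = np.histogram(original, bins=bins, range=(lo, hi))
    q, _ = np.histogram(imputed, bins=bins, range=(lo, hi))
    # Smooth with eps so empty bins do not produce log(0) or division by zero.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic stand-in for a numerical assessment score column (0-100 scale).
rng = np.random.default_rng(0)
scores = np.clip(rng.normal(70, 15, size=2000), 0, 100)

kl_by_rate = {}
for rate in (0.1, 0.3, 0.5):
    imputed = mean_impute(mask_at_random(scores, rate, rng))
    kl_by_rate[rate] = kl_divergence(scores, imputed)
    print(f"missing={rate:.0%}  KL={kl_by_rate[rate]:.4f}")
```

A lower KL value means the imputed column stays closer to the original distribution. Swapping `mean_impute` for a trained generative imputer would leave the rest of the comparison unchanged.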

Original language: English
Title of host publication: Proceedings of the 18th International Conference on Educational Data Mining, EDM 2025
Editors: Caitlin Mills, Giora Alexandron, Davide Taibi, Giosuè Lo Bosco, Luc Paquette
Publisher: International Educational Data Mining Society
Pages: 133-142
Number of pages: 10
ISBN (Print): 9781733673662
Publication status: Published - 2025
Event: 18th International Conference on Educational Data Mining, EDM 2025 - Palermo, Italy
Duration: 20 Jul 2025 → 23 Jul 2025

Publication series

Name: Proceedings of the International Conference on Educational Data Mining
ISSN (Electronic): 2960-2866

Conference

Conference: 18th International Conference on Educational Data Mining, EDM 2025
Country/Territory: Italy
City: Palermo
Period: 20/07/25 → 23/07/25

Keywords

  • Deep Learning Model
  • Educational Data Mining
  • Educational Tabular Missing Data
  • Tabular Missing Data Imputation
