TY - GEN
T1 - Analyzing the Interpretability of Machine Learning Prediction on Student Performance Using SHapley Additive exPlanations
AU - Choi, Wan Chong
AU - Lam, Chan Tong
AU - Mendes, Antonio Jose
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This study compared several machine learning algorithms for predicting student programming learning performance in an online learning environment and used Explainable Machine Learning (EML) techniques to enhance interpretability. A range of algorithms, including Random Forest, Extra Trees, CatBoost, XGBoost, Naive Bayes, and K-Nearest Neighbors (KNN), were compared, with Extra Trees delivering the best results. Distinct from other Educational Data Mining (EDM) research, which mainly focuses on predictive efficiency, we contributed by using the EML technique of SHapley Additive exPlanations (SHAP), rooted in game theory, to enhance model interpretability at both the global and individual levels. At the global level, summary plots showed overall feature impacts, bar plots quantified the average effect of each feature, and dependence plots highlighted specific relationships. At the individual level, force plots identified critical features for individual predictions, decision plots traced the cumulative impact of features from the base value to the final output, and waterfall plots provided a breakdown of predictions. This study contributes to EDM by offering accurate predictive models and detailed interpretability, helping educational stakeholders make data-informed decisions to improve student outcomes.
AB - This study compared several machine learning algorithms for predicting student programming learning performance in an online learning environment and used Explainable Machine Learning (EML) techniques to enhance interpretability. A range of algorithms, including Random Forest, Extra Trees, CatBoost, XGBoost, Naive Bayes, and K-Nearest Neighbors (KNN), were compared, with Extra Trees delivering the best results. Distinct from other Educational Data Mining (EDM) research, which mainly focuses on predictive efficiency, we contributed by using the EML technique of SHapley Additive exPlanations (SHAP), rooted in game theory, to enhance model interpretability at both the global and individual levels. At the global level, summary plots showed overall feature impacts, bar plots quantified the average effect of each feature, and dependence plots highlighted specific relationships. At the individual level, force plots identified critical features for individual predictions, decision plots traced the cumulative impact of features from the base value to the final output, and waterfall plots provided a breakdown of predictions. This study contributes to EDM by offering accurate predictive models and detailed interpretability, helping educational stakeholders make data-informed decisions to improve student outcomes.
KW - Educational data mining
KW - Explainable machine learning
KW - Learning performance prediction
KW - SHapley Additive exPlanations
UR - http://www.scopus.com/inward/record.url?scp=85217048129&partnerID=8YFLogxK
U2 - 10.1109/TALE62452.2024.10834292
DO - 10.1109/TALE62452.2024.10834292
M3 - Conference contribution
AN - SCOPUS:85217048129
T3 - 2024 IEEE International Conference on Teaching, Assessment and Learning for Engineering, TALE 2024 - Proceedings
BT - 2024 IEEE International Conference on Teaching, Assessment and Learning for Engineering, TALE 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th IEEE International Conference on Teaching, Assessment and Learning for Engineering, TALE 2024
Y2 - 9 December 2024 through 12 December 2024
ER -