TY - GEN
T1 - A Lightweight Method using LightGBM Model with Optuna in MOOCs Dropout Prediction
AU - Ng, Kary
AU - Lei, Philip
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/7/13
Y1 - 2022/7/13
N2 - In recent years, Massive Open Online Course (MOOC) has greatly changed the way the world learns. MOOC platforms (MOOCs) offer free online courses for everyone, but the high dropout rate on MOOCs is a serious problem, so early prediction of students with dropout intentions is useful to reduce the dropout rate through suitable intervention. Because MOOC is an online course, the system can easily collect the users' learning logs. Using data mining techniques on the user learning log to do prediction, this study proposed a method to extract useful features from the user behaviour and offered a lightweight method based on the Light Gradient Boosting Machine (LightGBM) with the Optuna tuning method to predict the probability that a user would drop out of a course in the next 10 days. This study also explored the effect of lower feature granularity by splitting the record period into three stages instead of splitting into weeks, which used fewer learning records of the public Knowledge Discovery and Data Mining (KDD) MOOC dataset and so imposed less workload than other works. The proposed method requires fewer features than previous works and thus can speed up the training time while being more scalable with a large number of users and learning activities in MOOC. The experiment results showed that the performance is similar to or higher than the related previous studies, with an AUCROC score of 89.12%, an AUCPR score of 96.04% and an F1-score of 92.12%. This study also examines the effect on dropout prediction accuracy when training data is limited to one- and two-thirds of the original duration and finds that comparable performance can still be achieved.
AB - In recent years, Massive Open Online Course (MOOC) has greatly changed the way the world learns. MOOC platforms (MOOCs) offer free online courses for everyone, but the high dropout rate on MOOCs is a serious problem, so early prediction of students with dropout intentions is useful to reduce the dropout rate through suitable intervention. Because MOOC is an online course, the system can easily collect the users' learning logs. Using data mining techniques on the user learning log to do prediction, this study proposed a method to extract useful features from the user behaviour and offered a lightweight method based on the Light Gradient Boosting Machine (LightGBM) with the Optuna tuning method to predict the probability that a user would drop out of a course in the next 10 days. This study also explored the effect of lower feature granularity by splitting the record period into three stages instead of splitting into weeks, which used fewer learning records of the public Knowledge Discovery and Data Mining (KDD) MOOC dataset and so imposed less workload than other works. The proposed method requires fewer features than previous works and thus can speed up the training time while being more scalable with a large number of users and learning activities in MOOC. The experiment results showed that the performance is similar to or higher than the related previous studies, with an AUCROC score of 89.12%, an AUCPR score of 96.04% and an F1-score of 92.12%. This study also examines the effect on dropout prediction accuracy when training data is limited to one- and two-thirds of the original duration and finds that comparable performance can still be achieved.
KW - Dropout prediction
KW - Educational data mining
KW - KDD
KW - LightGBM
KW - MOOC
KW - Optuna
UR - http://www.scopus.com/inward/record.url?scp=85142779138&partnerID=8YFLogxK
U2 - 10.1145/3551708.3551732
DO - 10.1145/3551708.3551732
M3 - Conference contribution
AN - SCOPUS:85142779138
T3 - ACM International Conference Proceeding Series
SP - 53
EP - 59
BT - ICEMT 2022 - 2022 6th International Conference on Education and Multimedia Technology
PB - Association for Computing Machinery
T2 - 6th International Conference on Education and Multimedia Technology, ICEMT 2022
Y2 - 13 July 2022 through 15 July 2022
ER -