TY - GEN
T1 - EVALUATION OF MOBILE EDUCATION APPS BY POPULAR GENERATIVE ARTIFICIAL INTELLIGENCE (GENAI) SYSTEMS
T2 - 21st International Conferences on Mobile Learning, ML 2025 and 10th International Conference on Educational Technologies, ICEduTech 2025
AU - Chan, Victor K.Y.
N1 - Publisher Copyright: © 2025 Proceedings of the International Conferences on Mobile Learning 2025 and Educational Technologies 2025. All rights reserved.
PY - 2025
Y1 - 2025
N2 - As an extension to a previous pilot study, this article aims to analyze the consistency between and thus the convergent validity of a few popular generative artificial intelligence (GenAI) systems in evaluating popular mobile education apps’ usability, effectiveness, and efficiency. The three GenAI systems examined were Microsoft Copilot, Google PaLM, and Assistant, which were individually prompted to award rating scores to the eight major dimensions of usability, effectiveness, and efficiency, namely, (1) content/course quality, (2) pedagogical design, (3) learner support, (4) technology infrastructure, (5) social interaction, (6) learner engagement, (7) instructor support, and (8) cost-effectiveness of 100 popular mobile education apps. A paired sample t-test was applied to the rating score difference in each of the above eight dimensions between each GenAI system pair out of the above three GenAI systems over all the 100 apps. Then, Cronbach’s coefficient alpha of the rating scores was computed for each of the above eight dimensions between all the three GenAI systems across all the 100 apps. The computational results were to confirm whether the GenAI systems, with respect to each other, systematically overrated or underrated any dimension over the 100 apps and whether there was high consistency between and thus convergent validity of the three GenAI systems in evaluating each dimension across the 100 apps. Among other collateral findings, it was revealed that the consistency between and thus the convergent validity of the three GenAI systems was basically sufficiently high in evaluating all the eight dimensions across the 100 apps, with those in the dimension (8) cost-effectiveness being at least marginally high enough, and thus the three GenAI systems may be rather reliable in evaluating all the eight usability, effectiveness, and efficiency dimensions across mobile education apps.
AB - As an extension to a previous pilot study, this article aims to analyze the consistency between and thus the convergent validity of a few popular generative artificial intelligence (GenAI) systems in evaluating popular mobile education apps’ usability, effectiveness, and efficiency. The three GenAI systems examined were Microsoft Copilot, Google PaLM, and Assistant, which were individually prompted to award rating scores to the eight major dimensions of usability, effectiveness, and efficiency, namely, (1) content/course quality, (2) pedagogical design, (3) learner support, (4) technology infrastructure, (5) social interaction, (6) learner engagement, (7) instructor support, and (8) cost-effectiveness of 100 popular mobile education apps. A paired sample t-test was applied to the rating score difference in each of the above eight dimensions between each GenAI system pair out of the above three GenAI systems over all the 100 apps. Then, Cronbach’s coefficient alpha of the rating scores was computed for each of the above eight dimensions between all the three GenAI systems across all the 100 apps. The computational results were to confirm whether the GenAI systems, with respect to each other, systematically overrated or underrated any dimension over the 100 apps and whether there was high consistency between and thus convergent validity of the three GenAI systems in evaluating each dimension across the 100 apps. Among other collateral findings, it was revealed that the consistency between and thus the convergent validity of the three GenAI systems was basically sufficiently high in evaluating all the eight dimensions across the 100 apps, with those in the dimension (8) cost-effectiveness being at least marginally high enough, and thus the three GenAI systems may be rather reliable in evaluating all the eight usability, effectiveness, and efficiency dimensions across mobile education apps.
KW - Convergent Validity
KW - Evaluation
KW - Generative Artificial Intelligence (GenAI)
KW - Mobile Education Apps
UR - http://www.scopus.com/inward/record.url?scp=105003379720&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:105003379720
T3 - Proceedings of the International Conferences on Mobile Learning 2025 and Educational Technologies 2025
SP - 151
EP - 161
BT - Proceedings of the International Conferences on Mobile Learning 2025 and Educational Technologies 2025
A2 - Sanchez, Inmaculada Arnedillo
A2 - Kommers, Piet
A2 - Issa, Tomayess
A2 - Isaias, Pedro
A2 - Rodrigues, Luis
PB - IADIS
Y2 - 1 March 2025 through 3 March 2025
ER -