EVALUATING POPULAR MOOC PLATFORMS BY GENERATIVE ARTIFICIAL INTELLIGENCE (AI) ROBOTS: HOW CONSISTENT ARE THE ROBOTS?

Research output: Contribution to conference › Paper › peer-review

Abstract

This article investigates the consistency between several popular generative AI robots in the evaluation of massive open online course (MOOC) platforms. The four robots examined in the study were Claude+, GPT-4, Sage, and Dragonfly, each tasked with awarding rating scores on eight major dimensions, namely (1) content/course quality, (2) pedagogical design, (3) learner support, (4) technology infrastructure, (5) social interaction, (6) learner engagement, (7) instructor support, and (8) cost-effectiveness, for 31 currently popular MOOC platforms. Only Claude+'s and Dragonfly's rating scores turned out to be amenable to statistical analysis. For each of the two robots, the minimum, the maximum, the range, and the standard deviation of the rating scores for each of the eight dimensions were computed across all 31 MOOC platforms. The rating score difference between the two robots was then calculated for each dimension on each platform, and the mean of the absolute value, the minimum, the maximum, the range, and the standard deviation of these differences were computed across all platforms for each dimension. A paired sample t-test was applied to each dimension for the rating score difference between the two robots over all the platforms. Finally, a correlation coefficient of the rating scores was computed for each of the eight dimensions between the two robots across all the MOOC platforms. These computational results were used to reveal whether the two robots discriminated among the platforms in evaluating each dimension, whether either robot systematically underrated or overrated any dimension relative to the other, and whether there was consistency between the two robots in evaluating each dimension across the platforms.
It was found that discrimination was prominent in the evaluation of all dimensions except Dragonfly's rating of the dimensions learner support, technology infrastructure, and instructor support. Claude+ systematically underrated all dimensions relative to Dragonfly (p < 0.001 < 0.05) except for the dimension cost-effectiveness, which Claude+ systematically overrated (p = 0.003 < 0.05). The evaluation by the two robots was consistent only for the dimensions content/course quality, pedagogical design, and learner engagement, with correlation coefficients ranging from 0.445 to 0.632 (p from < 0.001 to 0.012 < 0.05). Consistency implies at least partial trustworthiness of the evaluation of these MOOC platforms by either of these two popular generative AI robots, based on the analogous concept of convergent validity for an operationalized instrument measuring an abstract construct.
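The per-dimension analysis described in the abstract can be sketched as follows. This is a minimal illustration with hypothetical rating scores (the paper's actual data are not reproduced here); the array names and score range are assumptions for demonstration only.

```python
# Sketch of the per-dimension analysis: descriptive statistics,
# paired-sample t-test, and Pearson correlation for one dimension.
# Scores here are hypothetical stand-ins, not the paper's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_platforms = 31  # number of MOOC platforms evaluated

# Hypothetical rating scores for one dimension across the 31 platforms.
claude_scores = rng.integers(5, 9, size=n_platforms).astype(float)
dragonfly_scores = rng.integers(6, 10, size=n_platforms).astype(float)

def describe(scores):
    """Min, max, range, and sample standard deviation of one robot's scores."""
    return {"min": scores.min(), "max": scores.max(),
            "range": scores.max() - scores.min(),
            "sd": scores.std(ddof=1)}

# Per-platform score differences between the two robots,
# and the mean absolute difference across platforms.
diffs = claude_scores - dragonfly_scores
mean_abs_diff = np.abs(diffs).mean()

# Paired-sample t-test: does one robot systematically rate higher?
t_stat, p_value = stats.ttest_rel(claude_scores, dragonfly_scores)

# Pearson correlation: do the two robots rank platforms consistently?
r, r_p = stats.pearsonr(claude_scores, dragonfly_scores)
```

A near-zero t-test p-value would indicate a systematic over- or under-rating by one robot, while a significant positive correlation would indicate consistency in how the two robots rank platforms on that dimension.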

Original language: English
Pages: 329-336
Number of pages: 8
Publication status: Published - 2023
Event: 20th International Conference on Cognition and Exploratory Learning in Digital Age, CELDA 2023 - Madeira Island, Portugal
Duration: 21 Oct 2023 - 23 Oct 2023

Conference

Conference: 20th International Conference on Cognition and Exploratory Learning in Digital Age, CELDA 2023
Country/Territory: Portugal
City: Madeira Island
Period: 21/10/23 - 23/10/23

Keywords

  • Artificial Intelligence
  • Consistency
  • Evaluation
  • MOOC Platforms
  • Massive Open Online Course Platforms
