
The Convergent Validity of Computer Operating Systems’ Usability Evaluation by Popular Generative Artificial Intelligence (AI) Robots

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

This article examines the convergent validity of, and thus the consistency between, the usability evaluations of computer operating systems (OSs) produced by several popular generative artificial intelligence (AI) robots. A total of 18 popular OS versions were included in the study, drawn from the three leading OS families: Windows, macOS, and Linux. Usability was evaluated along eight major dimensions: (1) effectiveness, (2) efficiency, (3) learnability, (4) memorability, (5) safety, (6) utility, (7) ergonomics, and (8) accessibility. Of the handful of generative AI robots tried, Microsoft’s Copilot, Google’s PaLM, and Meta’s Llama were able to assign individual rating scores to each of the eight dimensions. For each of these three robots, the minimum, maximum, range, and standard deviation of the rating scores were computed for each dimension across the OS versions. The difference in rating score between each pair of robots was then calculated for each dimension and each OS version, and the mean absolute value, minimum, maximum, range, and standard deviation of these differences were computed for each dimension and each robot pair across the OS versions. A paired-sample t-test was applied to each dimension to test the rating score difference between each robot pair over the versions. Finally, Cronbach’s coefficient alpha (α) of the rating scores was computed for each dimension across all three robots and all versions. These computations were intended to establish whether each robot discriminated between the OS versions when evaluating each dimension; whether each robot, relative to the others, erratically and/or systematically overrated or underrated any dimension over the OS versions; and whether the convergent validity of (and thus the consistency between) the three robots was high for each dimension across the OS versions.
Among other ancillary results, the convergent validity of the three robots was found to be high on all eight dimensions, so such evaluations can be considered trustworthy, at least to an extent.
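The per-dimension computations described in the abstract (pairwise difference statistics, a paired-sample t statistic, and Cronbach's α across raters) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function names and the rating scores are hypothetical and serve only to show the arithmetic for a single usability dimension.

```python
import math
import statistics


def difference_summary(a, b):
    """Summarise per-version rating differences (a - b) between two robots."""
    diffs = [x - y for x, y in zip(a, b)]
    return {
        "mean_abs": statistics.fmean([abs(d) for d in diffs]),
        "min": min(diffs),
        "max": max(diffs),
        "range": max(diffs) - min(diffs),
        "sd": statistics.stdev(diffs),  # sample standard deviation
    }


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired-sample t-test on a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


def cronbach_alpha(ratings):
    """Cronbach's alpha, treating each robot as an 'item' and each OS
    version as a 'case'. `ratings` holds one list of scores per robot."""
    k = len(ratings)
    sum_item_variances = sum(statistics.variance(r) for r in ratings)
    totals = [sum(scores) for scores in zip(*ratings)]  # per-version totals
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))


# Hypothetical scores for ONE dimension over five OS versions (not study data).
robot_a = [8, 7, 9, 6, 8]
robot_b = [7, 7, 8, 6, 9]
robot_c = [8, 6, 9, 5, 8]

if __name__ == "__main__":
    print(difference_summary(robot_a, robot_b))
    print(paired_t_statistic(robot_a, robot_b))
    print(cronbach_alpha([robot_a, robot_b, robot_c]))
```

Read this way, a wide range or standard deviation of differences would point to erratic disagreement between two robots, a large paired t statistic to systematic over- or underrating, and an α close to 1 to high consistency across all three. Converting the t statistic into a p-value needs the t distribution, which the standard library does not provide, so that step is left out of the sketch.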

Original language: English
Title of host publication: Applied Human Factors and Ergonomics International
Publisher: AHFE International
Pages: 305-315
Number of pages: 11
DOIs
Publication status: Published - 2024

Publication series

Name: Applied Human Factors and Ergonomics International
Volume: 120
ISSN (Electronic): 2771-0718

Keywords

  • Artificial intelligence
  • Computer operating system versions
  • Convergent validity
  • Robots
  • Usability
