
The Consistency Between Popular Generative Artificial Intelligence (AI) Robots in Evaluating the User Experience of Mobile Device Operating Systems

Research output: Chapter, peer-reviewed

2 citations (Scopus)

Abstract

This article attempts to study the consistency, among other auxiliary comparisons, between popular generative artificial intelligence (AI) robots in the evaluation of various perceived user experience dimensions of mobile device operating system versions or, more specifically, iOS and Android versions. A handful of robots were experimented with, ending up with Dragonfly and GPT-4 being the only two eligible for in-depth investigation where the duo was individually requested to accord rating scores to the six major dimensions, namely (1) efficiency, (2) effectiveness, (3) learnability, (4) satisfaction, (5) accessibility, and (6) security, of the operating system versions. It is noteworthy that these dimensions are from the perceived user experience’s point of view instead of any “physical” technology’s standpoint. For each of the two robots, the minimum, the maximum, the range, and the standard deviation of the rating scores for each of the six dimensions were computed across all the versions. The rating score difference for each of the six dimensions between the two robots was calculated for each version. The mean of the absolute value, the minimum, the maximum, the range, and the standard deviation of the differences for each dimension between the two robots were calculated across all versions. A paired sample t-test was then applied to each dimension for the rating score differences between the two robots over all the versions. Finally, a correlation coefficient of the rating scores was computed for each dimension between the two robots across all the versions. These computational outcomes were to confirm whether the two robots awarded discrimination in evaluating each dimension across the versions, whether any of the two robots systematically underrated or overrated any dimension vis-à-vis the other robot, and whether there was consistency between the two robots in evaluating each dimension across the versions.
It was found that discrimination was apparent in the evaluation of all dimensions, GPT-4 systematically underrated the dimensions satisfaction (p = 0.002 < 0.05) and security (p = 0.008 < 0.05) compared with Dragonfly, and the evaluation by the two robots was almost impeccably consistent for the six dimensions with the correlation coefficients ranging from 0.679 to 0.892 (p from 0.000 to 0.003 < 0.05). Consistency implies at least the partial trustworthiness of the evaluation of these mobile device operating system versions by either of these two popular generative AI robots based on the analogous concept of convergent validity.
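
As a rough illustration of the per-dimension computations described in the abstract, the Python sketch below works through one dimension for two raters. It is not the authors' code. The scores are invented placeholders rather than data from the study, and the names `describe`, `mean_abs_diff`, `paired_t`, and `pearson_r` are hypothetical. It assumes the sample standard deviation and the Pearson correlation coefficient, neither of which the abstract specifies. Only the t statistic and degrees of freedom are computed, because the Python standard library has no t distribution; obtaining a p-value would need a statistics package.

```python
import math
import statistics


def describe(scores):
    """Minimum, maximum, range, and sample standard deviation of one rater's scores."""
    lo, hi = min(scores), max(scores)
    return {"min": lo, "max": hi, "range": hi - lo, "sd": statistics.stdev(scores)}


def mean_abs_diff(a, b):
    """Mean of the absolute per-version differences between two raters."""
    return statistics.mean(abs(x - y) for x, y in zip(a, b))


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for the differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long score lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


# Placeholder scores for ONE dimension across six OS versions (not study data).
robot_a = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
robot_b = [5.5, 6.5, 6.5, 7.0, 8.0, 8.0]

summary_a = describe(robot_a)
t_stat, dof = paired_t(robot_a, robot_b)
r = pearson_r(robot_a, robot_b)

print("robot_a summary:", summary_a)
print("mean |difference|:", mean_abs_diff(robot_a, robot_b))
print("paired t:", t_stat, "df:", dof)
print("Pearson r:", r)
```

In this sketch, a non-zero range and standard deviation correspond to what the abstract calls discrimination, a paired t statistic far from zero would suggest one rater systematically rating above or below the other, and a correlation coefficient close to 1 would suggest consistency between the two raters.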

Original language: English
Title of host publication: Applied Human Factors and Ergonomics International
Publisher: AHFE International
Pages: 205-215
Number of pages: 11
DOIs
Publication status: Published - 2023

Publication series

Name: Applied Human Factors and Ergonomics International
Volume: 113
ISSN (electronic): 2771-0718
