TY - JOUR
T1 - Exploring GenAI as Evaluative and Formative Assessment Tools in Reading Assessment
T2 - A Mixed-methods Analysis of Genre-based Feedback
AU - Chen, Ziqi
AU - Wei, Wei
AU - Chang, Sheng
AU - Cao, Xueyan
N1 - Publisher Copyright:
© 2025 Chen Ziqi, Wei Wei, Chang Sheng, Cao Xueyan.
PY - 2025/9/2
Y1 - 2025/9/2
N2 - This study explores the potential of Generative AI (GenAI) chatbots as assessment tools in Computer-Assisted Language Learning (CALL) environments for assessing first-language (L1) reading comprehension, focusing on their effectiveness in providing feedback across three reading genres: classical literature, technical writing, and modern fiction. Using a mixed-methods approach, responses from 360 junior secondary students in China to constructed-response items in reading assessments were analyzed, comparing GenAI-generated scores and feedback with those provided by human evaluators. Six expert teachers further assessed the quality of the chatbot’s evaluative and revision feedback. Results indicated that GenAI aligned significantly more closely with human raters when scoring low-level responses but struggled with high-level samples. Across the genres, interview data suggested that revision feedback for technical writing received the highest ratings for its clarity, rationality, and actionable recommendations. In contrast, feedback for classical literature was often overly complex for junior-level learners and poorly aligned with examination rubrics. For fiction, GenAI struggled with interpretive nuance, thematic complexity, and variability in question types, highlighting its limitations in fostering deep critical literary analysis. Overall, the findings reveal genre-specific strengths and limitations of GenAI in supporting reading comprehension.
AB - This study explores the potential of Generative AI (GenAI) chatbots as assessment tools in Computer-Assisted Language Learning (CALL) environments for assessing first-language (L1) reading comprehension, focusing on their effectiveness in providing feedback across three reading genres: classical literature, technical writing, and modern fiction. Using a mixed-methods approach, responses from 360 junior secondary students in China to constructed-response items in reading assessments were analyzed, comparing GenAI-generated scores and feedback with those provided by human evaluators. Six expert teachers further assessed the quality of the chatbot’s evaluative and revision feedback. Results indicated that GenAI aligned significantly more closely with human raters when scoring low-level responses but struggled with high-level samples. Across the genres, interview data suggested that revision feedback for technical writing received the highest ratings for its clarity, rationality, and actionable recommendations. In contrast, feedback for classical literature was often overly complex for junior-level learners and poorly aligned with examination rubrics. For fiction, GenAI struggled with interpretive nuance, thematic complexity, and variability in question types, highlighting its limitations in fostering deep critical literary analysis. Overall, the findings reveal genre-specific strengths and limitations of GenAI in supporting reading comprehension.
KW - Generative AI
KW - genre-based feedback
KW - reading assessment
KW - reading comprehension
UR - https://www.scopus.com/pages/publications/105022503559
M3 - Article
AN - SCOPUS:105022503559
SN - 2187-9036
VL - 26
SP - 378
EP - 395
JO - CALL-EJ
JF - CALL-EJ
IS - 4
ER -