Abstract
Generative Artificial Intelligence (GenAI), based on large language models (LLMs), excels in various language comprehension tasks and is increasingly utilized in applied linguistics research. This study examines the accuracy and methodological implications of using LLMs to classify open-ended responses from learners. We surveyed 143 Japanese university students studying English as a foreign language (EFL) about their essay-writing process. Two human evaluators independently classified the students’ responses based on self-regulated learning processes: planning, monitoring, and evaluation. At the same time, several LLMs performed the same classification task, and their results were compared with those of the human evaluators using Cohen's kappa coefficient. We established κ ≥ 0.8 as the threshold for strong agreement based on rigorous methodological standards. Our findings revealed that even the best-performing model (DeepSeek-V3) achieved only moderate agreement (κ = 0.68), while other models demonstrated fair-to-moderate agreement (κ = 0.37–0.61). Surprisingly, open-source models outperformed several commercial counterparts. These results highlight the necessity of expert oversight when integrating GenAI as a support tool in qualitative data analysis. The paper concludes by discussing the methodological implications for using LLMs in qualitative research and proposing specific prompt engineering strategies to enhance their reliability in applied linguistics.
| Original language | English |
|---|---|
| Article number | 100210 |
| Journal | Research Methods in Applied Linguistics |
| Volume | 4 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Aug 2025 |
Keywords
- Coding and classification
- Generative AI
- Large language models (LLM)
- Qualitative analysis
- Research methods
Fingerprint
Dive into the research topics of 'Large language models fall short in classifying learners’ open-ended responses'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver