
Reasoning or not? A comprehensive evaluation of reasoning LLMs for dialogue summarization

  • Macao Polytechnic University
  • University of Coimbra
  • Polytechnic Institute of Leiria
  • Institute of International Language Services Studies

Research output: Article › peer-reviewed

8 citations (Scopus)

Abstract

Despite the rapid progress in reasoning Large Language Models, their efficacy in dialogue summarization remains a critical, underexplored area, as this task requires a delicate balance of abstraction, faithfulness, and conciseness. To address this gap, we present the first large-scale, systematic evaluation of leading reasoning LLMs against their direct non-reasoning counterparts. Our rigorous framework covers three core paradigms of generic, role-oriented, and query-oriented summarization, and is tested on four diverse benchmark datasets spanning multiple languages and contexts. Our multi-perspective evaluation consistently demonstrates that, rather than conferring an advantage, the explicit reasoning processes in current models often hinder summarization quality. We find that reasoning models systematically produce longer, less faithful summaries that exhibit higher novelty but lower source coverage, deviating significantly from human summarization styles. Moving beyond performance metrics, we provide a deep diagnostic of the root causes for these failures through a novel, human-annotated error analysis. We identify a critical trade-off where one class of models suffers from structural inefficiency, characterized by verbose and redundant reasoning, while another, though more concise, is prone to multifaceted errors involving logical and factual fallacies. These findings reveal a fundamental conflict between the verbose, step-by-step nature of current reasoning architectures and the high-level abstraction required for summarization, offering crucial insights for designing future models that can effectively bridge logical deduction with concise synthesis.
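The abstract characterizes reasoning models as producing summaries with "higher novelty but lower source coverage." As a rough illustration only (these are common lexical proxies, not the paper's actual evaluation protocol), novelty can be measured as the fraction of summary n-grams absent from the source dialogue, and coverage as the fraction of source tokens that reappear in the summary. The dialogue and summaries below are invented examples:

```python
# Illustrative sketch of two lexical style measures (assumed simple
# whitespace tokenization; NOT the paper's actual metrics):
# - novelty:  fraction of summary bigrams absent from the source dialogue
# - coverage: fraction of unique source tokens that reappear in the summary

def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(source: str, summary: str, n: int = 2) -> float:
    src = ngrams(source.lower().split(), n)
    summ = ngrams(summary.lower().split(), n)
    return len(summ - src) / len(summ) if summ else 0.0

def coverage(source: str, summary: str) -> float:
    src = set(source.lower().split())
    summ = set(summary.lower().split())
    return len(src & summ) / len(src) if src else 0.0

# Hypothetical dialogue and two candidate summaries.
dialogue = "alice: can we move the meeting to friday ? bob: friday works for me"
abstractive = "they agreed to reschedule the meeting"
extractive = "alice asked to move the meeting to friday and bob said friday works"

# A more abstractive summary tends to score higher novelty, lower coverage.
print(novelty(dialogue, abstractive) > novelty(dialogue, extractive))   # True
print(coverage(dialogue, extractive) > coverage(dialogue, abstractive)) # True
```

Under such measures, the paper's finding that reasoning models exhibit "higher novelty but lower source coverage" corresponds to summaries that paraphrase freely while leaving more of the source dialogue unrepresented.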

Original language: English
Article number: 129831
Journal: Expert Systems with Applications
Volume: 299
DOIs
Publication status: Published - 1 Mar 2026

