Dual retrieving and ranking medical large language model with retrieval augmented generation

Qimin Yang, Huan Zuo, Runqi Su, Hanyinghong Su, Tangyi Zeng, Huimei Zhou, Rongsheng Wang, Jiexin Chen, Yijun Lin, Zhiyi Chen, Tao Tan

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advancements in large language models (LLMs) have significantly enhanced text generation across various sectors; however, their medical application faces critical challenges regarding both accuracy and real-time responsiveness. To address these dual challenges, we propose a novel two-step retrieval-and-ranking retrieval-augmented generation (RAG) framework that synergistically combines embedding search with Elasticsearch. Built upon a dynamically updated medical knowledge base incorporating expert-reviewed documents from leading healthcare institutions, our hybrid architecture employs ColBERTv2 for context-aware result ranking while maintaining computational efficiency. Experimental results show a 10% improvement in accuracy for complex medical queries compared to standalone LLMs and single-search RAG variants. Latency remains a challenge in our experimental setting for emergency scenarios that require sub-second responses, although real-time performance is achievable with more powerful hardware in real-world deployments. This work establishes a new paradigm for reliable medical AI assistants that balances accuracy with practical deployment considerations.
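
To illustrate the two-step design described in the abstract, the following is a minimal sketch, assuming a local Elasticsearch index named medical_kb, a sentence-transformers embedding model, and a CrossEncoder standing in for the ColBERTv2 reranker the paper employs; these names, models, and field layouts are illustrative assumptions, not the authors' actual configuration.

# Minimal sketch of a two-step "retrieve then rerank" RAG pipeline in the spirit
# of the abstract: dense embedding search and Elasticsearch keyword search gather
# candidates, which are then reranked before being passed to an LLM as context.
# Index name, field names, and models are assumptions; the paper uses ColBERTv2
# for reranking, for which a CrossEncoder stands in here.

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer, CrossEncoder, util

es = Elasticsearch("http://localhost:9200")          # assumed local ES instance
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stand-in reranker

def keyword_candidates(query: str, k: int = 20) -> list[str]:
    """Step 1a: lexical retrieval from an assumed 'medical_kb' index."""
    resp = es.search(index="medical_kb", query={"match": {"text": query}}, size=k)
    return [hit["_source"]["text"] for hit in resp["hits"]["hits"]]

def embedding_candidates(query: str, corpus: list[str], k: int = 20) -> list[str]:
    """Step 1b: dense retrieval by embedding similarity over an in-memory corpus."""
    corpus_emb = embedder.encode(corpus, convert_to_tensor=True)
    query_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]
    return [corpus[h["corpus_id"]] for h in hits]

def retrieve_and_rank(query: str, corpus: list[str], top_n: int = 5) -> list[str]:
    """Step 2: merge both candidate pools and rerank for the final LLM context."""
    merged = list(dict.fromkeys(keyword_candidates(query) + embedding_candidates(query, corpus)))
    scores = reranker.predict([(query, doc) for doc in merged])
    ranked = sorted(zip(merged, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

# The top_n passages would then be inserted into the LLM prompt as grounding context.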

Original language: English
Article number: 18062
Journal: Scientific Reports
Volume: 15
Issue number: 1
DOIs
Publication status: Published - Dec 2025

Keywords

  • Artificial intelligence (AI)
  • Medical-large language model
  • Retrieval-augmented generation (RAG)
