Dual retrieving and ranking medical large language model with retrieval augmented generation

  • Qimin Yang
  • Huan Zuo
  • Runqi Su
  • Hanyinghong Su
  • Tangyi Zeng
  • Huimei Zhou
  • Rongsheng Wang
  • Jiexin Chen
  • Yijun Lin
  • Zhiyi Chen
  • Tao Tan

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Recent advancements in large language models (LLMs) have significantly enhanced text generation across various sectors; however, their medical application faces critical challenges in both accuracy and real-time responsiveness. To address these dual challenges, we propose a novel two-step retrieval and ranking retrieval-augmented generation (RAG) framework that combines embedding search with Elasticsearch technology. Built upon a dynamically updated medical knowledge base incorporating expert-reviewed documents from leading healthcare institutions, our hybrid architecture employs ColBERTv2 for context-aware result ranking while maintaining computational efficiency. Experimental results show a 10% improvement in accuracy on complex medical queries compared with standalone LLM and single-search RAG variants. Latency remains a challenge in our experimental setting for emergency situations that require sub-second responses, although real-time performance can be achieved with more powerful hardware in real-world deployments. This work establishes a new paradigm for reliable medical AI assistants that balances accuracy with practical deployment considerations.
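
The two-step idea in the abstract (retrieve candidates with two complementary searches, then rerank them) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is a hypothetical stand-in: `DOCS` is a toy corpus, `token_vec` is a hash-based pseudo-embedding used in place of a learned encoder, `keyword_search` uses simple term overlap in place of an Elasticsearch query, and `maxsim` applies a late-interaction scoring rule of the kind ColBERT-style rankers use, without any trained ColBERTv2 weights.

```python
import hashlib
import math
from collections import Counter

# Hypothetical toy corpus; the paper's knowledge base is not reproduced here.
DOCS = {
    "d1": "metformin is a first line drug for type 2 diabetes",
    "d2": "insulin therapy is required for type 1 diabetes",
    "d3": "ibuprofen relieves mild pain and fever",
    "d4": "amoxicillin treats bacterial infections",
}
DIM = 16


def tokenize(text):
    return text.lower().split()


def token_vec(token):
    # Assumption: a deterministic hash-based unit vector stands in for a
    # learned token embedding. Identical tokens get identical vectors.
    digest = hashlib.sha256(token.encode()).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:DIM]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def text_vec(text):
    # Single-vector representation: normalized sum of token vectors.
    summed = [0.0] * DIM
    for tok in tokenize(text):
        for i, x in enumerate(token_vec(tok)):
            summed[i] += x
    norm = math.sqrt(sum(x * x for x in summed)) or 1.0
    return [x / norm for x in summed]


def dense_search(query, k):
    # Step 1a: embedding search by cosine similarity of unit vectors.
    q = text_vec(query)
    ranked = sorted(DOCS, key=lambda d: dot(q, text_vec(DOCS[d])), reverse=True)
    return ranked[:k]


def keyword_search(query, k):
    # Step 1b: term-overlap scoring, a stand-in for a full-text engine.
    q = Counter(tokenize(query))

    def score(doc_id):
        doc = Counter(tokenize(DOCS[doc_id]))
        return sum(min(q[t], doc[t]) for t in q)

    hits = [d for d in DOCS if score(d) > 0]
    return sorted(hits, key=score, reverse=True)[:k]


def maxsim(query, doc):
    # Step 2: late interaction. Each query token takes its best match
    # among the document tokens, and the per-token maxima are summed.
    doc_vecs = [token_vec(t) for t in tokenize(doc)]
    return sum(
        max(dot(token_vec(qt), dv) for dv in doc_vecs) for qt in tokenize(query)
    )


def retrieve_and_rank(query, k=3, top_n=2):
    # Merge both candidate lists (order-preserving, no duplicates), then rerank.
    candidates = list(dict.fromkeys(dense_search(query, k) + keyword_search(query, k)))
    ranked = sorted(candidates, key=lambda d: maxsim(query, DOCS[d]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(retrieve_and_rank("drug for type 2 diabetes"))
```

In this toy setup, `d1` ranks first for the query `"drug for type 2 diabetes"` because every query token has an exact match in that document. In a real system the top-ranked passages would then be passed to the LLM as context; that generation step is omitted here.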

Original language: English
Article number: 18062
Journal: Scientific Reports
Volume: 15
Issue number: 1
Publication status: Published - Dec 2025

Keywords

  • Artificial intelligence (AI)
  • Medical-large language model
  • Retrieval-augmented generation (RAG)
