TY - JOUR
T1 - XuanHuGPT: parameter-efficient fine-tuning of large language model in the field of traditional Chinese medicine
AU - Tong, Xuming
AU - Ding, Xiaozheng
AU - Jia, Huiru
AU - Yuan, Yanhong
AU - Liu, Liyan
AU - Wang, Yapeng
AU - Xiong, Zhang
AU - Yang, Xu
AU - Im, Sio Kei
AU - Wang, Mini Han
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
AB - Large Language Models (LLMs) have demonstrated exceptional generalization capabilities across various fields, including their application in Traditional Chinese Medicine (TCM). However, the performance of existing LLMs in TCM-specific tasks remains limited due to the lack of optimization for TCM knowledge during the pre-training phase, insufficient datasets, and the constraints of fine-tuning techniques. To address these challenges, this study constructs the XhTCM dataset by systematically integrating data from three authoritative sources—ShenNong_TCM_Dataset, TCMBank, and TCMIP v2.0. The dataset includes 100,000 structured entries, covering classical theories, prescription formulations, herbal pharmacology, and modern clinical practices. Based on this, we present XuanHuGPT, a domain-specific LLM tailored for TCM question answering and inference. By applying Parameter-Efficient Fine-Tuning (PEFT) techniques, we effectively balance model performance and training costs. Furthermore, we establish a comprehensive evaluation framework for TCM LLMs, combining quantitative metrics (BLEU, ROUGE, METEOR, BERTScore, and Embedding Distance) with expert qualitative assessments. Experimental results show that XuanHuGPT significantly outperforms both general-purpose LLMs and some existing TCM-specific models in accuracy, coverage, fluency, consistency, sensitivity, and safety. This study presents a reproducible paradigm for building intelligent TCM Q&A systems, contributing to the digital transformation, intelligent development, and global dissemination of TCM knowledge.
KW - Large Language Models
KW - LoRA
KW - Parameter-efficient fine-tuning
KW - Traditional Chinese medicine
KW - XuanHuGPT
UR - https://www.scopus.com/pages/publications/105022906930
DO - 10.1186/s13020-025-01200-3
M3 - Article
AN - SCOPUS:105022906930
SN - 1991-0150
VL - 20
JF - Chinese Medicine (United Kingdom)
IS - 1
M1 - 204
ER -