TY - GEN
T1 - Code Retrieval with Mixture of Experts Prototype Learning Based on Classification
AU - Ling, Feng
AU - Huang, Guoheng
AU - Wang, Jingchao
AU - Yuan, Xiaochen
AU - Chen, Xuhang
AU - Zhang, Xue Yong
AU - Zhang, Fanlong
AU - Pun, Chi Man
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/10/27
Y1 - 2025/10/27
N2 - The semantic connection between code and queries is crucial for code retrieval, but many human-written queries fail to accurately capture the code’s core intent, leading to ambiguity. This ambiguity complicates the code search process, as the queries do not provide a clear overview of the code’s purpose. Our analysis reveals that while ambiguous queries may not precisely summarize the intent of the code, they often share the same general topics as the corresponding code. In light of this discovery, we propose Code Retrieval with Mixture of Experts Prototype Learning Based on Classification (CRME), a novel approach that combines classification for prototype-based representation learning and result ensembling. CRME utilizes specialized pre-trained models focused on the specific domains of ambiguous queries. It consists of two key components: Multiple Classification Prototype and Representation Learning with a Prototype-based Multi-model Contrastive (PMC) Loss during training, and a Multi-Prototype Mixture of Experts Integration (MP-MoE) module for fine-grained ensemble inference. Our method effectively addresses the issue of query ambiguity and improves search precision. Experimental results on the CodeSearchNet dataset, covering six sub-datasets, show that CRME outperforms existing methods, achieving an average MRR score of 81.4%. When applied to pre-trained models such as CodeBERT, GraphCodeBERT, UniXcoder, and CodeT5+, CRME effectively boosts their performance.
AB - The semantic connection between code and queries is crucial for code retrieval, but many human-written queries fail to accurately capture the code’s core intent, leading to ambiguity. This ambiguity complicates the code search process, as the queries do not provide a clear overview of the code’s purpose. Our analysis reveals that while ambiguous queries may not precisely summarize the intent of the code, they often share the same general topics as the corresponding code. In light of this discovery, we propose Code Retrieval with Mixture of Experts Prototype Learning Based on Classification (CRME), a novel approach that combines classification for prototype-based representation learning and result ensembling. CRME utilizes specialized pre-trained models focused on the specific domains of ambiguous queries. It consists of two key components: Multiple Classification Prototype and Representation Learning with a Prototype-based Multi-model Contrastive (PMC) Loss during training, and a Multi-Prototype Mixture of Experts Integration (MP-MoE) module for fine-grained ensemble inference. Our method effectively addresses the issue of query ambiguity and improves search precision. Experimental results on the CodeSearchNet dataset, covering six sub-datasets, show that CRME outperforms existing methods, achieving an average MRR score of 81.4%. When applied to pre-trained models such as CodeBERT, GraphCodeBERT, UniXcoder, and CodeT5+, CRME effectively boosts their performance.
KW - Code Retrieval
KW - Mixture of Experts
KW - Prototype Learning
UR - https://www.scopus.com/pages/publications/105023704655
U2 - 10.1145/3755881.3755893
DO - 10.1145/3755881.3755893
M3 - Conference contribution
AN - SCOPUS:105023704655
T3 - 16th International Conference on Internetware, Internetware 2025 - Proceedings
SP - 47
EP - 58
BT - 16th International Conference on Internetware, Internetware 2025 - Proceedings
A2 - Mei, Hong
A2 - Lv, Jian
A2 - Jin, Zhi
A2 - Li, Xuandong
A2 - Zimmermann, Thomas
A2 - Li, Ge
A2 - Bu, Lei
A2 - Xia, Xin
PB - Association for Computing Machinery, Inc
T2 - 16th International Conference on Internetware, Internetware 2025
Y2 - 20 June 2025 through 22 June 2025
ER -