TY - JOUR
T1 - Benchmarking AI-powered docking methods from the perspective of virtual screening
AU - Gu, Shukai
AU - Shen, Chao
AU - Zhang, Xujun
AU - Sun, Huiyong
AU - Cai, Heng
AU - Luo, Hao
AU - Zhao, Huifeng
AU - Liu, Bo
AU - Du, Hongyan
AU - Zhao, Yihao
AU - Fu, Chenggong
AU - Zhai, Silong
AU - Deng, Yafeng
AU - Liu, Huanxiang
AU - Hou, Tingjun
AU - Kang, Yu
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Nature Limited 2025.
PY - 2025
Y1 - 2025
N2 - Recently, many artificial intelligence (AI)-powered protein–ligand docking and scoring methods have been developed, demonstrating impressive speed and accuracy. However, these methods have often neglected the physical plausibility of the docked complexes and their efficacy in virtual screening (VS) projects. Therefore, we conducted a comprehensive benchmark analysis of four AI-powered docking tools, four physics-based docking tools and two AI-enhanced rescoring methods. We first constructed the TrueDecoy set, on which redocking experiments revealed that KarmaDock and CarsiDock surpassed all physics-based tools in docking accuracy, whereas all physics-based tools notably outperformed AI-based methods in structural rationality. The low physical plausibility of the docked structures generated by the top AI method, CarsiDock, mainly stems from insufficient intermolecular validity. The VS results on the TrueDecoy set highlight the effectiveness of RTMScore as a rescoring function, and Glide-based methods achieved the highest enrichment factors among all docking tools. Furthermore, we created the RandomDecoy set, a dataset that more closely resembles real-world VS scenarios, on which AI-based tools clearly outperformed Glide. Additionally, we found that the ligand-based postprocessing methods we employed had a weak or even negative impact on optimizing the conformations of docked complexes and enhancing VS performance. Finally, we proposed a hierarchical VS strategy that could efficiently and accurately enrich active molecules in large-scale VS projects.
UR - http://www.scopus.com/inward/record.url?scp=85218825024&partnerID=8YFLogxK
U2 - 10.1038/s42256-025-00993-0
DO - 10.1038/s42256-025-00993-0
M3 - Article
AN - SCOPUS:85218825024
SN - 2522-5839
JO - Nature Machine Intelligence
JF - Nature Machine Intelligence
M1 - e1003571
ER -