TY - JOUR
T1 - Self-Supervised Molecular Pretraining Strategy for Low-Resource Reaction Prediction Scenarios
AU - Wu, Zhipeng
AU - Cai, Xiang
AU - Zhang, Chengyun
AU - Qiao, Haoran
AU - Wu, Yejian
AU - Zhang, Yun
AU - Wang, Xinqiao
AU - Xie, Haiying
AU - Luo, Feng
AU - Duan, Hongliang
N1 - Publisher Copyright: © 2022 American Chemical Society. All rights reserved.
PY - 2022/10/10
Y1 - 2022/10/10
AB - In the face of low-resource reaction training samples, we construct a chemical platform for addressing small-scale reaction prediction problems. Using a self-supervised pretraining strategy called MAsked Sequence to Sequence (MASS), the Transformer model can absorb the chemical information of about 1 billion molecules and then be fine-tuned on small-scale reaction prediction tasks. To further strengthen the predictive performance of our model, we combine MASS with the reaction transfer learning strategy. Here, we show that the average improved accuracies of the Transformer model can reach 14.07, 24.26, 40.31, and 57.69% in predicting the Baeyer-Villiger, Heck, C-C bond formation, and functional group interconversion reaction data sets, respectively, marking an important step toward low-resource reaction prediction.
UR - http://www.scopus.com/inward/record.url?scp=85138980238&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.2c00588
DO - 10.1021/acs.jcim.2c00588
M3 - Article
C2 - 36129104
AN - SCOPUS:85138980238
SN - 1549-9596
VL - 62
SP - 4579
EP - 4590
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 19
ER -