TY - JOUR
T1 - Virtual data augmentation method for reaction prediction
AU - Wu, Xinyi
AU - Zhang, Yun
AU - Yu, Jiahui
AU - Zhang, Chengyun
AU - Qiao, Haoran
AU - Wu, Yejian
AU - Wang, Xinqiao
AU - Wu, Zhipeng
AU - Duan, Hongliang
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - To improve the performance of data-driven reaction prediction models, we propose a strategy for predicting reaction products that uses the available data and increases the sample size through fake data augmentation. In this research, fake datasets were created and combined with the raw data for model training. The fake reaction datasets were created by replacing some functional groups; that is, the fake data consist of compounds with modified functional groups, which increases the amount of data available for reaction prediction. This approach was tested on five different reactions, and the results show improved model predictivity over other relevant techniques. Furthermore, we evaluated this method with different models, confirming the generality of virtual data augmentation. In summary, virtual data augmentation can be used as an effective measure to address the problem of insufficient data and to significantly improve the performance of reaction prediction.
UR - http://www.scopus.com/inward/record.url?scp=85139778620&partnerID=8YFLogxK
U2 - 10.1038/s41598-022-21524-6
DO - 10.1038/s41598-022-21524-6
M3 - Article
C2 - 36224331
AN - SCOPUS:85139778620
SN - 2045-2322
VL - 12
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 17098
ER -