Retrosynthesis with attention-based NMT model and chemical analysis of "wrong" predictions

Hongliang Duan, Ling Wang, Chengyun Zhang, Lin Guo, Jianjun Li

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)


We consider retrosynthesis to be a machine translation problem. Accordingly, we apply an attention-based and completely data-driven model named Tensor2Tensor to a data set comprising approximately 50 000 diverse reactions extracted from the United States patent literature. With a top-1 accuracy of 54.1%, the model significantly outperforms the seq2seq model (37.4%). We also offer a novel insight into the causes of grammatically invalid SMILES, and conduct a test in which experienced chemists select and analyze the "wrong" predictions that may be chemically plausible but differ from the ground truth. The effectiveness of our model is found to be underestimated and the "true" top-1 accuracy reaches as high as 64.6%.
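The top-1 accuracy quoted above can be illustrated with a minimal sketch. The abstract does not say how predictions are matched against the ground truth, so exact string equality is an assumption made here for illustration; in practice SMILES strings would typically be canonicalized before comparison. The function name `top_k_accuracy` and the toy SMILES strings are hypothetical and not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of examples whose ground truth is among the first k candidates.

    Assumption: a prediction counts as correct only if it is string-identical
    to the ground truth (no SMILES canonicalization is performed here).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1 for candidates, truth in zip(predictions, targets)
        if truth in candidates[:k]
    )
    return hits / len(targets)


# Toy data: each inner list is a ranked list of predicted reactant SMILES.
predictions = [
    ["CCO", "CO"],
    ["CC(=O)O", "CCO"],
    ["c1ccccc1", "C1CCCCC1"],
]
targets = ["CCO", "CCO", "C1CCCCC1"]

top1 = top_k_accuracy(predictions, targets, k=1)
top2 = top_k_accuracy(predictions, targets, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Under a strict match like this, a prediction that is chemically plausible but differs from the recorded reactants is counted as wrong, which is the gap the chemists' analysis of "wrong" predictions addresses.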

Original language: English
Pages (from-to): 1371-1378
Number of pages: 8
Journal: RSC Advances
Issue number: 3
Publication status: Published - 2020
Externally published: Yes

