
zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature

  • Rui Zhou
  • Haohui Ma
  • Tianle Xin
  • Qiuchen Miao
  • Lixin Zou
  • Qiuyue Hu
  • Hongxi Cheng
  • Jingjing Guo
  • Yuguang Mu
  • Sheng Wang
  • Guoqing Zhang
  • Yanjie Wei
  • Liangzhen Zheng
  • Shenzhen Institute of Advanced Technology
  • University of Chinese Academy of Sciences
  • Shanghai Zelixir Biotech Company Ltd.
  • CAS - Shanghai Institute of Nutrition and Health
  • Nanyang Technological University
  • Shenzhen University of Advanced Technology

Research output: Contribution to journal › Article › peer-review

Abstract

The rapid expansion of enzyme reaction literature has created a major bottleneck in database curation, leaving vast amounts of enzyme-substrate-condition relationships unstructured and inaccessible for deep learning (DL)-driven modeling. Making full use of these enzymatic reaction data is an important step toward accurate enzyme activity prediction models. Current DL-based data extraction models rely heavily on large language models (LLMs) and lack both a fidelity check and the ability to evolve continuously. To address these issues, we developed zERExtractor (Zelixir's Enzyme Reaction Data Extractor), an accuracy-oriented and extensible platform for extracting enzyme-catalyzed reaction data from scientific publications. The system offers a unified multimodal information extraction framework (covering molecular reaction diagrams, tables, and text) that integrates enzymatic reaction descriptors into structured storage. We employ fine-tuned LLMs together with DL models in a human-in-the-loop pipeline that evolves through expert validation of data fidelity and active learning. zERExtractor achieves 89.9% accuracy in table recognition and over 98% accuracy in molecular image recognition on synthetic data sets, outperforming the strongest baseline by more than 2% and consistently maintaining above 95% on realistic benchmarks. By providing a scalable framework for accurate multimodal extraction, zERExtractor bridges the gap in enzyme reaction data, advancing DL-driven enzyme modeling and enabling future applications in computational enzymology and biotechnology. The platform is publicly accessible online at https://zpaper.zelixir.com/.

Original language: English
Pages (from-to): 4296-4309
Number of pages: 14
Journal: Journal of Chemical Information and Modeling
Volume: 66
Issue number: 7
Publication status: Published - 13 Apr 2026
