Abstract
The rapid expansion of the enzyme reaction literature has created a major bottleneck in database curation, leaving vast numbers of enzyme-substrate-condition relationships unstructured and inaccessible to deep learning (DL)-driven modeling. Making full use of these enzymatic reaction data is an important step toward accurate enzyme activity prediction models. Current DL-based data extraction models rely heavily on large language models (LLMs) and lack both a fidelity check and the ability to evolve continuously. To address these issues, we developed zERExtractor (Zelixir's Enzyme Reaction Data Extractor), an accuracy-oriented and extensible platform for extracting enzyme-catalyzed reaction data from scientific publications. The system offers a unified multimodal information extraction framework (covering molecular reaction diagrams, tables, and text) that integrates enzymatic reaction descriptors into structured storage. We employ fine-tuned LLMs together with DL models in a human-in-the-loop pipeline that evolves through expert data fidelity validation and active learning. zERExtractor achieves 89.9% accuracy in table recognition and over 98% accuracy in molecular image recognition on synthetic data sets, outperforming the strongest baseline by more than 2% and consistently maintaining above 95% on realistic benchmarks. zERExtractor bridges the gap in enzyme reaction data with a scalable framework for accurate multimodal extraction, advancing DL-driven enzyme modeling and enabling future applications in computational enzymology and biotechnology. The platform is publicly accessible online at https://zpaper.zelixir.com/.
| Original language | English |
|---|---|
| Pages (from-to) | 4296-4309 |
| Number of pages | 14 |
| Journal | Journal of Chemical Information and Modeling |
| Volume | 66 |
| Issue number | 7 |
| Publication status | Published - 13 Apr 2026 |
| Title | zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature |