Research on Plant RNA-Binding Protein Prediction Method Based on Improved Ensemble Learning

Hongwei Zhang, Yan Shi, Yapeng Wang, Xu Yang, Kefeng Li, Sio Kei Im, Yu Han

Research output: Contribution to journal › Article › peer-review

Abstract

(1) Background: RNA-binding proteins (RBPs) play a crucial role in regulating gene expression in plants, affecting growth, development, and stress responses. Accurate prediction of plant-specific RBPs is vital for understanding gene regulation and supporting genetic improvement. (2) Methods: We propose an ensemble learning method that integrates shallow and deep learning. The prediction outputs of four base models (SVM, LR, LDA, and LightGBM) are fed into an enhanced TextCNN. Sequences are encoded with K-Peptide Composition (KPC; k = 1, 2), giving a 420-dimensional feature vector (20 single-residue plus 400 dipeptide frequencies), which is extended to 424 dimensions by appending the four base-model outputs. Feature redundancy is reduced using a Pearson correlation threshold of 0.80. (3) Results: On the benchmark dataset of 4992 sequences, our method achieved an ACC of 97.20% under 5-fold and 97.06% under 10-fold cross-validation. On an independent dataset of 1086 sequences, it attained an ACC of 99.72%, an F1 of 99.72%, an MCC of 99.45%, an SN of 99.63%, and an SP of 99.82%, outperforming RBPLight by 12.98 percentage points in ACC and the original TextCNN by 25.23 percentage points. (4) Conclusions: These results highlight our method’s superior accuracy and efficiency over PSSM-based approaches, enabling large-scale plant RBP prediction.
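The feature construction and evaluation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the amino-acid ordering, the normalization of k-mer counts by the number of length-k windows, the greedy strategy for the 0.80 Pearson filter, and the names `kpc_encode`, `stack_features`, `drop_correlated`, and `binary_metrics` are all hypothetical. The four base-model scores are placeholder inputs (no SVM, LR, LDA, or LightGBM model is trained here), and the enhanced TextCNN itself is not shown.

```python
import math
from itertools import product

# 20 standard amino acids; this ordering is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KMERS_BY_K = {
    1: list(AMINO_ACIDS),                                     # 20 mono-peptides
    2: ["".join(p) for p in product(AMINO_ACIDS, repeat=2)],  # 400 dipeptides
}


def kpc_encode(seq):
    """K-peptide composition for k = 1, 2: 20 + 400 = 420 relative frequencies.

    Assumed normalization: each k-mer count is divided by the number of
    length-k windows. Windows with non-standard residues count toward the
    denominator but match no feature.
    """
    seq = seq.upper()
    vec = []
    for k in (1, 2):
        n_windows = max(len(seq) - k + 1, 0)
        counts = {}
        for i in range(n_windows):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        for kmer in KMERS_BY_K[k]:
            vec.append(counts.get(kmer, 0) / n_windows if n_windows else 0.0)
    return vec


def stack_features(seq, base_scores):
    """Append four base-model outputs (SVM, LR, LDA, LightGBM) -> 424 dims.

    `base_scores` is a hypothetical input: scores from already-trained models.
    """
    if len(base_scores) != 4:
        raise ValueError("expected one score per base model (4)")
    return kpc_encode(seq) + [float(s) for s in base_scores]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.80):
    """Greedy redundancy filter (assumed strategy): keep a feature column only
    if its |r| with every previously kept column is <= threshold.
    Returns the indices of the kept columns."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) <= threshold for i in kept):
            kept.append(j)
    return kept


def binary_metrics(tp, tn, fp, fn):
    """ACC, F1, MCC, SN (sensitivity) and SP (specificity) from a confusion matrix."""
    def div(a, b):
        return a / b if b else 0.0
    sn = div(tp, tp + fn)
    sp = div(tn, tn + fp)
    precision = div(tp, tp + fp)
    return {
        "ACC": div(tp + tn, tp + tn + fp + fn),
        "F1": div(2 * precision * sn, precision + sn),
        "MCC": div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
        "SN": sn,
        "SP": sp,
    }


if __name__ == "__main__":
    # Hypothetical sequence and base-model scores, for illustration only.
    x = stack_features("MKVLAAGRKS", [0.91, 0.87, 0.78, 0.95])
    print(len(x))  # 424
```

The greedy filter keeps the first column of each highly correlated group; whether the original study uses this ordering, a strict or non-strict comparison at 0.80, or a different selection rule is not stated in the abstract.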

Original language: English
Article number: 672
Journal: Biology
Volume: 14
Issue number: 6
Publication status: Published - Jun 2025

Keywords

  • ensemble learning
  • plant
  • RBPs
  • RNA-binding proteins
  • TextCNN
