Biological Sequence Representation Methods and Recent Advances: A Review

  • Macao Polytechnic University
  • Beijing University of Posts and Telecommunications
  • Southwest Forestry University

Research output: Review article › Peer-reviewed

2 Citations (Scopus)

Abstract

Biological sequence representation methods are pivotal for advancing machine learning in computational biology, transforming nucleotide and protein sequences into numerical representations that improve predictive modeling and downstream task performance. This review categorizes these methods into three developmental stages: computational-based, word embedding-based, and large language model (LLM)-based, detailing their principles, applications, and limitations. Computational-based methods, such as k-mer counting and position-specific scoring matrices (PSSM), extract statistical and evolutionary patterns to support tasks like motif discovery and protein–protein interaction prediction. Word embedding-based approaches, including Word2Vec and GloVe, capture contextual relationships, enabling robust sequence classification and regulatory element identification. LLM-based methods, built on Transformer models such as ESM3 and RNAErnie, capture long-range dependencies, supporting RNA structure prediction and cross-modal analysis with higher accuracy than earlier approaches. However, challenges persist, including computational complexity, sensitivity to data quality, and limited interpretability of high-dimensional embeddings. Future directions prioritize integrating multimodal data (e.g., sequences, structures, and functional annotations), employing sparse attention mechanisms to improve efficiency, and leveraging explainable AI to link embeddings to biological insights. These advances promise transformative applications in drug discovery, disease prediction, and genomics, equipping computational biology with robust, interpretable tools.
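As a concrete illustration of the computational-based stage, the sketch below converts a DNA sequence into a fixed-length k-mer frequency vector that a downstream classifier could consume. It is a minimal example, not code from the review: the function name `kmer_frequency_vector`, the overlapping stride-1 windows, the skipping of ambiguous bases such as N, and normalization to relative frequencies are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequency_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[float]:
    """Hypothetical helper: relative frequencies of all possible k-mers,
    listed in lexicographic order of `alphabet`."""
    seq = seq.upper()
    # Enumerate every possible k-mer so every sequence maps to a vector
    # of the same length, len(alphabet) ** k.
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k with stride 1 (overlapping k-mers).
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows containing symbols outside the alphabet (e.g. 'N') are skipped.
    counts = Counter(w for w in windows if all(c in alphabet for c in w))
    total = sum(counts.values())
    if total == 0:
        # No valid k-mers (sequence too short or fully ambiguous).
        return [0.0] * len(vocab)
    # Normalize raw counts so sequences of different lengths are comparable.
    return [counts[kmer] / total for kmer in vocab]


if __name__ == "__main__":
    vec = kmer_frequency_vector("ACGTACGT", k=2)
    print(len(vec), [round(v, 3) for v in vec])
```

For example, `kmer_frequency_vector("ACGTACGT", k=2)` returns a 16-element vector in which AC, CG and GT each have frequency 2/7 and TA has 1/7. Passing `alphabet="ACDEFGHIKLMNPQRSTVWY"` gives a protein variant. Because the vector length grows as |alphabet|^k, larger k quickly produces sparse, high-dimensional features.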

Original language: English
Article number: 1137
Journal: Biology
Volume: 14
Issue number: 9
Publication status: Published - Sep 2025
