Abstract
Motivation: Epstein-Barr virus (EBV) is one of the most prevalent oncogenic DNA viruses. Integration of EBV into the host genome has been reported to play an important role in cancer development. EBV integration preference depends strongly on the local genomic environment, which makes it possible to predict EBV integration sites.

Results: An attention-based deep learning model, DeepEBV, was developed to predict EBV integration sites by learning local genomic features automatically. First, DeepEBV was trained and tested on data from the dsVIS database. The results showed that DeepEBV trained on EBV integration sequences plus Repeat peaks, with 2-fold data augmentation, performed best on the training dataset. The performance of the model was then validated on an independent dataset. In addition, the motifs of DNA-binding proteins could influence the selection preference of viral insertional mutagenesis. Finally, the results showed that DeepEBV can accurately predict EBV integration hotspot genes. In summary, DeepEBV is a robust, accurate and explainable deep learning model that provides novel insights into EBV integration preferences and mechanisms.
Original language | English |
---|---|
Pages (from-to) | 3405-3411 |
Number of pages | 7 |
Journal | Bioinformatics |
Volume | 37 |
Issue number | 20 |
Publication status | Published - 15 Oct 2021 |