Constructing High Quality Bilingual Corpus using Parallel Data from the Web

Research output: Conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

A natural language machine translation system requires a high-quality bilingual corpus in order to translate efficiently and accurately. In this paper, we propose a method for constructing a bilingual corpus from parallel data on the Web, with the aim of significantly speeding up corpus construction. The proposal has four phases. Parallel data is first pre-processed and refined into three sets of data for training a CNN model. Using the trained model, future parallel data can be selected, classified, and added to the bilingual corpus. In our experiments, test accuracy reached 98.46%. Furthermore, precision, recall, and F1-score are all greater than 0.9, outperforming the RNN and LSTM models.
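
The abstract reports precision, recall, and F1-score for the classifier. As a minimal sketch of how these metrics are computed, the Python below assumes a binary labelling (1 = a sentence pair judged parallel, 0 = not parallel). The abstract does not describe the label scheme, so that labelling, the function name `precision_recall_f1`, and the toy data are hypothetical and do not come from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels for illustration only (not results from the paper).
# Assumed scheme: 1 = parallel sentence pair, 0 = non-parallel pair.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With more than two classes, the same counts would be taken per class and then averaged. The abstract does not state which averaging the authors used.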

Original language: English
Title of host publication: IoTBDS 2022 - Proceedings of the 7th International Conference on Internet of Things, Big Data and Security
Editors: Denis Bastieri, Gary Wills, Peter Kacsuk, Victor Chang
Publisher: Science and Technology Publications, Lda
Pages: 127-132
Number of pages: 6
ISBN (Electronic): 9789897585647
Publication status: Published - 2022
Event: 7th International Conference on Internet of Things, Big Data and Security, IoTBDS 2022 - Virtual, Online
Duration: 22 Apr 2022 to 24 Apr 2022

Publication series

Name: International Conference on Internet of Things, Big Data and Security, IoTBDS - Proceedings
2022-April
ISSN (Electronic): 2184-4976

Conference

Conference: 7th International Conference on Internet of Things, Big Data and Security, IoTBDS 2022
City: Virtual, Online
Period: 22/04/22 to 24/04/22
