Corpus Database Management Design for Chinese-Portuguese Bidirectional Parallel Corpora

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

As deep learning techniques continue to mature, machine translation (MT) is gaining popularity among translators. However, the accuracy of MT depends not only on the size of the parallel corpus but also on its quality. Because suitable tools are lacking, these massive parallel corpora are often poorly managed. As a result, conflicting and inconsistent parallel corpora are frequently trained together, which adversely affects the resulting MT engines. This study therefore proposes a novel parallel corpus database design to support data management. In a series of experimental tests, the proposed design effectively generated domain-specific MT models that achieved higher BiLingual Evaluation Understudy (BLEU) scores than other models. Furthermore, the design helps analyze, validate, and evaluate the quality of parallel corpora within database engines.
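The abstract does not describe the schema itself. As a rough illustration only, the following Python sketch (standard-library sqlite3) shows one way a bidirectional Chinese-Portuguese corpus table could support two tasks the abstract mentions: flagging segments aligned to conflicting translations, and exporting a domain-specific subset for training. The table name segment_pair, its columns, the domain labels, and the sample rows are all hypothetical and are not taken from the paper.

```python
import sqlite3

# Hypothetical schema: NOT the paper's design, only an illustration.
SCHEMA = """
CREATE TABLE segment_pair (
    id     INTEGER PRIMARY KEY,
    zh     TEXT NOT NULL,  -- Chinese segment
    pt     TEXT NOT NULL,  -- Portuguese segment
    domain TEXT NOT NULL,  -- hypothetical domain label, e.g. 'legal'
    origin TEXT            -- which source corpus the pair came from
)
"""

# Hypothetical sample rows; '問題' is aligned to two different translations.
SAMPLE_ROWS = [
    ("問題", "problema", "general", "corpus_a"),
    ("問題", "questão", "general", "corpus_b"),
    ("合同", "contrato", "legal", "corpus_a"),
    ("法院", "tribunal", "legal", "corpus_c"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical sample pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO segment_pair (zh, pt, domain, origin) VALUES (?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def conflicting_segments(conn: sqlite3.Connection, side: str = "zh") -> list:
    """Return (segment, n) for segments on `side` aligned to n > 1 distinct
    translations on the other side. Checking both sides covers the
    zh->pt and pt->zh directions of a bidirectional corpus."""
    if side not in ("zh", "pt"):
        raise ValueError("side must be 'zh' or 'pt'")
    other = "pt" if side == "zh" else "zh"
    # Column names come from the whitelist above, so the f-string is safe.
    query = (
        f"SELECT {side}, COUNT(DISTINCT {other}) FROM segment_pair "
        f"GROUP BY {side} HAVING COUNT(DISTINCT {other}) > 1 ORDER BY {side}"
    )
    return conn.execute(query).fetchall()


def domain_pairs(conn: sqlite3.Connection, domain: str) -> list:
    """Return the (zh, pt) pairs of one domain, e.g. to train a domain-specific model."""
    return conn.execute(
        "SELECT zh, pt FROM segment_pair WHERE domain = ? ORDER BY id",
        (domain,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(conflicting_segments(conn, "zh"))  # [('問題', 2)]
    print(conflicting_segments(conn, "pt"))  # []
    print(domain_pairs(conn, "legal"))       # [('合同', 'contrato'), ('法院', 'tribunal')]
```

Exact-string grouping like this only catches the simplest inconsistencies; the validation and evaluation procedures used in the paper may differ substantially.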

Original language: English
Host publication title: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 103-108
Number of pages: 6
ISBN (electronic): 9798350335262
Publication status: Published - 2023
Event: 3rd IEEE International Conference on Computer Communication and Artificial Intelligence, CCAI 2023 - Taiyuan, China
Duration: 26 May 2023 - 28 May 2023

Publication series

Name: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence, CCAI 2023

Conference

Conference: 3rd IEEE International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
Country/Territory: China
City: Taiyuan
Period: 26/05/23 - 28/05/23
