Corpus Database Management Design for Chinese-Portuguese Bidirectional Parallel Corpora

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

As deep learning techniques continue to mature, machine translation (MT) is gaining popularity among translators. However, MT accuracy depends not only on the size of the parallel corpus but also on its quality. Because dedicated tools are lacking, these massive parallel corpora are often poorly managed. As a result, conflicting and inconsistent parallel data are frequently used together to train MT engines, which adversely affects the resulting models. This study therefore proposes a novel parallel corpus database design intended to support data management. In a series of experiments, the proposed design effectively generated domain-specific MT models that achieved higher Bilingual Evaluation Understudy (BLEU) scores than the other models tested. Furthermore, the design helps users analyze, validate, and evaluate the quality of parallel corpora within database engines.
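The abstract does not describe the actual schema, so the following is only a minimal, hypothetical sketch of the two ideas it mentions: keeping aligned Chinese-Portuguese segment pairs in a database tagged by domain, and using queries to flag conflicting pairs and to extract domain-specific training data. The table `segment_pair`, its columns `zh`, `pt` and `domain`, and the functions `find_conflicts` and `export_domain` are illustrative assumptions, not the authors' design; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema (not from the paper): one row per aligned segment pair,
# tagged with the domain it belongs to.
SCHEMA = """
CREATE TABLE segment_pair (
    id      INTEGER PRIMARY KEY,
    zh      TEXT NOT NULL,   -- Chinese segment
    pt      TEXT NOT NULL,   -- Portuguese segment
    domain  TEXT NOT NULL    -- illustrative labels such as 'general' or 'legal'
);
"""


def find_conflicts(conn):
    """Return (zh, n_targets) for Chinese sources with more than one distinct Portuguese target."""
    return conn.execute(
        """
        SELECT zh, COUNT(DISTINCT pt) AS n_targets
        FROM segment_pair
        GROUP BY zh
        HAVING COUNT(DISTINCT pt) > 1
        ORDER BY zh
        """
    ).fetchall()


def export_domain(conn, domain):
    """Return de-duplicated (zh, pt) pairs for one domain, e.g. as training data for a domain model."""
    return conn.execute(
        "SELECT DISTINCT zh, pt FROM segment_pair WHERE domain = ? ORDER BY zh, pt",
        (domain,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO segment_pair (zh, pt, domain) VALUES (?, ?, ?)",
        [
            ("你好", "Olá", "general"),
            ("你好", "Oi", "general"),         # same source, different target
            ("合同", "contrato", "legal"),
            ("合同", "contrato", "legal"),     # exact duplicate
        ],
    )
    print(find_conflicts(conn))             # sources needing review
    print(export_domain(conn, "legal"))     # duplicates collapsed by DISTINCT
```

A one-to-many mapping is not always an error (both targets above are acceptable greetings), so in a sketch like this the conflict query only surfaces candidates for human review rather than deleting them.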

Original language: English
Title of host publication: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 103-108
Number of pages: 6
ISBN (Electronic): 9798350335262
Publication status: Published - 2023
Event: 3rd IEEE International Conference on Computer Communication and Artificial Intelligence, CCAI 2023 - Taiyuan, China
Duration: 26 May 2023 to 28 May 2023

Publication series

Name: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence, CCAI 2023

Conference

Conference: 3rd IEEE International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
Country/Territory: China
City: Taiyuan
Period: 26/05/23 to 28/05/23

Keywords

  • Corpus Management
  • Data Engineering
  • Intelligent Database Systems
  • Natural Language Processing
  • Parallel Corpora
