TY - GEN
T1 - Corpus Database Management Design for Chinese-Portuguese Bidirectional Parallel Corpora
AU - Hoi, Lap Man
AU - Ke, Wei
AU - Im, Sio Kei
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - As deep learning techniques continue to mature, machine translation (MT) is gaining popularity among translators. However, the accuracy of MT depends not only on the size of the parallel corpus but also on its quality. The management of these massive parallel corpora is often neglected for lack of tools. As a result, many conflicting and confusing parallel corpora are combined in training, which affects the resulting MT engines. Therefore, this study proposes a novel parallel corpus database design to assist data management. Experimental tests show that the proposed database design can effectively generate domain-specific MT models with better BiLingual Evaluation Understudy (BLEU) scores than other models. Furthermore, this database design helps to analyze, validate, and evaluate the quality of parallel corpora in database engines.
AB - As deep learning techniques continue to mature, machine translation (MT) is gaining popularity among translators. However, the accuracy of MT depends not only on the size of the parallel corpus but also on its quality. The management of these massive parallel corpora is often neglected for lack of tools. As a result, many conflicting and confusing parallel corpora are combined in training, which affects the resulting MT engines. Therefore, this study proposes a novel parallel corpus database design to assist data management. Experimental tests show that the proposed database design can effectively generate domain-specific MT models with better BiLingual Evaluation Understudy (BLEU) scores than other models. Furthermore, this database design helps to analyze, validate, and evaluate the quality of parallel corpora in database engines.
KW - Corpus Management
KW - Data Engineering
KW - Intelligent Database Systems
KW - Natural Language Processing
KW - Parallel Corpora
UR - http://www.scopus.com/inward/record.url?scp=85169297132&partnerID=8YFLogxK
U2 - 10.1109/CCAI57533.2023.10201319
DO - 10.1109/CCAI57533.2023.10201319
M3 - Conference contribution
AN - SCOPUS:85169297132
T3 - 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
SP - 103
EP - 108
BT - 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Computer Communication and Artificial Intelligence, CCAI 2023
Y2 - 26 May 2023 through 28 May 2023
ER -