Abstract
The phenomenon of zero pronouns (ZPs) has attracted increasing interest in the machine translation community because of its importance and difficulty. However, previous studies generally evaluate the quality of ZP translation with BLEU scores on MT test sets, a measure that is neither expressive nor sensitive enough for accurate assessment. To bridge these data and evaluation gaps, we propose a benchmark test set and an evaluation metric for targeted evaluation of Chinese ZP translation. The human-annotated test set covers five challenging genres that exhibit different characteristics of ZPs, enabling comprehensive evaluation. We systematically revisit advanced models for ZP translation and identify current challenges for future exploration. We release the data, code, and trained models, which we hope will significantly promote research in this field.
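To make the sensitivity concern concrete, here is a minimal sketch of sentence-level BLEU written from scratch (single reference, whitespace tokenization, no smoothing). It is not the metric proposed in the paper, and the English sentences are hypothetical, standing in for the output of translating a Chinese source whose subject pronoun was dropped. Omitting the pronoun, or recovering the wrong one, still scores above 0.8:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Plain BLEU for one hypothesis against one reference (no smoothing)."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Modified precision: each hypothesis n-gram is credited at most
        # as many times as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example: the subject pronoun is implicit in the source.
ref = "yesterday he said that the meeting would be moved to next week"
dropped = "yesterday said that the meeting would be moved to next week"
wrong = "yesterday she said that the meeting would be moved to next week"

print(round(sentence_bleu(ref, dropped), 3))  # 0.835
print(round(sentence_bleu(ref, wrong), 3))    # 0.827
```

Because BLEU rewards n-gram overlap across the whole sentence, a single pronoun error touches only a few n-grams, so the score barely moves even though the translation has lost or misattributed the speaker. Averaged over a full MT test set, such errors are likely diluted further.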
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1263-1293 |
Number of pages | 31 |
Journal | Language Resources and Evaluation |
Volume | 57 |
Issue number | 3 |
DOIs | |
Publication status | Published - Sept 2023 |
Keywords
- Benchmark dataset
- Discourse
- Evaluation metric
- Machine translation
- Zero pronoun