Improving Simultaneous Translation with Pseudo References
Simultaneous translation differs fundamentally from full-sentence translation in that it begins translating before the source sentence ends, with a delay of only a few words. However, because large-scale, publicly available simultaneous translation datasets are lacking, most simultaneous translation systems are still trained on ordinary full-sentence parallel corpora, which are ill-suited to the simultaneous scenario because they contain unnecessary long-distance reorderings. Instead of relying on expensive, time-consuming annotation, we propose a novel method that rewrites the target side of an existing full-sentence corpus into simultaneous-style translations. Experiments on Chinese-to-English translation demonstrate an improvement of about +2.7 BLEU with the addition of the newly generated pseudo references.
Chen, J., Zheng, R., Kita, A., Ma, M., & Huang, L. (2020). Improving Simultaneous Translation with Pseudo References. arXiv preprint arXiv:2010.11247. https://arxiv.org/pdf/2010.11247.pdf