1. Paper title
Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation
2. link
https://www.aclweb.org/anthology/2020.acl-main.252.pdf
3. Abstract
Over the last few years two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve the quality of low-resource languages via multilingual NMT. The second direction employs monolingual data with self-supervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models. (ii) Self-supervision improves zero-shot translation quality in multilingual models. (iii) Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, getting up to 33 BLEU on WMT ro-en translation without any parallel data or back-translation.
4. What problem does it solve
Multilingual NMT can address both the low-resource and the zero-shot problem, because it learns from easily obtainable data and transfers across languages.
There are two approaches to the low-resource problem in NMT:
(1) Multilingual NMT: leverage high-resource languages to improve the translation quality of low-resource languages.
(2) Pre-train a translation model on monolingual data with self-supervision, then fine-tune it on a small amount of parallel data (a sketch of such an objective follows this list).
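The note does not spell out the self-supervised objective; this line of work commonly uses masked sequence-to-sequence denoising (MASS-style), where the model reconstructs a masked span of a monolingual sentence. Below is a minimal sketch of building such training pairs; the function name `mask_span`, the `mask_ratio` value, and the `<mask>` token are illustrative assumptions, not details taken from the paper.

```python
import random

def mask_span(tokens, mask_ratio=0.5, mask_token="<mask>"):
    """Turn one monolingual sentence into a (source, target) denoising pair
    by masking a contiguous span; the model is trained to reconstruct it."""
    span_len = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(len(tokens) - span_len + 1)
    source = tokens[:start] + [mask_token] * span_len + tokens[start + span_len:]
    target = tokens[start:start + span_len]
    return source, target

# Example: a monolingual sentence yields a training pair with no parallel data.
src, tgt = mask_span("this sentence needs no parallel translation".split())
print(src)  # e.g. ['this', '<mask>', '<mask>', '<mask>', 'parallel', 'translation']
print(tgt)  # e.g. ['sentence', 'needs', 'no']
```

Because the pair is derived from the sentence itself, any amount of monolingual text becomes usable training signal before fine-tuning on scarce parallel data.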
5. Main contributions
Combine the two approaches above: multilingual NMT + self-supervised training on monolingual data.
Properties of the method (a sketch of the combined training setup follows this list):
- The self-supervised objective needs no parallel data; monolingual corpora suffice.
- Improves translation quality for all languages, especially low-resource ones.
- Improves zero-shot translation.
- A new, previously unseen language can be added with only a modest amount of monolingual data and still reach reasonable quality.
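As a hedged illustration of the combination, the sketch below mixes supervised translation pairs and self-supervised denoising pairs in one training batch for a single shared model. The `<2xx>` target-language tag is a common multilingual-NMT convention; the function names, data layout, and the mixing ratio `p_mono` are assumptions of this sketch, not details reported in the paper.

```python
import random

def mask_span(tokens, ratio=0.5):
    # Same masking idea as the earlier sketch.
    n = max(1, int(len(tokens) * ratio))
    s = random.randrange(len(tokens) - n + 1)
    return tokens[:s] + ["<mask>"] * n + tokens[s + n:], tokens[s:s + n]

def translation_example(src_sent, tgt_sent, tgt_lang):
    # Supervised pair: the <2xx> tag tells the shared model which
    # language to produce, so one model covers all directions.
    return [f"<2{tgt_lang}>"] + src_sent.split(), tgt_sent.split()

def denoising_example(sent, lang):
    # Self-supervised pair built from monolingual text only.
    masked, span = mask_span(sent.split())
    return [f"<2{lang}>"] + masked, span

def sample_batch(parallel, monolingual, size=4, p_mono=0.5):
    """Mix the two tasks in one batch; p_mono is an illustrative
    mixing ratio, not a value from the paper."""
    batch = []
    for _ in range(size):
        if monolingual and random.random() < p_mono:
            lang, sent = random.choice(monolingual)
            batch.append(denoising_example(sent, lang))
        else:
            tgt_lang, src_sent, tgt_sent = random.choice(parallel)
            batch.append(translation_example(src_sent, tgt_sent, tgt_lang))
    return batch

parallel = [("en", "aceasta este o carte", "this is a book")]
monolingual = [("ro", "o propozitie monolingva fara traducere")]
for src, tgt in sample_batch(parallel, monolingual):
    print(src, "->", tgt)
```

Since the denoising task needs only monolingual text, a new language can be added simply by extending `monolingual`, which matches the note's point about reaching reasonable quality on unseen languages without parallel data.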
6. Results
(1) Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models.
(2) Self-supervision improves zero-shot translation quality in multilingual models.
(3) Combining the two offers a viable path for adding new languages to a multilingual model: up to 33 BLEU on WMT ro-en without any parallel data or back-translation.
7. Keywords
low-resource, multilingual NMT, self-supervision, zero-shot translation