1. Paper title

Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation

2. link

https://www.aclweb.org/anthology/2020.acl-main.36.pdf

3. 摘要

The masked language model has received remarkable attention due to its effectiveness on various natural language processing tasks. However, few works have adopted this technique in the sequence-to-sequence models. In this work, we introduce a jointly masked sequence-to-sequence model and explore its application on non-autoregressive neural machine translation (NAT). Specifically, we first empirically study the functionalities of the encoder and the decoder in NAT models, and find that the encoder takes a more important role than the decoder regarding the translation quality. Therefore, we propose to train the encoder more rigorously by masking the encoder input while training. As for the decoder, we propose to train it based on the consecutive masking of the decoder input with an n-gram loss function to alleviate the problem of translating duplicate words. The two types of masks are applied to the model jointly at the training stage. We conduct experiments on five benchmark machine translation tasks, and our model can achieve 27.69/32.24 BLEU scores on WMT14 English-German/German-English tasks with 5+ times speed up compared with an autoregressive model. [?] masked language model

non-autoregressive neural machine translation (NAT):非自回归NMT rigorously :严格地

4. 要解决什么问题

自回归NMT生成target sentence的特点是:word by word、from left to right。因此自回归NMT速度慢。
翻译效率问题是研究热点。
非自回归NMT可以提升推断速度,但降低翻译质量。
NAT翻译质量不好的原因是:

  1. 源信息没有充分encoder
  2. decoder没有处理好的task。

5. 作者的主要贡献

作者观点:

  1. 发现encoder比decoder重要。
  2. encoder更难训练。

作者推荐的模型:a jointly masked seq2seq model
将masked language model用于non-autoregressive neural machine translation (NAT)。

  1. encoder:对source随机mask一些token
  2. decoder:对target mask一些连续的片段。
  3. 损失函数:n-gram,缓和repeat问题
  4. 推断:每次迭代mask一部分输入然后做predict

6. 得到了什么结果

  1. 性能比NAT好
    WMT14 English-German BLEU 27.69
    WMT14 German-English BLEU 32.24
  2. 速度比AT快
    比autoregressive model快5倍

7. 关键字

Encoder, decoder

results matching ""

    No results matching ""