1. Paper title

A Study of Non-autoregressive Model for Sequence Generation

2. link

https://www.aclweb.org/anthology/2020.acl-main.15.pdf

3. Abstract

Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel, resulting in faster generation speed compared to their autoregressive (AR) counterparts but at the cost of lower accuracy. Different techniques including knowledge distillation and source-target alignment have been proposed to bridge the gap between AR and NAR models in various tasks such as neural machine translation (NMT), automatic speech recognition (ASR), and text to speech (TTS). With the help of those techniques, NAR models can catch up with the accuracy of AR models in some tasks but not in some others. In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why can NAR models catch up with AR models in some tasks but not all? (2) Why do techniques like knowledge distillation and source-target alignment help NAR models? Since the main difference between AR and NAR models is that NAR models do not use dependency among target tokens while AR models do, intuitively the difficulty of NAR sequence generation heavily depends on the strength of dependency among target tokens. To quantify such dependency, we propose an analysis model called CoMMA to characterize the difficulty of different NAR sequence generation tasks. We have several interesting findings: 1) Among the NMT, ASR and TTS tasks, ASR has the most target-token dependency while TTS has the least. 2) Knowledge distillation reduces the target-token dependency in the target sequence and thus improves the accuracy of NAR models. 3) The source-target alignment constraint encourages dependency of a target token on source tokens and thus eases the training of NAR models.

4. What problem does it address

Compared with AR models, NAR models generate faster but at lower accuracy. On some tasks, techniques such as knowledge distillation and source-target alignment can narrow the accuracy gap between NAR and AR models, while on others they cannot; this paper studies why.

5. Main contributions

A study of two questions:
(1) Why can NAR models match AR models on some tasks but not others?
(2) Why do certain techniques (knowledge distillation, source-target alignment) improve NAR performance?
The conclusion: both come down to the strength of the dependency among target tokens.

CoMMA: an analysis model for characterizing and measuring the dependency among target tokens (a hypothetical probe in this spirit is sketched below).
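The note does not spell out how CoMMA works internally. Below is a minimal PyTorch sketch of a masked-prediction probe in the same spirit; it is not the paper's exact CoMMA or its metric, and `MASK_ID`, `model`, and the recovery-accuracy criterion are all assumptions for illustration. Target tokens are masked independently at ratio p, and the model must recover them from the source plus the remaining target context.

```python
import torch

MASK_ID = 0  # assumed id of the [MASK] token in the target vocabulary

def mask_targets(tgt: torch.Tensor, p: float) -> torch.Tensor:
    """Independently replace each target token with [MASK] with probability p."""
    drop = torch.rand_like(tgt, dtype=torch.float) < p
    return tgt.masked_fill(drop, MASK_ID)

@torch.no_grad()
def recovery_accuracy(model, src: torch.Tensor, tgt: torch.Tensor, p: float) -> float:
    """Fraction of masked target tokens recovered at masking ratio p.

    `model(src, masked_tgt)` is assumed to be an encoder-decoder without a
    causal mask (every unmasked target position is visible) that returns
    per-position logits of shape (batch, tgt_len, vocab).
    """
    masked = mask_targets(tgt, p)
    logits = model(src, masked)
    hits = (logits.argmax(dim=-1) == tgt) & (masked == MASK_ID)
    n_masked = (masked == MASK_ID).sum().item()
    return hits.sum().item() / max(n_masked, 1)
```

Sweeping p from near 0 (AR-like, rich target context) to 1 (pure NAR, no target context) traces a per-task difficulty curve: the faster recovery accuracy falls as p grows, the stronger the dependency among target tokens.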

6. Results

(1) Target-token dependency: ASR > NMT > TTS.
(2) Knowledge distillation reduces target-token dependency and thus improves NAR accuracy (see the sketch below).
(3) The source-target alignment constraint encourages each target token to depend on source tokens, making NAR models easier to train.
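Finding (2) refers to sequence-level knowledge distillation. A minimal sketch of that pipeline, assuming a trained AR teacher exposed as a decoding function; all names here are hypothetical:

```python
def build_distilled_corpus(teacher_decode, src_corpus):
    """Pair each source with the AR teacher's decoded output instead of the
    human reference. Teacher outputs are less multi-modal than human
    references; per the paper, this lowers target-token dependency, so a
    NAR student that decodes all positions in parallel fits them better."""
    return [(src, teacher_decode(src)) for src in src_corpus]

# Hypothetical usage:
#   distilled = build_distilled_corpus(ar_model.beam_search, train_sources)
#   nar_model.train_on(distilled)
```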

7. Keywords

A theoretical study of NAR models
