1. Paper title
Enhancing Machine Translation with Dependency-Aware Self-Attention
2. link
https://www.aclweb.org/anthology/2020.acl-main.147.pdf
3. Abstract
Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism. In this work, we investigate different approaches to incorporate syntactic knowledge in the Transformer model and also propose a novel, parameter-free, dependency-aware self-attention mechanism that improves its translation quality, especially for long sentences and in low-resource scenarios. We show the efficacy of each approach on WMT English↔German and English→Turkish, and WAT English→Japanese translation tasks.
4. Problem addressed
Problem: NMT models assume that linguistic knowledge is learned automatically by the attention mechanism, but in practice they fail to capture deep linguistic structure, such as the syntactic information contained in parallel sentences.
Previous attempts to make NMT learn syntactic information:
- incorporating syntactic information into NMT
- adding intermediate representations from a dependency parser to the word embeddings
- interleaving word representations with syntax representations
- two data augmentation techniques, for low-resource and large-scale settings respectively
5. Main contributions
Characteristics of the proposed method:
- strengthens the attention mechanism
- adds no extra parameters (parameter-free)
Rationale:
The attention mechanism works better when its distribution is less dispersed; concentrating attention on syntactically relevant positions reduces this dispersion.
Method: PASCAL (parent-scaled self-attention), which rescales each token's attention toward the position of its dependency parent (see the sketch below).
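A minimal sketch (not the authors' code) of how parent-scaled self-attention could work, assuming the standard self-attention weights are rescaled by a Gaussian centered on each token's dependency-parent position. The function name, `sigma`, and the toy parent indices below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parent_scaled_self_attention(Q, K, V, parent_pos, sigma=1.0):
    """Q, K, V: (seq_len, d) arrays; parent_pos: (seq_len,) index of each
    token's dependency parent (the root may point to itself)."""
    seq_len, d = Q.shape
    # Standard scaled dot-product attention scores.
    scores = Q @ K.T / np.sqrt(d)                      # (seq_len, seq_len)
    # Gaussian weight: how close key position s is to token t's parent.
    positions = np.arange(seq_len)
    dist = positions[None, :] - parent_pos[:, None]    # (seq_len, seq_len)
    parent_weight = np.exp(-dist ** 2 / (2 * sigma ** 2))
    # Rescale the attention distribution toward each token's dependency
    # parent, concentrating (reducing the dispersion of) the distribution.
    weights = softmax(scores, axis=-1) * parent_weight
    weights = weights / weights.sum(axis=-1, keepdims=True)  # renormalize
    return weights @ V                                 # (seq_len, d)

# Toy usage: 5 tokens, 8-dim representations, parent indices from a parser.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 8))
parents = np.array([1, 1, 1, 4, 1])  # hypothetical dependency parents
out = parent_scaled_self_attention(Q, K, V, parents)
print(out.shape)  # (5, 8)
```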
6. Results
The new method is particularly effective for long sentences and in low-resource scenarios. Evaluation tasks:
WMT English↔German
WMT English→Turkish
WAT English→Japanese
7. Keywords
long sentences