1. Paper title
Character-Level Translation with Self-attention
2. Link
https://www.aclweb.org/anthology/2020.acl-main.145.pdf
3. Abstract
We explore the suitability of self-attention models for character-level neural machine translation. We test the standard transformer model, as well as a novel variant in which the encoder block combines information from nearby characters using convolutions. We perform extensive experiments on WMT and UN datasets, testing both bilingual and multilingual translation to English using up to three input languages (French, Spanish, and Chinese). Our transformer variant consistently outperforms the standard transformer at the character-level and converges faster while learning more robust character-level alignments.
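4. Problem addressed
Most NMT systems operate at the word level.
Advantages of character-level NMT:
- The representation is more compact.
- It can handle out-of-vocabulary (OOV) words.
- Languages that share the same character set can share a single model.
This paper applies self-attention models to character-level NMT.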
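The OOV and shared-vocabulary points can be illustrated with a toy sketch. The corpus, the helper names (`build_char_vocab`, `build_word_vocab`, `encode_chars`), and the test word are all hypothetical and are not taken from the paper; the sketch only shows that a word missing from a word-level vocabulary can still be encoded character by character, and that one character vocabulary can cover several languages written in the same script.

```python
def build_char_vocab(sentences):
    """Map every distinct character in the corpus to an integer id."""
    chars = sorted({ch for s in sentences for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def build_word_vocab(sentences):
    """Map every distinct whitespace-separated word to an integer id."""
    words = sorted({w for s in sentences for w in s.split()})
    return {w: i for i, w in enumerate(words)}


def encode_chars(text, vocab):
    """Encode text as a list of character ids (raises KeyError on an unseen character)."""
    return [vocab[ch] for ch in text]


# Hypothetical toy corpus mixing English, Spanish, and French (all Latin script).
corpus = ["the cat sat", "la casa es grande", "le chat est grand"]
char_vocab = build_char_vocab(corpus)  # one vocabulary shared by all three languages
word_vocab = build_word_vocab(corpus)

unseen_word = "grandest"                           # never appears as a word in the corpus
word_is_oov = unseen_word not in word_vocab        # word-level lookup fails
char_ids = encode_chars(unseen_word, char_vocab)   # character-level encoding still works
```

On a corpus this small the character vocabulary is not smaller than the word vocabulary; the size advantage only appears at realistic scale, where the word vocabulary keeps growing with the corpus while the character set stays bounded by the script.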
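5. Main contributions
The paper evaluates two models:
- Standard transformer
- Convolution + transformer
The paper runs the following multilingual-to-English experiments:
- FR, ES -> EN
- FR, ZH -> EN
In the convolutional variant, the encoder block incorporates features that a CNN extracts from nearby characters.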
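The idea of mixing in features from nearby characters can be sketched as a 1D convolution over the sequence of character embeddings. This is a minimal illustration under stated assumptions, not the paper's architecture: these notes do not record the kernel sizes or layer layout, so the single width-3 kernel, the per-position scalar weights (a real layer would learn weight matrices), the residual addition, and the names `conv1d_same` and `conv_block` are all hypothetical.

```python
def conv1d_same(seq, kernel):
    """Slide an odd-width kernel over a sequence of vectors with zero padding.

    seq:    list of T vectors, each a list of D floats (character embeddings)
    kernel: list of K scalar weights, K odd (assumed fixed here, learned in practice)
    Returns T vectors; each is a weighted sum of the K neighbouring embeddings.
    """
    k = len(kernel)
    assert k % 2 == 1, "kernel width must be odd for symmetric context"
    half = k // 2
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for j, w in enumerate(kernel):
            pos = t + j - half
            if 0 <= pos < len(seq):  # positions outside the sequence act as zeros
                for d in range(dim):
                    acc[d] += w * seq[pos][d]
        out.append(acc)
    return out


def conv_block(seq, kernel):
    """Add local context to the original embeddings (residual sum, an assumption)."""
    context = conv1d_same(seq, kernel)
    return [[x + c for x, c in zip(xs, cs)] for xs, cs in zip(seq, context)]


# Four hypothetical 2-dimensional character embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [0.25, 0.5, 0.25]  # width-3 context window: left, centre, right
mixed = conv_block(embeddings, kernel)
```

Each output position now carries information from its immediate neighbours before any self-attention is applied, which is the intuition behind letting the encoder see short character n-gram patterns directly.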
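6. Results
- Attention + character-level + convolution model vs. word-level model:
 Same performance; the former has fewer parameters.
- Convolution + transformer vs. standard transformer:
 The former performs better, converges faster, and learns more robust character-level alignments.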