1. Paper title

Norm-Based Curriculum Learning for Neural Machine Translation

2. link

https://www.aclweb.org/anthology/2020.acl-main.41.pdf

3. 摘要

A neural machine translation (NMT) system is expensive to train, especially with highresource settings. As the NMT architectures become deeper and wider, this issue gets worse and worse. In this paper, we aim to improve the efficiency of training an NMT by introducing a novel norm-based curriculum learning method. We use the norm (aka length or module) of a word embedding as a measure of 1) the difficulty of the sentence, 2) the competence of the model, and 3) the weight of the sentence. The normbased sentence difficulty takes the advantages of both linguistically motivated and modelbased sentence difficulties. It is easy to determine and contains learning-dependent features. The norm-based model competence makes NMT learn the curriculum in a fully automated way, while the norm-based sentence weight further enhances the learning of the vector representation of the NMT. Experimental results for the WMT’14 English– German and WMT’17 Chinese–English translation tasks demonstrate that the proposed method outperforms strong baselines in terms of BLEU score (+1.17/+1.56) and training speedup (2.22x/3.33x).

4. 要解决什么问题

NMT的特点是:1. 需要大量的训练数据 2. 跨语言的工作设定,因此需要更多的训练时间。
当网络变深变宽时,训练NMT非常expensive。
Transformer是最常用的NMT架构,改进的transformer有:

  1. 30层以上的深度模型
  2. 巨大batch size的scaling NMT
  3. Curriculum Learning让NMT训练得更快。

Curriculum Learning是指训练时让样本从简单到难,因此对样本难度的评价很关键。已有评价方法:

  1. 语言驱动,容易计算
  2. 基于模型,有用但难计算

5. 作者的主要贡献

引入norm-based curriculum learning method来提升训练的效果。

  1. 提供难度评价算法:norm-based
    原理:高频词和内容不敏感的罕见词的范数值较小。因此norm-based能同时表达语言驱动和基于模型这两种方法的特征。

  2. norm计算出来之后用于衡量以下内容:
    (1) 句子的难度 (2)模型的能力 (3)句子的权重
    1/2/3是基于word embedding计算的,3又反过来影响embedding

本文方法的特点:

  1. 自动安排课程, practical NMT system
  2. 把句子的难度结合到课程安排中

6. 得到了什么结果

  1. 性能更好
    WMT 14 English– German BLEU +1.17 speedup 2.22x
    WMT17 Chinese–English BLEU + 1.56 speedup 3.33x
  2. 训练更快
  3. 不需要引入额外的参数

7. 关键字

Encoder,curriculum , efficiency, deep

results matching ""

    No results matching ""