1. Paper title

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

2. Link

https://www.aclweb.org/anthology/2020.acl-main.165.pdf

3. Abstract

Many multi-domain neural machine translation (NMT) models achieve knowledge transfer by enforcing one encoder to learn shared embedding across domains. However, this design lacks adaptation to individual domains. To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing. We first observe that words in a sentence are often related to multiple domains. Hence, we assume each word has a domain proportion, which indicates its domain preference. Then word representations are obtained by mixing their embedding in individual domains based on their domain proportions. We show this can be achieved by carefully designing multi-head dot-product attention modules for different domains, and eventually taking weighted averages of their parameters by word-level layer-wise domain proportions. Through this, we can achieve effective domain knowledge sharing, and capture fine-grained domain-specific knowledge as well. Our experiments show that our proposed model outperforms existing ones in several NMT tasks.

4. Problem addressed

NMT requires large amounts of labeled parallel sentences, but some specific domains do not have enough parallel data. Common workarounds:

  1. Use non-parallel datasets.
  2. Train a multi-domain NMT model, i.e., pool data from multiple domains and train a single unified model.

Problems with multi-domain NMT:

  1. When each domain has sufficient data, a multi-domain model performs worse than individual single-domain models.
  2. After fine-tuning the multi-domain model on one domain, performance on the other domains degrades.

Existing improvements to multi-domain NMT:

  1. Instance weighting, e.g.:
    (1) weighting sentences and domains separately
    (2) cost weighting
    (3) dynamically selecting sentences and adjusting their weights during training.
  2. Specialized encoder-decoder architectures, e.g.:
    (1) domain-aware embedding + a domain classifier
    (2) partitioning the embedding into domain-shared and domain-specific knowledge
    (3) using a scoring mechanism to make the model focus more on domain-specific words.

Shared limitation of these approaches:
a single encoder is forced to learn one embedding shared across all domains, so the model lacks adaptation to individual domains.

5. Main contributions

Key features of the proposed model:

  1. A multi-domain NMT model with a dedicated module for each domain.
  2. Each word's domain preference is analyzed and incorporated into its embedding.
  3. Word-level, adaptive, layer-wise domain mixing.

Principle:
The domain label of a sentence is not necessarily the domain of every word in it; words are often related to multiple domains. Hence each word is assigned a domain proportion indicating its domain preference, and its representation is obtained by mixing its embeddings from the individual domain modules according to that proportion (a minimal sketch follows below).
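A minimal PyTorch sketch of the per-word domain proportion: each word representation is projected to a softmax distribution over domains. The class name `WordDomainProportion` and the single-linear-layer parameterization are illustrative assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordDomainProportion(nn.Module):
    """Maps each word representation to a softmax distribution over
    domains, i.e. the word's domain proportion (rows sum to 1)."""

    def __init__(self, d_model: int, num_domains: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_domains)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) word representations at some layer
        # -> (batch, seq_len, num_domains)
        return F.softmax(self.proj(x), dim=-1)
```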

Steps:

  1. Assign each word a domain proportion based on its content.
  2. Build multi-head dot-product attention modules for the individual domains; the domain proportions let them capture both domain-shared and domain-specific knowledge.
  3. Mix the different domain modules by taking word-level, layer-wise weighted averages of their parameters (see the sketch after this list).
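Because the mixed layers are linear, averaging the per-domain parameters by a word's domain proportion gives the same result as averaging the per-domain outputs, which is cheaper to compute. A minimal PyTorch sketch under that assumption; the class name `DomainMixedLinear` is hypothetical, and in the paper this idea is applied to the projections inside each multi-head attention layer:

```python
import torch
import torch.nn as nn

class DomainMixedLinear(nn.Module):
    """Linear layer with one weight matrix per domain. For each word, the
    effective weight is the average of the per-domain weights, weighted by
    that word's domain proportion. Mixing parameters of a linear layer is
    equivalent to mixing its per-domain outputs, so we compute the latter."""

    def __init__(self, d_in: int, d_out: int, num_domains: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_domains, d_in, d_out) * d_in ** -0.5)
        self.bias = nn.Parameter(torch.zeros(num_domains, d_out))

    def forward(self, x: torch.Tensor, domain_prop: torch.Tensor) -> torch.Tensor:
        # x:           (batch, seq, d_in)          word representations
        # domain_prop: (batch, seq, num_domains)   rows sum to 1
        # per-domain outputs: (batch, seq, num_domains, d_out)
        per_domain = torch.einsum('bsi,kio->bsko', x, self.weight) + self.bias
        # weighted average over domains -> (batch, seq, d_out)
        return torch.einsum('bsk,bsko->bso', domain_prop, per_domain)
```

A fresh domain proportion is predicted at each layer (hence "layer-wise"), so the mixing weights for the same word can differ from layer to layer.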

6. Results

  1. Better at capturing domain-specific knowledge.
  2. Better at sharing domain knowledge across domains.
  3. Improves performance on all domains simultaneously.

7. Keywords

Multi-domain
