1. Paper title

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

2. link

https://www.aclweb.org/anthology/2020.acl-main.64.pdf

3. Abstract

The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.

4. What problem does it solve?

Open-domain dialog lacks effective automatic evaluation metrics.

5. Main contributions

USR: an unsupervised, reference-free evaluation metric for dialog.
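At a high level, USR scores a response along several quality dimensions with unsupervised sub-models and then combines the sub-metric scores into a single overall score via a regression fit to human quality judgments. A minimal sketch of that combination step, using least squares on toy data (the sub-scores, dimension names, and ratings here are all hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical sub-metric scores for 5 responses; columns stand in for
# quality dimensions (e.g. understandability, naturalness,
# maintains-context, interestingness, uses-knowledge).
sub_scores = np.array([
    [0.9, 0.8, 0.7, 0.6, 0.5],
    [0.2, 0.3, 0.1, 0.4, 0.2],
    [0.8, 0.9, 0.6, 0.7, 0.6],
    [0.5, 0.4, 0.5, 0.5, 0.4],
    [0.7, 0.6, 0.8, 0.6, 0.7],
])
# Hypothetical human overall-quality ratings for the same 5 responses.
human_overall = np.array([4.5, 1.5, 4.2, 3.0, 3.8])

# Fit regression weights (least squares, with a bias column appended).
X = np.hstack([sub_scores, np.ones((len(sub_scores), 1))])
weights, *_ = np.linalg.lstsq(X, human_overall, rcond=None)

def usr_style_score(scores):
    """Combine per-dimension sub-metric scores into one overall score."""
    return float(np.hstack([scores, 1.0]) @ weights)

print(usr_style_score(np.array([0.9, 0.8, 0.7, 0.6, 0.5])))
```

Because the combination is a weighted sum of named sub-metrics, the per-dimension scores remain available, which is what makes the overall metric interpretable.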

6. Results

USR's scores correlate strongly with human judgments:
Topical-Chat: turn-level 0.42, system-level 1.0
PersonaChat: turn-level 0.48, system-level 1.0
USR also provides interpretable per-quality measures.
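The numbers above are correlations between metric scores and human ratings, computed at two granularities: turn-level correlates scores across individual responses, while system-level first averages per dialog system and then correlates the system means. A minimal sketch with toy data (all scores and ratings hypothetical), using Pearson correlation via `numpy.corrcoef`:

```python
import numpy as np

# Hypothetical metric scores and human ratings for 6 responses,
# produced by 3 systems (2 responses each).
metric_scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90])
human_ratings = np.array([1.0, 2.0, 2.5, 4.0, 3.5, 5.0])
system_ids = np.array([0, 0, 1, 1, 2, 2])

# Turn-level: correlate across individual responses.
turn_level_r = np.corrcoef(metric_scores, human_ratings)[0, 1]

# System-level: average per system, then correlate the means.
sys_metric = np.array([metric_scores[system_ids == s].mean() for s in range(3)])
sys_human = np.array([human_ratings[system_ids == s].mean() for s in range(3)])
system_level_r = np.corrcoef(sys_metric, sys_human)[0, 1]

print(f"turn-level r = {turn_level_r:.2f}")
print(f"system-level r = {system_level_r:.2f}")
```

With only a handful of systems, system-level correlation can easily reach 1.0 whenever the system ranking matches, which is why the paper's system-level numbers are perfect while turn-level correlations are more modest.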

7. Keywords

Evaluation, open-domain
