1. Paper title
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
2. Link
https://www.aclweb.org/anthology/2020.acl-main.64.pdf
3. Abstract
The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48, system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.
4. Problem addressed
Open-domain dialog research lacks effective automatic evaluation metrics.
5. Main contributions
USR: an unsupervised, reference-free evaluation metric for dialog.
6. Results
USR's scores correlate strongly with human judgments:
Topical-Chat: turn-level 0.42, system-level 1.0
PersonaChat: turn-level 0.48, system-level 1.0
USR also provides interpretable measures for several desirable dialog properties.
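The turn-level vs. system-level distinction in the results above can be illustrated with a minimal sketch. The scores below are hypothetical (not from the paper), and plain Pearson correlation is used for simplicity; the paper reports both Spearman and Pearson correlations against human ratings.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: 3 dialog systems, 4 evaluated turns each.
# human: 1-5 quality ratings; metric: automatic scores in [0, 1].
human  = {"sysA": [4.0, 3.5, 4.5, 4.0],
          "sysB": [2.0, 2.5, 3.0, 2.5],
          "sysC": [3.0, 3.5, 3.0, 3.5]}
metric = {"sysA": [0.8, 0.7, 0.9, 0.8],
          "sysB": [0.3, 0.4, 0.5, 0.4],
          "sysC": [0.6, 0.6, 0.5, 0.7]}

# Turn-level: correlate per-turn metric scores with per-turn human ratings.
turn_h = [s for v in human.values() for s in v]
turn_m = [s for v in metric.values() for s in v]
turn_corr = pearson(turn_m, turn_h)

# System-level: average each system's scores first, then correlate the
# per-system averages (only 3 points here, so it is easy to reach 1.0,
# which is why system-level correlations tend to be much higher).
sys_h = [sum(v) / len(v) for v in human.values()]
sys_m = [sum(v) / len(v) for v in metric.values()]
sys_corr = pearson(sys_m, sys_h)
```

With this toy data the turn-level correlation is high but imperfect, while the system-level correlation is 1.0, mirroring the pattern in the reported results.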
7. Keywords
Evaluation, open-domain