1. Paper title

Evaluating Dialogue Generation Systems via Response Selection

2. Link

https://www.aclweb.org/anthology/2020.acl-main.55.pdf

3. Abstract

Existing automatic evaluation metrics for open-domain dialogue response generation systems correlate poorly with human evaluation. We focus on evaluating response generation systems via response selection. To evaluate systems properly via response selection, we propose a method to construct response selection test sets with well-chosen false candidates. Specifically, we propose to construct test sets filtering out some types of false candidates: (i) those unrelated to the ground-truth response and (ii) those acceptable as appropriate responses. Through experiments, we demonstrate that evaluating systems via response selection with the test set developed by our method correlates more strongly with human evaluation, compared with widely used automatic evaluation metrics such as BLEU.

4. What problem does it address?

Existing automatic evaluation metrics for open-domain dialogue response generation correlate poorly with human evaluation, so a more suitable automatic evaluation metric is needed.

5. Main contributions

The authors propose a method for constructing response selection test sets with well-chosen false candidates. When building the test set, two types of false candidates are filtered out (see the sketch below):
(1) candidates unrelated to the ground-truth response, which systems can reject too easily;
(2) candidates that are themselves acceptable as appropriate responses, which would unfairly penalize a system for selecting them.
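A minimal sketch of this filtering idea, not the authors' actual pipeline: it assumes a toy word-overlap relatedness check and pre-collected human acceptability labels, and all function names and thresholds here are hypothetical.

```python
from typing import List


def related_to_ground_truth(candidate: str, ground_truth: str,
                            threshold: float = 0.2) -> bool:
    """Toy relatedness check: fraction of ground-truth words shared by the
    candidate (an assumption for illustration, not the paper's measure)."""
    cand_words = set(candidate.lower().split())
    gt_words = set(ground_truth.lower().split())
    if not gt_words:
        return False
    return len(cand_words & gt_words) / len(gt_words) >= threshold


def filter_false_candidates(candidates: List[str],
                            ground_truth: str,
                            acceptable: List[bool]) -> List[str]:
    """Keep only false candidates that survive both filters."""
    kept = []
    for cand, is_acceptable in zip(candidates, acceptable):
        if not related_to_ground_truth(cand, ground_truth):
            continue  # type (i): unrelated to the ground truth, too easy to reject
        if is_acceptable:
            continue  # type (ii): actually an appropriate response, unfair as a "false" candidate
        kept.append(cand)
    return kept
```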

6. Results

Evaluating systems via response selection on a test set built with this method correlates more strongly with human evaluation than widely used automatic metrics such as BLEU. The conclusion is drawn from experiments that measure, over a set of response generation systems, how well the scores produced by each automatic metric correlate with human evaluation scores.
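A minimal sketch of how such a system-level correlation could be computed; the system names and scores below are made up, and the paper's actual experimental setup may differ.

```python
from scipy.stats import spearmanr

# Hypothetical per-system scores (illustrative values only).
human_scores = {"system_a": 3.8, "system_b": 2.9, "system_c": 3.1}           # human ratings
selection_accuracy = {"system_a": 0.72, "system_b": 0.55, "system_c": 0.60}  # response selection accuracy
bleu_scores = {"system_a": 0.021, "system_b": 0.034, "system_c": 0.019}      # corpus BLEU

systems = sorted(human_scores)
human = [human_scores[s] for s in systems]

# Correlate each automatic metric with human evaluation across systems.
for name, metric in [("response selection", selection_accuracy), ("BLEU", bleu_scores)]:
    rho, _ = spearmanr(human, [metric[s] for s in systems])
    print(f"Spearman correlation of {name} with human evaluation: {rho:.2f}")
```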

7. Keywords

Evaluation, open-domain
