Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems

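Computer science, Consistency (knowledge base), Task (project management), Set (abstract data type), Human-computer interaction, Quality (philosophy), User modeling, Simulation, User interface, Artificial intelligence, Programming language, Systems engineering, Epistemology, Engineering, Philosophy
Authors
Weiwei Sun, Shuyu Guo, Shuo Zhang, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Zhaochun Ren
Identifier
DOI:10.1145/3596510
Abstract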

Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation. The evaluation is often limited to single-turn interactions or is very time-intensive. As an alternative, user simulators that mimic user behavior allow us to consider a broad set of user goals to generate human-like conversations for simulated evaluation. Employing existing user simulators to evaluate TDSs is challenging as user simulators are primarily designed to optimize dialogue policies for TDSs and have limited evaluation capabilities. Moreover, the evaluation of user simulators is an open challenge. In this work, we propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates a user’s analogical thinking in interactions with systems. We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities. Our user simulator constructs a metaphorical user model that assists the simulator in reasoning by referring to prior knowledge when encountering new items. We estimate the quality of simulators by checking the simulated interactions between simulators and variants. Our experiments are conducted using three TDS datasets. The proposed user simulator demonstrates better consistency with manual evaluation than an agenda-based simulator and a seq2seq model on three datasets; our tester framework demonstrates efficiency and has been tested on multiple tasks, such as conversational recommendation and e-commerce dialogues.
