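Decomposition
Diversity (politics)
Reinforcement learning
Reinforcement
Bilevel optimization
Political science
Psychology
Computer science
Artificial intelligence
Mathematical optimization
Social psychology
Mathematics
Biology
Ecology
Law
Optimization problem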
Authors
Tianyu Ren,Hui Wang,Karen Rafferty
Source
Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
[Association for the Advancement of Artificial Intelligence (AAAI)]
Date: 2025-04-11
Volume (Issue): 39 (23): 25083-25091
Identifier
DOI:10.1609/aaai.v39i23.34693
Abstract
Recent advancements in question generation (QG) have been significantly propelled by reinforcement learning (RL). Although extensive reward models have been designed to capture the attributes of ideal questions, their associated learning challenges, particularly in sample efficiency and diversity, remain underexplored. This paper introduces a bilevel policy decomposition (BPD) framework and a diversity-seeking RL (DSRL) objective to address these issues. The BPD framework utilizes two cascading policies to divide QG into two more manageable sub-tasks: answer-centric summary generation and summary-augmented QG, facilitating exploration and accelerating policy learning. Concurrently, the DSRL objective preserves the inherent diversity of QG by ensuring the bilevel policies align probabilistically with their reward models rather than merely maximizing returns. Our integrated approach, named BPD-DSRL, demonstrates superior performance over existing baselines on multiple question quality and diversity metrics across various QG benchmarks.
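The two ideas in the abstract, cascading policies and probabilistic alignment with rewards instead of pure return maximization, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the reward tables, the function names, and the choice of a softmax over rewards as the "aligned" distribution are hypothetical and are not taken from the paper, which may define its objective differently.

```python
import math
import random

def softmax_over_rewards(rewards, temperature=1.0):
    """Turn a reward table into a distribution with p(x) proportional to exp(r(x) / T).

    Assumption: this is one common way to make a policy match its rewards
    probabilistically; it is not claimed to be the paper's DSRL objective.
    """
    weights = {item: math.exp(r / temperature) for item, r in rewards.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

def sample_from(distribution, rng):
    """Draw one item according to the given probabilities."""
    items = list(distribution)
    return rng.choices(items, weights=[distribution[i] for i in items], k=1)[0]

# Hypothetical rewards for the upper-level sub-task (answer-centric summaries).
SUMMARY_REWARDS = {"summary A": 1.0, "summary B": 0.8}

# Hypothetical rewards for the lower-level sub-task (questions given a summary).
QUESTION_REWARDS = {
    "summary A": {"question A1": 1.0, "question A2": 0.9},
    "summary B": {"question B1": 0.7, "question B2": 0.6},
}

def generate_question(rng):
    """Cascade: sample a summary first, then a question conditioned on it."""
    summary = sample_from(softmax_over_rewards(SUMMARY_REWARDS), rng)
    question = sample_from(softmax_over_rewards(QUESTION_REWARDS[summary]), rng)
    return summary, question

def greedy_question():
    """Pure reward maximization: always returns the same single output."""
    summary = max(SUMMARY_REWARDS, key=SUMMARY_REWARDS.get)
    rewards = QUESTION_REWARDS[summary]
    return summary, max(rewards, key=rewards.get)

if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy: ", greedy_question())
    print("sampled:", [generate_question(rng)[1] for _ in range(5)])
```

In this toy setting the greedy version always yields the same question, while the sampled version spreads its outputs across candidates roughly in line with their rewards, which is the kind of diversity the abstract contrasts with merely maximizing returns.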