Recommender systems
Computer science
E-commerce
Dialogue systems
Search engines
World Wide Web
Information retrieval
Human–computer interaction
Dialog box
Authors
Guangtao Nie, Rong Zhi, Xiaofan Yan, Yufan Du, Xiangyang Zhang, Jianwei Chen, Mi Zhou, Hongshen Chen, Tianhao Li, Ziguang Cheng, Sulong Xu, Jinghe Hu
Identifier
DOI:10.1145/3640457.3688061
Abstract
Multi-agent collaboration has recently become a trending method for building conversational recommender systems (CRS), especially with the widespread use of Large Language Models (LLMs). Typically, these systems employ several LLM agents, each serving a distinct role to meet user needs. In an industrial setting, it is essential for a CRS to exhibit low first token latency (i.e., the time from a user's input until the system outputs its first response token) and high scalability (for instance, by minimizing the number of LLM inferences per user request) to enhance user experience and boost platform profit. For example, JD.com's baseline CRS features two LLM agents and a search API but suffers from high first token latency and requires two LLM inferences per request (LIPR), hindering its performance. To address these issues, we introduce a Hybrid Multi-Agent Collaborative Recommender System (Hybrid-MACRS). It includes a central agent powered by a fine-tuned proprietary LLM and a search agent that combines a related-search module with a search engine. This hybrid system reduces first token latency by about 70% and cuts the LIPR from 2 to 1. We conducted thorough online A/B testing to confirm the approach's efficiency.
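The abstract's key metric, first token latency, can be measured by timing the gap between request submission and the arrival of the first streamed token. The sketch below is a minimal illustration, not the paper's implementation; `fake_stream` is a hypothetical stand-in for an actual streaming LLM response.

```python
import time

def first_token_latency(stream):
    """Consume a token stream; return (latency_seconds, tokens).

    First token latency is the elapsed time from request submission
    (when iteration starts) until the first response token arrives.
    """
    start = time.monotonic()
    latency = None
    tokens = []
    for tok in stream:
        if latency is None:
            latency = time.monotonic() - start  # time to first token
        tokens.append(tok)
    return latency, tokens

def fake_stream(first_token_delay=0.05, n_tokens=5):
    """Hypothetical streaming response: delay, then yield tokens."""
    time.sleep(first_token_delay)  # model "thinking" before first token
    for i in range(n_tokens):
        yield f"tok{i}"

latency, toks = first_token_latency(fake_stream())
```

With this framing, Hybrid-MACRS's reported ~70% latency reduction corresponds to shrinking that initial gap, largely by needing only one LLM inference per request instead of two.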