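Computer science
Inference
Software deployment
Edge device
Mobile edge computing
Energy consumption
Mobile device
Latency (audio)
Edge computing
Optimization problem
Enhanced Data Rates for GSM Evolution
Artificial intelligence
Distributed computing
Efficient energy use
Generalization
Construct (Python library)
Base station
Convergence (economics)
Energy (signal processing)
Mobile computing
Energy minimization
Mathematical optimization
Quality of service
Cellular network
Real-time computing
Machine learning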
Authors
Ruichen Zhang, Xiaofeng Luo, Jiayi He, Dusit Niyato, Jiawen Kang, Zehui Xiong, Yonghui Li
Abstract
This paper investigates compact large language model (LLM) deployment and world-model-assisted inference offloading in mobile edge computing (MEC) networks. We first propose an edge compact LLM deployment (ECLD) framework that jointly applies structured pruning, low-bit quantization, and knowledge distillation to construct edge-deployable LLM variants, and we evaluate these models using four complementary metrics: accessibility, energy consumption, hallucination rate, and generalization accuracy. Building on the resulting compact models, we formulate an MEC offloading optimization problem that minimizes the long-term average inference latency subject to per-device energy budgets and LLM-specific quality-of-service constraints on effective accuracy and hallucination. To solve this problem under unknown and time-varying network dynamics, we develop a world model-proximal policy optimization (world model-PPO) algorithm, which augments on-policy PPO with a learned recurrent world model that provides improved value targets and short imagination rollouts. Extensive experiments on Llama-3.1-8B, Qwen3-8B, and Mistral-12B show that ECLD compresses base models by about 70-80% in storage (e.g., from 15.3 GB to 3.3 GB for Llama-3.1-8B) and reduces per-query energy consumption by up to 50%, while largely preserving accuracy and often lowering hallucination compared with quantization-only or pruning-only baselines. The experiments also show that world model-PPO speeds up convergence by about 50%, improves the final reward by 15.8% over vanilla PPO, and reduces average inference latency by 12-30% across different user populations, while satisfying the accuracy and hallucination constraints and approaching the generation quality of always-offloading with much of the efficiency of local execution.
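As a quick check on the storage figure quoted for Llama-3.1-8B, the sketch below computes the fractional reduction from the two sizes given in the abstract. The helper name `storage_reduction` is hypothetical and introduced only for illustration; it is not part of the paper or of any library.

```python
def storage_reduction(original_gb: float, compressed_gb: float) -> float:
    """Return the fractional storage reduction (0.5 means a 50% cut)."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return 1.0 - compressed_gb / original_gb


# Sizes quoted in the abstract for Llama-3.1-8B (GB before and after ECLD).
reduction = storage_reduction(15.3, 3.3)
print(f"{reduction:.1%}")  # prints 78.4%
```

The result falls inside the stated range of about 70-80%, so the Llama-3.1-8B numbers are one instance of that range rather than a definition of it.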