Topics
Reinforcement learning
Leverage (statistics)
Computer science
Embedding
Markov decision process
Artificial intelligence
Task (project management)
Machine learning
Markov process
Mathematics
Engineering
Statistics
Systems engineering
Authors
Honguk Woo, Gwangpyo Yoo, Minjong Yoo
Source
Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
[Association for the Advancement of Artificial Intelligence (AAAI)]
Date: 2022-06-28
Volume/Issue: 36 (8): 8657-8665
Cited by: 1
Identifier
DOI: 10.1609/aaai.v36i8.20844
Abstract
Reinforcement learning (RL) agents empowered by deep neural networks have been considered a feasible solution to automate control functions in a cyber-physical system. In this work, we consider an RL-based agent and address the issue of learning via continual interaction with a time-varying dynamic system modeled as a non-stationary Markov decision process (MDP). We view such a non-stationary MDP as a time series of conventional MDPs that can be parameterized by hidden variables. To infer the hidden parameters, we present a task decomposition method that exploits CycleGAN-based structure learning. This method separates time-variant tasks from a non-stationary MDP, establishing a task decomposition embedding specific to time-varying information. To mitigate the adverse effect of the inherent noise in task embedding, we also leverage continual learning on sequential tasks by adapting the orthogonal gradient descent scheme with a sliding window. Through various experiments, we demonstrate that our approach renders the RL agent adaptable to time-varying dynamic environment conditions, outperforming other methods including state-of-the-art non-stationary MDP algorithms.
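The abstract views a non-stationary MDP as a time series of conventional MDPs parameterized by hidden variables. A minimal formalization of that sentence, with notation assumed here rather than taken from the paper, is

$$\mathcal{M}_t = (\mathcal{S}, \mathcal{A}, P_{z_t}, R_{z_t}, \gamma), \qquad z_t \sim p(z_t \mid z_{t-1}),$$

where the hidden variable $z_t$ parameterizes the transition kernel $P_{z_t}$ and the reward function $R_{z_t}$ in effect during segment $t$. Inferring $z_t$, the role the abstract assigns to the CycleGAN-based task decomposition, reduces each segment to a conventional MDP that a standard RL agent can handle.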
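The abstract also mentions adapting the orthogonal gradient descent (OGD) scheme with a sliding window for continual learning on the sequential tasks. The sketch below illustrates the general OGD idea (Farajtabar et al., 2020) under the assumption that the sliding window retains gradient directions for only the most recent tasks; the class name, window size, and update loop are illustrative and not taken from the paper.

# Hypothetical sketch of OGD with a sliding window. Assumption: gradients
# for the current task are projected onto the orthogonal complement of
# gradient directions stored for the last `window_size` tasks, so updates
# do not interfere with recently learned tasks.

from collections import deque
import numpy as np

class SlidingWindowOGD:
    def __init__(self, window_size):
        # One list of orthonormal gradient directions per recent task;
        # deque(maxlen=...) drops the oldest task as the window slides.
        self.bases = deque(maxlen=window_size)

    def project(self, grad):
        # Remove the components of `grad` along all stored directions.
        g = grad.copy()
        for task_basis in self.bases:
            for v in task_basis:
                g -= np.dot(g, v) * v
        return g

    def store_task_gradients(self, grads):
        # Orthonormalize (Gram-Schmidt) the finished task's gradients
        # against everything already stored, then keep them.
        basis = []
        for g in grads:
            v = self.project(g)
            for u in basis:
                v -= np.dot(v, u) * u
            norm = np.linalg.norm(v)
            if norm > 1e-8:
                basis.append(v / norm)
        self.bases.append(basis)

# Illustrative usage: project each gradient step during a task, then
# archive that task's gradients before moving to the next one.
ogd = SlidingWindowOGD(window_size=3)
theta = np.zeros(10)
for task in range(5):
    rng = np.random.default_rng(task)
    task_grads = [rng.normal(size=10) for _ in range(4)]
    for g in task_grads:
        theta -= 0.1 * ogd.project(g)
    ogd.store_task_gradients(task_grads)

The deque with maxlen implements the sliding window: once more than window_size tasks have been stored, the oldest task's directions are discarded, so the projection only protects recently seen tasks. This is a natural fit for the non-stationary setting the abstract describes, where older dynamics may never recur.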