Generalization
Reinforcement learning
Computer science
Artificial intelligence
Generalization error
Reinforcement
Intrinsic motivation
Machine learning
Artificial neural network
Mathematics
Psychology
Social psychology
Mathematical analysis
Abstract
In most real-world open environments, deploying reinforcement learning algorithms poses significant challenges. Because models trained through reinforcement learning are sensitive to environmental parameters, strategies trained in one environment may not perform well in unseen environments. Existing methods primarily focus on increasing the similarity between training and testing data or on directly addressing feature differences between training and testing; however, these methods suffer from issues such as a lack of diversity and a tendency to overfit. In this work, we propose a novel method called Bidirectional Prediction with Randomization (BPR). Building on randomization of the network's lower-level inputs, BPR uses a discriminator to predict the environment's forward and inverse dynamics, aiming to improve the generalization performance of deep reinforcement learning models by learning the intrinsic characteristics of the environment. We conduct experiments on all environments of the OpenAI Procgen benchmark, and the results demonstrate that our method performs well in game scenarios.
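The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of pairing input randomization with forward and inverse dynamics prediction. It is not the authors' BPR implementation and it omits the discriminator entirely. All names (`randomize_observation`, `forward_loss`, `inverse_loss`, `bidirectional_loss`), the linear encoder, the gain-and-bias perturbation, and the dimensions are assumptions made for illustration.

```python
import math
import random


def randomize_observation(obs, rng, scale=0.1):
    """Perturb raw inputs with a random gain and bias.

    Hypothetical stand-in for randomizing the network's lower-level inputs.
    """
    gain = 1.0 + rng.uniform(-scale, scale)
    bias = rng.uniform(-scale, scale)
    return [gain * x + bias for x in obs]


def linear(weights, vec):
    """Matrix-vector product for a weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def one_hot(index, size):
    """Encode a discrete action as a one-hot list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def forward_loss(w_fwd, z, action, z_next, n_actions):
    """Mean squared error of predicting the next latent from (latent, action)."""
    pred = linear(w_fwd, z + one_hot(action, n_actions))
    return sum((p - t) ** 2 for p, t in zip(pred, z_next)) / len(z_next)


def inverse_loss(w_inv, z, z_next, action):
    """Cross-entropy of predicting the action from (latent, next latent)."""
    logits = linear(w_inv, z + z_next)
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[action]


def init_matrix(rows, cols, rng):
    """Small random weight matrix (assumed initialization)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def bidirectional_loss(obs, action, next_obs, params, rng, n_actions):
    """Encode randomized observations, then score both prediction directions."""
    w_enc, w_fwd, w_inv = params
    z = linear(w_enc, randomize_observation(obs, rng))
    z_next = linear(w_enc, randomize_observation(next_obs, rng))
    return (
        forward_loss(w_fwd, z, action, z_next, n_actions),
        inverse_loss(w_inv, z, z_next, action),
    )


rng = random.Random(0)
obs_dim, latent_dim, n_actions = 8, 4, 3  # assumed toy dimensions
params = (
    init_matrix(latent_dim, obs_dim, rng),                 # encoder
    init_matrix(latent_dim, latent_dim + n_actions, rng),  # forward model
    init_matrix(n_actions, 2 * latent_dim, rng),           # inverse model
)
obs = [rng.random() for _ in range(obs_dim)]
next_obs = [rng.random() for _ in range(obs_dim)]
fwd, inv = bidirectional_loss(obs, 1, next_obs, params, rng, n_actions)
print(f"forward loss: {fwd:.4f}, inverse loss: {inv:.4f}")
```

In a full agent, losses of this kind would typically be added as auxiliary terms to the policy objective and minimized by gradient descent; the sketch only evaluates them on a single synthetic transition.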