Reward maximization is a fundamental principle underlying both the survival and evolution of biological organisms. In cognitive science and in the study of embodied agents in particular, reward-driven behavior is widely regarded as central to complex cognitive abilities such as perception, imitation, and learning. Among the frameworks aimed at realizing such abilities, reinforcement learning (RL), which leverages reward maximization to drive intelligent decision-making in embodied agents, has proven particularly promising. However, the inherent complexity and uncertainty of real-world tasks make it difficult to design effective reward functions for embodied RL agents. Conventional methods typically rely on manually engineered or externally tuned reward signals and therefore demand substantial domain expertise, considerable human effort, and long convergence times; these shortcomings can even lead to mission failure. This work introduces a bilevel optimization framework that discovers optimal reward functions for embodied reinforcement learning agents through regret minimization. The approach accelerates policy optimization and enhances adaptability across diverse tasks. These findings can support the broader adoption of embodied RL agents in the behavioral and computational sciences and the neurosciences, paving the way toward artificial general intelligence.
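To make the bilevel structure concrete, the sketch below illustrates the general idea (not the specific method of this work) on a toy problem: an inner loop trains a policy against a parameterized surrogate reward, and an outer loop searches over those reward parameters to minimize regret, i.e., the gap between the optimal and the achieved return on the true task. The bandit environment, the function names, and the use of simple random search in the outer loop are all assumptions made purely for illustration.

```python
# Illustrative sketch (hypothetical, not this work's implementation): bilevel
# reward search on a toy 5-armed bandit. The true task reward is sparse (only
# arm 3 pays off), so the outer loop searches over dense surrogate reward
# parameters `theta` and keeps whichever reward yields the lowest regret on
# the true task after inner-loop policy training.
import numpy as np

rng = np.random.default_rng(0)

N_ARMS = 5
TRUE_REWARD = np.zeros(N_ARMS)
TRUE_REWARD[3] = 1.0                      # sparse ground-truth task reward
OPTIMAL_RETURN = TRUE_REWARD.max()        # best achievable expected true return


def train_policy(theta, steps=300, lr=0.5):
    """Inner loop: REINFORCE-style policy gradient on the surrogate reward."""
    logits = np.zeros(N_ARMS)
    for _ in range(steps):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        arm = rng.choice(N_ARMS, p=probs)
        r = theta[arm]                    # agent only observes the surrogate reward
        grad = -probs
        grad[arm] += 1.0                  # d log pi(arm) / d logits
        logits += lr * r * grad
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def regret(theta):
    """Outer objective: gap between optimal and achieved *true* return."""
    policy = train_policy(theta)
    return OPTIMAL_RETURN - float(policy @ TRUE_REWARD)


# Outer loop: simple hill climbing over reward parameters, keeping the
# candidate reward function with the smallest regret on the true task.
best_theta = rng.normal(size=N_ARMS)
best_regret = regret(best_theta)
for it in range(50):
    candidate = best_theta + 0.3 * rng.normal(size=N_ARMS)
    cand_regret = regret(candidate)
    if cand_regret < best_regret:
        best_theta, best_regret = candidate, cand_regret
    if it % 10 == 0:
        print(f"iter {it:3d}  regret {best_regret:.3f}")

print("learned reward weights:", np.round(best_theta, 2))
```

In this simplified setting the outer search drives the surrogate reward to favor the arm that maximizes the true return, mirroring the intent of regret-minimizing reward discovery; a full embodied-RL instantiation would replace the bandit with a sequential task and the random search with a more scalable outer-loop optimizer.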