Leveraging Causal Policy-Reward Entropy for Enhanced Exploration

Published in the 12th International Conference on Learning Representations (ICLR 2024), 2024

Recommended citation: Ji, T., Zeng, Y., Luo, Y., & Zhan, X. Leveraging Causal Policy-Reward Entropy for Enhanced Exploration. In the 12th International Conference on Learning Representations (ICLR 2024) (Tiny Paper).

Abstract

The impacts of taking different actions in reinforcement learning (RL) tasks often vary dynamically over the course of policy learning. We exploit the causal relationship between actions and potential reward gains and propose a causal policy-reward entropy term. This term effectively identifies and prioritizes actions with high potential impact, thereby improving exploration efficiency, and it can be seamlessly incorporated into any Max-Entropy RL framework. Our instantiation, termed Causal Actor-Critic (CAC), achieves superior performance across a range of continuous control tasks and provides insightful explanations for its actions.
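
The abstract does not spell out the modified objective, so the following is only a minimal sketch of how a causality-weighted entropy bonus might be folded into a SAC-style actor update; the names `actor`, `q_net`, `causal_weights`, and `alpha` are illustrative assumptions, not the paper's actual interface.

```python
# Hypothetical sketch (not the paper's code): a SAC-style actor loss whose entropy
# term is weighted per action dimension by assumed action-to-reward causal weights.
import torch

def actor_loss_with_causal_entropy(actor, q_net, states, causal_weights, alpha=0.2):
    """causal_weights: tensor of shape (action_dim,) with values in [0, 1],
    assumed to reflect each action dimension's causal influence on reward."""
    mean, log_std = actor(states)                         # diagonal Gaussian policy head
    dist = torch.distributions.Normal(mean, log_std.exp())
    actions = dist.rsample()                              # reparameterized sample
    per_dim_logp = dist.log_prob(actions)                 # shape: (batch, action_dim)
    # Weight each dimension's log-probability by its causal weight before summing;
    # the tanh-squashing correction used in SAC is omitted here for brevity.
    weighted_logp = (causal_weights * per_dim_logp).sum(dim=-1)
    q_value = q_net(states, torch.tanh(actions)).squeeze(-1)
    # Maximize Q plus the causality-weighted entropy, i.e. minimize its negation.
    return (alpha * weighted_logp - q_value).mean()
```

In an otherwise standard off-policy loop, this would simply replace the uniform entropy bonus, leaving the critic update and replay buffer unchanged.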

Other information