Reinforcement learning requires policies to generalize to both known (in-distribution, ID) and new (out-of-distribution, OOD) environments. Contextual RL handles ID conditions well but struggles in novel scenarios, while robust RL covers OOD cases at the cost of ID performance. GRAM is a deep reinforcement learning approach that quantifies uncertainty at deployment to switch between ID adaptation and OOD robustness.
GRAM: Generalization in Deep RL with a Robust Adaptation Module
GRAM combines teacher-student learning for ID adaptation with adversarial RL for OOD robustness in a single architecture. As the researchers explain, “GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios.”
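A minimal sketch of the gating idea described above, assuming a scalar uncertainty score in [0, 1] is available at deployment; the function and variable names here are illustrative, not the paper's actual interfaces:

```python
import numpy as np

def blend_latents(z_adapt: np.ndarray,
                  z_robust: np.ndarray,
                  uncertainty: float) -> np.ndarray:
    """Hypothetical uncertainty-gated blend: low uncertainty trusts the
    ID adaptation latent; high uncertainty (likely OOD) falls back to
    the robust latent trained via adversarial RL."""
    lam = float(np.clip(uncertainty, 0.0, 1.0))
    return (1.0 - lam) * z_adapt + lam * z_robust

# In-distribution: uncertainty near 0, the policy conditions on the
# adaptive latent.
z_id = blend_latents(np.ones(4), np.zeros(4), uncertainty=0.0)

# Out-of-distribution: uncertainty near 1, the policy conditions on
# the robust latent instead.
z_ood = blend_latents(np.ones(4), np.zeros(4), uncertainty=1.0)
```

The convex combination is one simple way to realize a soft switch; a hard threshold on the uncertainty score would be an equally plausible reading of the mechanism.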