Reinforcement learning requires policies that generalize to both known (in-distribution, ID) and unseen (out-of-distribution, OOD) environments. Contextual RL methods can handle ID conditions but struggle in new scenarios, while robust RL addresses OOD conditions at the cost of ID performance. GRAM is a deep reinforcement learning approach that bridges this gap: it quantifies uncertainty at deployment time and uses that estimate to switch between ID adaptation and OOD robustness.
GRAM combines teacher-student learning for ID adaptation with adversarial RL for new scenarios. As the researchers explain, “GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios.”
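The uncertainty-gated switching described above can be illustrated with a minimal sketch. This is not GRAM's actual implementation; the function name, the soft interpolation, and the `threshold` parameter are illustrative assumptions showing how an uncertainty estimate could blend an adaptive (ID) context with a robust (OOD) fallback:

```python
import numpy as np

def blend_context(uncertainty, adaptive_latent, robust_latent, threshold=1.0):
    """Hypothetical gating mechanism (not GRAM's API): interpolate between
    an ID-adaptive latent and an OOD-robust latent based on epistemic
    uncertainty. w = 0 means fully in-distribution, w = 1 fully robust."""
    w = min(1.0, max(0.0, uncertainty / threshold))
    return (1.0 - w) * adaptive_latent + w * robust_latent

# Low uncertainty -> rely on the adaptive context; high -> fall back to robust.
adaptive = np.array([1.0, 0.0])
robust = np.array([0.0, 1.0])
print(blend_context(0.0, adaptive, robust))  # adaptive latent
print(blend_context(5.0, adaptive, robust))  # robust latent
```

A soft blend like this avoids a hard switch at the threshold, so behavior degrades gracefully as inputs drift out of distribution.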