Oct 27, 2022 · Liked by Nathan Lambert

Thanks for the great post, Nathan! Using RL to find the failure cases in robotics is definitely an interesting perspective.

This is one of my works on generating adversarial environments for robotic applications: https://arxiv.org/abs/2107.06353. One interesting aspect is that it does not assume a parameterization of the environment (much related work in RL settings, such as maze navigation, considers parameterized environments). Instead, we find adversarial environments from a generative model trained on an existing dataset. This allows us, for example, to find adversarial grasping objects with arbitrary shapes. I can see future work treating the generative model as an RL agent for generating adversarial environments.
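To make the idea concrete, here is a minimal, hedged sketch of searching a generative model's latent space for hard environments, rather than perturbing hand-designed environment parameters. `decode` and `policy_score` are hypothetical stand-ins for a trained generator and a policy evaluation, and the random-search loop is an illustrative assumption, not the method from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(z):
    # Stand-in for a generative model: maps a latent vector to an
    # "environment" (here just a feature vector, e.g. an object shape code).
    return np.tanh(z)

def policy_score(env):
    # Stand-in for evaluating the robot policy in that environment;
    # higher means the policy succeeds more often.
    return float(np.exp(-np.sum(env ** 2)))

def find_adversarial_latent(z0, steps=200, sigma=0.1):
    """Simple random search in latent space: perturb the latent and keep
    changes that lower the policy's score (i.e. make the env harder)."""
    z, best = z0.copy(), policy_score(decode(z0))
    for _ in range(steps):
        cand = z + sigma * rng.standard_normal(z.shape)
        s = policy_score(decode(cand))
        if s < best:
            z, best = cand, s
    return z, best

z0 = rng.standard_normal(4)
z_adv, score_adv = find_adversarial_latent(z0)
```

Because the generator only outputs in-distribution environments, the search stays over plausible objects while still making them harder for the policy.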

I think one aspect worth considering, which you also touched upon briefly, is whether adversarial cases are truly useful for training better robotic policies. In settings like autonomous driving, better coverage of the possible cases (i.e., covering rare cases), rather than just finding the harder cases, could often be more useful. Maybe there is a way to achieve coverage of both rare and hard cases (e.g., by combining domain randomization with adversarial training).
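One simple way the combination could look, as a hedged sketch: draw training environments from a mixture of domain randomization (for coverage) and a buffer of previously found adversarial cases (for difficulty). The names, buffer contents, and 50/50 split below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_random_env():
    # Domain randomization: sample uniformly over the environment space.
    return rng.uniform(-1.0, 1.0, size=3)

# Hypothetical buffer of hard cases found by an adversarial search.
adversarial_buffer = [np.array([0.9, -0.8, 0.7])]

def sample_training_env(p_adversarial=0.5):
    """Mix coverage (random envs) with difficulty (adversarial buffer)."""
    if adversarial_buffer and rng.random() < p_adversarial:
        idx = rng.integers(len(adversarial_buffer))
        return adversarial_buffer[idx]
    return sample_random_env()

envs = [sample_training_env() for _ in range(1000)]
```

Tuning `p_adversarial` (or annealing it during training) trades off rare-case coverage against hard-case focus.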
