
CIMRL: Combining Imitation & Reinforcement Learning for Safe Autonomous Driving

Jul 13, 2024 · Jonathan Booher, Khashayar Rohanimanesh, Junhong Xu, Aleksandr Petiushko

In the figure, the upper plan is too conservative, while the lower plan is too aggressive (and eventually leads to a collision); both receive low scores. The safe, non-conservative plan receives the highest score and is therefore chosen.
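The selection step the figure illustrates can be sketched in a few lines. This is a minimal illustration, not the authors' code: `critic` and `candidate_plans` are hypothetical stand-ins for CIMRL's learned value model and its set of proposed trajectories.

```python
# Minimal sketch (assumption, not the paper's implementation): pick the
# candidate plan with the highest learned score. Overly conservative plans
# (little progress) and overly aggressive plans (leading to future collisions)
# both score low, so the safe, non-conservative plan wins.
import numpy as np

def select_plan(candidate_plans, critic, state):
    scores = np.array([critic(state, plan) for plan in candidate_plans])
    return candidate_plans[int(np.argmax(scores))]
```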

A schematic diagram of the Recovery RL method during inference.

Top: the safe case, where a4 and a5 are unsafe actions, so we re-normalize the task policy over a1, a2, and a3 and sample from this truncated, re-normalized distribution. Bottom: the unsafe case, where every action violates at least one Q_risk constraint, so we sample from the recovery policy instead.
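A short sketch of this inference step, under stated assumptions rather than as the paper's exact implementation: `q_risk`, `task_probs`, `recovery_policy`, and the threshold `eps_risk` are hypothetical placeholders for the risk critic values, the task policy's action probabilities, the recovery policy, and the risk limit.

```python
# Minimal sketch of the Recovery RL-style inference described above:
# mask actions whose risk value exceeds a threshold, re-normalize the task
# policy over the remaining safe actions, and fall back to the recovery
# policy when no action is safe.
import numpy as np

def safe_action(task_probs, q_risk, recovery_policy, state, eps_risk=0.1, rng=None):
    rng = rng or np.random.default_rng()
    safe_mask = q_risk < eps_risk            # actions satisfying the risk constraint
    if safe_mask.any():
        # Safe case: truncate the task policy to safe actions and re-normalize.
        probs = np.where(safe_mask, task_probs, 0.0)
        probs /= probs.sum()
        return rng.choice(len(task_probs), p=probs)
    # Unsafe case: every action violates the risk constraint,
    # so sample from the recovery policy instead.
    return recovery_policy(state)
```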

Top: comparison of CIMRL with MTR (two variants: using only the top-probability trajectory, or sampling according to the probability distribution provided by MTR). Bottom: CIMRL outperforms the BC-based approach on both collision and progress rates, and the progress rate improves further when a new trajectory source (heuristic-based plans) is added.

CIMRL learned to drive correctly without getting stuck in the middle of a T-shaped intersection when there is no space to move further (longer video).

CIMRL learned to drive safely without being overly conservative, performing a gradual "creeping" motion before crossing an intersection where cross traffic does not stop (longer video).