
CIMRL: Combining Imitation & Reinforcement Learning for Safe Autonomous Driving

Jul 13, 2024 · Jonathan Booher, Khashayar Rohanimanesh, Junhong Xu, Aleksandr Petiushko

In the figure, the upper plan is too conservative, while the lower plan is too aggressive (and eventually leads to a collision); both receive low scores. The safe, non-conservative plan receives the highest score and is therefore chosen.
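The selection step the figure illustrates can be sketched in a few lines. This is a minimal illustration, not the authors' code: `critic` and `candidate_plans` are hypothetical stand-ins for CIMRL's learned value model and its set of proposed trajectories.

```python
# Minimal sketch (assumption, not the paper's implementation): pick the
# candidate plan with the highest learned score. Overly conservative plans
# (little progress) and overly aggressive plans (leading to future collisions)
# both score low, so the safe, non-conservative plan wins.
import numpy as np

def select_plan(candidate_plans, critic, state):
    scores = np.array([critic(state, plan) for plan in candidate_plans])
    return candidate_plans[int(np.argmax(scores))]
```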

A schematic diagram of the Recovery RL method during inference.

Top: the safe case, where a4 and a5 are unsafe actions, so we re-normalize the task policy over a1, a2, and a3 and sample from this truncated, re-normalized distribution. Bottom: the unsafe case, where every action violates at least one Q_risk constraint, so we sample from the recovery policy instead.
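A short sketch of this inference step, under stated assumptions rather than as the paper's exact implementation: `q_risk`, `task_probs`, `recovery_policy`, and the threshold `eps_risk` are hypothetical placeholders for the risk critic values, the task policy's action probabilities, the recovery policy, and the risk limit.

```python
# Minimal sketch of the Recovery RL-style inference described above:
# mask actions whose risk value exceeds a threshold, re-normalize the task
# policy over the remaining safe actions, and fall back to the recovery
# policy when no action is safe.
import numpy as np

def safe_action(task_probs, q_risk, recovery_policy, state, eps_risk=0.1, rng=None):
    rng = rng or np.random.default_rng()
    safe_mask = q_risk < eps_risk            # actions satisfying the risk constraint
    if safe_mask.any():
        # Safe case: truncate the task policy to safe actions and re-normalize.
        probs = np.where(safe_mask, task_probs, 0.0)
        probs /= probs.sum()
        return rng.choice(len(task_probs), p=probs)
    # Unsafe case: every action violates the risk constraint,
    # so sample from the recovery policy instead.
    return recovery_policy(state)
```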

Top: comparison of CIMRL with MTR (two variants: using only the top-probability trajectory, or sampling according to the probability distribution provided by MTR). Bottom: CIMRL outperforms the BC-based approach on both collision and progress rates, and the progress rate improves further when a new trajectory source (heuristic-based plans) is added.

CIMRL learned to drive correctly without getting stuck in the middle of a T-shaped intersection when there is no space to move further (longer video).

CIMRL learned to drive safely without being overly conservative, performing a gradual "creeping" motion before crossing an intersection where cross traffic does not stop (longer video).