PlanScope: Learning to Plan Within Decision Scope Does Matter

Abstract

In autonomous driving, learning-based methods have shown promise for developing planning modules. A widely adopted way to train such modules is to directly minimize the discrepancy between expert driving logs and the planning output. Driving logs often contain suddenly appearing obstacles or swiftly changing traffic signals, which typically necessitate swift and nuanced adjustments in driving maneuvers. At the same time, the future trajectories of vehicles reflect their long-term decisions, such as adhering to a reference lane or circumventing stationary obstacles. Because future events in driving logs exert an unpredictable influence, reasoning bias can naturally be introduced into learning-based planning modules, which may degrade driving performance. To address this issue, we identify the decisions and their corresponding time horizons, and characterize a so-called decision scope by retaining only the decisions within derivable horizons, which mitigates the effect of irrational behaviors caused by unpredictable events. We propose several viable implementations, among which batch normalization along the temporal dimension is particularly effective. It consistently outperforms baseline methods in terms of driving scores, as demonstrated through closed-loop evaluations on the nuPlan dataset. The approach is also plug-and-play and can be used to enhance the closed-loop performance of other learning-based planning models.

Method Overview

Problem Identification

Model Framework

Candidate Approaches
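
One of the candidate approaches named in the abstract is batch normalization along the temporal dimension. The snippet below is a minimal NumPy sketch of that idea, not the authors' implementation. The function name `time_norm`, the `(batch, horizon, dim)` tensor layout, and the choice to share statistics across coordinates are all assumptions made for illustration. It shows how normalizing each time step with its own batch statistics puts near-term and far-term waypoints on a comparable scale, so that large far-horizon deviations do not dominate a regression loss.

```python
import numpy as np


def time_norm(traj, eps=1e-5):
    """Normalize every time step separately using batch statistics.

    traj: array of shape (batch, horizon, dim); this layout is an assumption.
    Returns the normalized array together with the per-step mean and std.
    """
    # Statistics are taken over the batch and coordinate axes, so each
    # time step gets its own mean and standard deviation.
    mean = traj.mean(axis=(0, 2), keepdims=True)  # shape (1, horizon, 1)
    std = traj.std(axis=(0, 2), keepdims=True)    # shape (1, horizon, 1)
    return (traj - mean) / (std + eps), mean, std


# Synthetic trajectories whose spread grows with the time index,
# mimicking the growing uncertainty of long-horizon waypoints.
rng = np.random.default_rng(0)
scale = np.linspace(1.0, 10.0, 8).reshape(1, 8, 1)
traj = rng.normal(size=(64, 8, 2)) * scale

normed, mean, std = time_norm(traj)
```

After normalization, each time step of `normed` has approximately zero mean and unit standard deviation, regardless of how far into the horizon it lies.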

Qualitative Comparison

Comparison between PLUTO-m12-C (left), PlanScope-Th20-m12-C (middle), and our method PlanScope-timenorm-m12-C (right).

PLUTO fails to respond in time to the impending short-term collision and instead continues along its long-term decision. Th20 fails to make a predictive decision, while PlanScope performs a timely yielding maneuver.

PLUTO adopts a conservative decision because interaction with the rear vehicle makes the future trajectory highly uncertain over a long time horizon, while PlanScope successfully adopts a human-like lane-change decision.

In the above scenario, planning with PlanScope improves the precision of vehicle control and reduces the deviation from the reference route.

Similar to the previous scenario, planning with PlanScope enables more precise adjustments of the driving state and a timely return to the reference lane. PLUTO's pursuit of long-term targets, by contrast, even causes the starting point of the planned trajectory to deviate from the current position of the self-driving vehicle.

Quantitative Comparison

Experiments were conducted on 20% of the nuPlan dataset and evaluated on the Random14 test set. We observe the following: horizons that are too long or too short both result in worse performance; temporal loss weighting is effective; time-dependent normalization is compatible with contrastive learning; and MDD+DWH and IDD+DWT are also able to surpass the baseline.
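
To make the notion of temporal loss weighting concrete, the following is a minimal sketch under stated assumptions rather than the scheme used in the experiments. The exponential decay, the value `decay=0.9`, the L1 error, and the function names are all hypothetical. It illustrates one way an imitation loss could down-weight waypoints far in the future, which are more affected by unpredictable events.

```python
import numpy as np


def temporal_weights(horizon, decay=0.9):
    """Exponentially decaying weights over the horizon, normalized to sum to 1.

    The exponential form and the decay value are illustrative assumptions.
    """
    w = decay ** np.arange(horizon)
    return w / w.sum()


def weighted_imitation_loss(pred, expert, decay=0.9):
    """L1 imitation loss with smaller weights on later time steps.

    pred, expert: arrays of shape (batch, horizon, dim).
    """
    # Mean absolute error per time step, averaged over batch and coordinates.
    per_step = np.abs(pred - expert).mean(axis=(0, 2))  # shape (horizon,)
    weights = temporal_weights(pred.shape[1], decay)
    return float((weights * per_step).sum())


# The same unit error costs more at the first step than at the last one.
pred = np.zeros((4, 5, 2))
expert_near = np.zeros((4, 5, 2))
expert_near[:, 0, :] = 1.0
expert_far = np.zeros((4, 5, 2))
expert_far[:, -1, :] = 1.0

loss_near = weighted_imitation_loss(pred, expert_near)
loss_far = weighted_imitation_loss(pred, expert_far)
```

With a decay below 1, `loss_near` exceeds `loss_far`, so errors on near-term waypoints are penalized more heavily than equal errors on distant ones.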