
PROACTIVE MDP-BASED COLLISION AVOIDANCE ALGORITHM FOR AUTONOMOUS CAR
D. Osipychev, D. Tran, W. Sheng and G. Chowdhary
Oklahoma State University
(denis.osipychev, duyt, weihua.sheng, girish.chowdhary)@okstate.edu
OBJECTIVES
Autonomous driving in the presence of other road users, with the following key aspects:
• Drive safely
• Avoid collisions
• Interact with other cars
• Use the intentions of others
• Can assume full knowledge via V2V communication
• Make smart decisions
The objective is to develop a proactive collision avoidance system in addition to existing reactive safety features, and to prove its efficiency through simulations of interaction with both modeled and real-human-driven cars.
BACKGROUND
The classic path planning problem in the time and XY domains has existing solutions [1]:
• Completely reactive methods [2]
• Roadmap methods [3]
• Sampling search methods [4]
• Sequential methods [5]
However, these methods have disadvantages such as false alarms, harsh driving that annoys the end user, or an inability to use probabilistic prediction. This work aims to:
• Propose an MDP-based algorithm to reduce the number of false alarms
• Optimize the time required to pass an intersection
• Take into account transition uncertainty and the probability of other drivers' intentions
METHODOLOGY
The action space is represented by ten actions, each with an associated cost:

 #   Action              Cost
 1   Keep going             0
 2   Slow acceleration      0
 3   Slow brake             0
 4   Slow turn left         0
 5   Slow turn right        0
 6   Emergency stop      -100
 7   Acceleration         -20
 8   Brake                -20
 9   Turn left            -30
10   Turn right           -30
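For reference in the sketches that follow, the table can be encoded as a plain lookup structure. This Python dictionary is our own illustration: the identifier names are ours, only the costs come from the table above.

# Action costs from the table above; names are illustrative.
ACTION_COSTS = {
    "keep_going": 0,
    "slow_acceleration": 0,
    "slow_brake": 0,
    "slow_turn_left": 0,
    "slow_turn_right": 0,
    "emergency_stop": -100,  # heavily penalized, last-resort action
    "acceleration": -20,
    "brake": -20,
    "turn_left": -30,
    "turn_right": -30,
}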
Figure 1: Collision avoidance algorithm. The algorithm runs as a loop:
1. Start: get the location and velocity of all cars.
2. Predict the motion of the other cars based on the human intention model.
3. Find the probability of each state being occupied by the other cars.
4. Program the reward function R(s, a, s') using these probabilities and the action costs.
5. If a policy already exists for this R, execute the action in π(s); otherwise, solve value iteration for {R} first and then execute.

Figure 2: State space representation
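To make the control flow concrete, here is a toy Python sketch of this loop. Every helper is an illustrative stand-in of ours (the poster gives no code), and the value-iteration solver is sketched separately after the Bellman equation below.

policy_cache = {}  # maps a reward signature to an already-solved policy

def plan_step(world):
    # 1. Location and velocity of all cars (via V2V in the poster).
    cars = world["cars"]
    # 2-3. Predict motion and per-state occupancy probabilities.
    occupancy = predict_occupancy(cars)
    # 4. The reward is fully determined by occupancy + action costs,
    #    so a hashable signature of it can serve as a cache key.
    reward_key = tuple(sorted(occupancy.items()))
    # 5. "Policy exists for R?" -- solve value iteration only if not.
    if reward_key not in policy_cache:
        policy_cache[reward_key] = solve_value_iteration(occupancy)
    policy = policy_cache[reward_key]
    # 6. Execute the action in pi(s).
    return policy.get(world["ego_state"], "keep_going")

def predict_occupancy(cars):
    # Stand-in intention model: each car keeps its velocity and
    # occupies the next cell with probability 0.8.
    return {(x + vx, y + vy): 0.8 for (x, y), (vx, vy) in cars}

def solve_value_iteration(occupancy):
    # Placeholder; see the value-iteration sketch after the
    # Bellman equation for an actual solver.
    return {}

Caching policies by reward signature mirrors the "Policy exists for R?" branch of Figure 1 and the conclusion's note that the solution must be recomputed for every reward scenario.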
The MDP tuple (S, A, T, R) leads to the solution (V, π). Each state s is represented by the set {time, Loc_x, Loc_y, velocity}. The transition probability model shown in Fig. 3 was learned through explicit simulations of the car dynamics.
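As a sketch of this representation, the state fields come from the set above, while the transition numbers below are invented for illustration and are not the learned model of Fig. 3.

from collections import namedtuple

# State s = {time, Loc_x, Loc_y, velocity}
State = namedtuple("State", ["time", "loc_x", "loc_y", "velocity"])

def transition(s, action):
    # Toy velocity-dependent T(s, a, s'): braking from high speed is
    # less likely to land in the intended cell (illustrative numbers).
    if action == "slow_brake":
        p_ok = 0.9 if s.velocity < 10 else 0.6
        intended = State(s.time + 1, s.loc_x, s.loc_y, max(s.velocity - 1, 0))
        overshoot = State(s.time + 1, s.loc_x + 1, s.loc_y, s.velocity)
        return [(intended, p_ok), (overshoot, 1.0 - p_ok)]
    # Other actions: deterministic advance as a placeholder.
    return [(State(s.time + 1, s.loc_x + 1, s.loc_y, s.velocity), 1.0)]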
The reward R is defined by the probability of collision and the cost of the action:

$R(s, a, s'_{\text{collision}}) = -1000$
$R(s, a, s'_{\text{final}}) = 0$
$R(s, a, s') = \text{Cost}(a)$
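Transcribed into code, the reward might look like the sketch below. It reuses the illustrative ACTION_COSTS table and weights the collision penalty by the occupancy probability, which is one plausible reading of "defined by the probability of collision".

def reward(s, action, s_next, occupancy, final_states):
    # R(s, a, s'_final) = 0: reaching the goal stops accumulating cost.
    if s_next in final_states:
        return 0.0
    # Expected collision penalty: -1000 scaled by P(s' occupied).
    p_collision = occupancy.get((s_next.loc_x, s_next.loc_y), 0.0)
    # Otherwise R(s, a, s') = Cost(a), plus the collision term.
    return -1000.0 * p_collision + ACTION_COSTS[action]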
The value of each state is given by the Bellman equation [6]:

$V(s) = \max_{a \in A} \sum_{s' \in S} T(s, a, s')\left(R(s, a, s') + \gamma V(s')\right)$

Figure 3: Velocity-dependent transition probability
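A generic value-iteration solver for this equation, written as a standard textbook sketch rather than the authors' implementation; it assumes T(s, a) returns (s', probability) pairs and R(s, a, s') a scalar, matching the definitions above.

def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-4):
    # V(s) = max_a sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V(s'))
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (R(s, a, s2) + gamma * V.get(s2, 0.0))
                    for s2, p in T(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:  # stop once values have converged
            break
    # Greedy policy extraction: pi(s) = argmax_a Q(s, a)
    pi = {
        s: max(actions, key=lambda a: sum(
            p * (R(s, a, s2) + gamma * V.get(s2, 0.0)) for s2, p in T(s, a)))
        for s in states
    }
    return V, pi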
RESULTS

Figure 4: Agent ("Car1") and human ("Car2") velocities in a random example; the simulation stops when the agent passes the intersection.

Figure 5: Maximum acceleration used and travel time comparison for the MDP and reactive methods. The higher variance of the MDP results is due to the variety of solutions.
CONCLUSION
+ Allows the use of a probabilistic human intention model
+ Solves the problem in an optimization framework
+ Highly generalizable solution concept
- Can be computationally intensive
- Needs to recompute the solution for every reward scenario
ONGOING RESEARCH
• Long-term human intention prediction
  – Interact with human drivers and use their intentions
  – Reduce additional policy computations
  – Switch the policy less often
• Computationally efficient problem decomposition
  – Dynamic resolution change
  – Decompose car allocations
  – Solve a separate MDP for each car
REFERENCES
[1] S. M. LaValle, Planning Algorithms. Cambridge University Press, 2006.
[2] F. Belkhouche, "Reactive path planning in a dynamic environment," IEEE Transactions on Robotics, vol. 25, no. 4, pp. 902–911, 2009.
[3] N. M. Amato and Y. Wu, "A randomized roadmap method for path and manipulation planning," in Proceedings of the 1996 IEEE International Conference on Robotics and Automation, vol. 1, 1996, pp. 113–120.
[4] N. K. Yilmaz, C. Evangelinos, P. F. Lermusiaux, and N. M. Patrikalakis, "Path planning of autonomous underwater vehicles for adaptive sampling using mixed integer linear programming," IEEE Journal of Oceanic Engineering, vol. 33, no. 4, pp. 522–537, 2008.
[5] S. Brechtel, T. Gindele, and R. Dillmann, "Probabilistic MDP-behavior planning for cars," in Proceedings of the 14th IEEE International Conference on Intelligent Transportation Systems (ITSC), 2011, pp. 1537–1542.
[6] R. Bellman, "A Markovian decision process," DTIC Document, Tech. Rep., 1957.