Fast maximum a posteriori inference in Monte Carlo state spaces

Mike Klaas, Dustin Lang, Nando de Freitas
{klaas,dalang,nando}@cs.ubc.ca
Machine Learning Group, University of British Columbia

The Problem
Given:
• A set of source points {x_i}, i = 1 . . . N
• A set of source weights {w_i}
• A set of target points {y_j}, j = 1 . . . M
• An affinity kernel K(·)
To find:
• For each target particle y_j, the source particle of maximum influence:
  f_j^MAX = max_i w_i K(x_i, y_j)

Current approaches
Naïve computation
• Costs O(MN): prohibitive for large N, M
• Still the most commonly-used technique
Distance transform (Huttenlocher et al., 2003)
• Computes the weighted max-kernel on a d-dimensional regular grid in O(dN^d) time, or on a 1D MC grid in O(N log N) time
• Kernel is limited to exp{−‖x − y‖²} (Gaussian) or exp{−‖x − y‖} (L1)
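
As a concrete illustration of the quantity above and of the naïve O(MN) baseline, here is a minimal NumPy sketch; the Gaussian kernel choice and the names (naive_max_kernel, X, w, Y) are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def naive_max_kernel(X, w, Y):
    """Naive O(MN) weighted max-kernel with a Gaussian kernel K(x, y) = exp(-||x - y||^2).

    Returns, for each target y_j, the value f_j^MAX = max_i w_i K(x_i, y_j)
    and the index of the source particle of maximum influence.
    """
    # Squared distances between every (target, source) pair: shape (M, N).
    d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    influence = w[None, :] * np.exp(-d2)          # w_i * K(x_i, y_j)
    return influence.max(axis=1), influence.argmax(axis=1)
```

Both the time and the memory of this vectorized direct computation scale as O(MN), which is what makes large particle sets impractical without faster methods.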

Applications
MAP belief propagation
• Discrete message computation
• N = M = size of the state space
MAP particle smoothing
• We wish to compute p(x^MAP | y) in a Monte Carlo (MC) state space (sketched below)
• N = M = number of particles used in the filtering step

Why Monte Carlo grids?
• They do not suffer from exponential blowup in high dimensions
• They focus on the “interesting” parts of the state space
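
To make the MAP particle smoothing application concrete, here is one standard formulation (a Viterbi-style max-product recursion over the support of a particle filter), written as a minimal sketch; the Gaussian transition kernel, the flat initial prior, and the names (x, loglik, sigma) are assumptions for illustration rather than details taken from the poster.

```python
import numpy as np

def map_particle_smoothing(x, loglik, sigma):
    """Viterbi-style MAP smoothing over a particle grid.

    x[t] holds the N filter particles at time t (x has shape (T, N, d));
    loglik[t, i] is log p(y_t | x[t, i]); the transition is assumed Gaussian
    with scale sigma.  Returns the index of the MAP particle at each time step.
    """
    T, N, _ = x.shape
    delta = np.empty((T, N))          # best log-score of a path ending at x[t, j]
    back = np.zeros((T, N), dtype=int)
    delta[0] = loglik[0]              # flat prior over the initial particles
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] + log p(x[t, j] | x[t-1, i]) (up to a constant).
        d2 = ((x[t][None, :, :] - x[t - 1][:, None, :]) ** 2).sum(axis=-1)
        scores = delta[t - 1][:, None] - d2 / (2.0 * sigma ** 2)
        # The maximization over i is exactly the weighted max-kernel problem
        # f_j^MAX = max_i w_i K(x_i, y_j) with w_i = exp(delta[t-1, i]); this
        # O(N^2) step is what faster max-kernel methods accelerate.
        back[t] = scores.argmax(axis=0)
        delta[t] = loglik[t] + scores.max(axis=0)
    # Backtrack the MAP particle sequence.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```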

Dual-tree method
Assume: K(·) is parameterized by a distance function, K(x, y) = K(δ(x, y)).
Main idea: build a spatial access tree on the source and target points, and use it to bound the value of the maximum influence and to prune candidate nodes that cannot contain a particle exceeding this bound. The bound tightens as we expand the tree, allowing more nodes to be pruned and leaving only a few to check at the end. Further gains can be had by considering node-node pairs; this is known as dual-tree recursion (Gray and Moore, 2000).
[Figure: point-node comparison example and dual-tree (node-node) recursion.]
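
Below is a minimal single-tree sketch of the bound-and-prune idea (the point-node comparison), assuming a Gaussian kernel, non-negative weights, and a simple axis-aligned bounding-box tree over the source particles; the Node class and the leaf_size parameter are illustrative. The full method recurses on node-node pairs instead of handling one target at a time.

```python
import numpy as np

class Node:
    """Axis-aligned bounding-box tree (kd-tree style) over the source particles."""
    def __init__(self, idx, pts, weights, leaf_size=16):
        self.idx = idx                        # indices of source particles in this node
        self.lo = pts[idx].min(axis=0)        # bounding-box corners
        self.hi = pts[idx].max(axis=0)
        self.wmax = weights[idx].max()        # largest weight inside the node
        self.left = self.right = None
        if len(idx) > leaf_size:              # split on the widest dimension
            dim = int(np.argmax(self.hi - self.lo))
            order = idx[np.argsort(pts[idx, dim])]
            mid = len(order) // 2
            self.left = Node(order[:mid], pts, weights, leaf_size)
            self.right = Node(order[mid:], pts, weights, leaf_size)

def min_sq_dist(y, node):
    """Squared distance from y to the node's bounding box (0 if y is inside it)."""
    d = np.maximum(node.lo - y, 0.0) + np.maximum(y - node.hi, 0.0)
    return float(d @ d)

def max_influence(y, node, pts, weights, best=0.0):
    """Branch-and-bound search for f^MAX(y) = max_i w_i * exp(-||x_i - y||^2)."""
    # Upper bound for the node: largest weight times the kernel at the closest
    # possible distance.  If it cannot beat the current best, prune the node.
    if node.wmax * np.exp(-min_sq_dist(y, node)) <= best:
        return best
    if node.left is None:                     # leaf: check its particles exhaustively
        vals = weights[node.idx] * np.exp(-((pts[node.idx] - y) ** 2).sum(axis=1))
        return max(best, float(vals.max()))
    best = max_influence(y, node.left, pts, weights, best)
    return max_influence(y, node.right, pts, weights, best)

# Usage sketch: one query per target particle.
rng = np.random.default_rng(0)
X, w, Y = rng.normal(size=(2000, 3)), rng.random(2000), rng.normal(size=(5, 3))
root = Node(np.arange(len(X)), X, w)
f_max = [max_influence(y, root, X, w) for y in Y]
```

In this sketch the bound tightens as better candidates are found; a real implementation would also visit the child nearer to the query first and, as described above, recurse over pairs of source and target nodes so that groups of targets share the pruning work.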

Results: other factors
Synthetic test with fixed N, varying dimensionality, clustering (uniform data vs. data generated from k Gaussians), and spatial index (kd-trees vs. metric trees). Dual-tree methods can exceed O(N²) cost!
[Figures: relative run time and distance computations vs. dimensionality (1–40) for uniform data and for data clustered into k = 4, 20, and 100 Gaussians, comparing naive, anchors (metric-tree), and kd-tree variants.]

Results: 1D time series
MAP particle smoothing for a non-linear, multi-modal time series problem.
[Figure: run time (s) vs. number of particles for the naive, dual-tree, and distance-transform methods.]
• Same complexity, but higher constant factors
• Dual-tree is faster after 10 ms (O(N log N))

Results: Beat Tracking
Beat tracking using particle smoothing in a 3D MC state space (Lang and de Freitas, 2004).
[Figure: run time (s) vs. number of particles.]

Discussion
• Dual-tree methods provide substantial performance gains in a wide variety of MAP inference settings.
• They apply to a wider variety of kernels than the distance transform, and handle Monte Carlo grids in high dimensions.
• The algorithm has more overhead than the distance transform, so the latter should be used when possible.
• Performance depends heavily on structure in the data, kernel, and spatial index; further investigation into these factors is needed.