Fast maximum a posteriori inference in Monte Carlo state spaces

Mike Klaas, Dustin Lang, Nando de Freitas
{klaas,dalang,nando}@cs.ubc.ca
Machine Learning Group, University of British Columbia

The Problem

Given:
• A set of source points $\{x_i\}$, $i = 1 \ldots N$
• A set of source weights $\{w_i\}$
• A set of target points $\{y_j\}$, $j = 1 \ldots M$
• An affinity kernel $K(\cdot)$

To find:
• For each target Y particle, the X particle of maximum influence:

$$f_j^{\mathrm{MAX}} = \max_i \, w_i K(x_i, y_j)$$

Costs $O(MN)$: prohibitive for large $N$, $M$.

Applications

MAP belief propagation
• Discrete message computation
• $N = M$ = size of state space

MAP particle smoothing
• We wish to compute $p(x^{\mathrm{MAP}} \mid y)$ in a Monte Carlo (MC) state space.
• $N = M$ = number of particles used in the filtering step

Current approaches

Naïve computation
• Still the most commonly used technique

Distance transform (Huttenlocher et al., 2003)
• Computes the weighted max kernel on a $d$-dimensional regular grid in $O(dN)$ time, or on a 1D MC grid in $O(N \log N)$ time.
• Kernel is limited to $\exp\{-\|x - y\|^2\}$ (Gaussian) or $\exp\{-\|x - y\|\}$ (L1)

Dual-tree method

Assume: $K(\cdot)$ is parameterized by a distance function: $K(x, y) = K(\delta(x, y))$.

Main idea: build a spatial access tree on the source and target points, and use it to bound the value of the maximum influence and prune candidate nodes that cannot contain a particle exceeding this bound. The bound tightens as we expand the tree, allowing more nodes to be pruned and leaving only a few to check at the end.

Point-node comparison example (left):

[Figure: point-node comparison (left) and node-node comparison (right).]

Further gains can be had by considering node-node pairs. This is known as dual-tree recursion (Gray and Moore, 2000) (right).

Results: 1D time series

MAP particle smoothing for a non-linear multi-modal time series problem.

[Figure: time (s) versus number of particles; curves: naive, dual-tree, distance transform. Annotations: "Same complexity but higher constant factors"; "Dual-tree is faster after 10ms ($O(N \log N)$)".]

Results: Beat Tracking

• Beat-tracking using particle smoothing
• 3D MC state space (Lang and de Freitas, 2004).

[Figure: time (s) and distance computations versus number of particles; curves: naive, dual-tree.]

Results: other factors

Synthetic test with fixed $N$: dimensionality, clustering (uniform; generated from $k$ Gaussians), and spatial index (kd-trees; metric trees). Dual-tree methods can exceed $O(N^2)$ cost!

[Figure: relative time (s), normalized so that kd-tree = 1, versus dimensions; curves: naive, anchors, kd-tree. Recovered panel labels: "Dimensions (k = 20)", "Dimensions (k = 100)", "uniform", "k=4", "k=20", "k=100".]

Why Monte Carlo grids?

• Do not suffer from exponential blowup in high dimensions.
• Focus on "interesting" parts of the state space

Discussion

• Dual-tree methods provide substantial performance gains in a wide variety of MAP inference settings.
• Applies to a wider variety of kernels than the distance transform, and handles Monte Carlo grids in high dimensions.
• The algorithm has more overhead than the distance transform, so the latter should be used when possible.
• Depends heavily on structure in the data, kernel, and spatial index; further investigation into these factors is needed.
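
As a rough illustration of the naive computation and of the pruning idea behind the tree-based method, the following Python sketch computes the max-kernel values both ways. It is not the authors' implementation: it uses only the simpler point-node (single-tree) pruning rather than full dual-tree recursion, and the names `naive_max_kernel`, `pruned_max_kernel`, `Node`, and `leaf_size` are hypothetical. It assumes non-negative weights and a non-negative kernel that does not increase with distance (here a Gaussian), so that the largest weight in a node times the kernel at the minimum distance to the node's bounding box is a valid upper bound.

```python
import numpy as np

def gaussian_kernel(dist):
    # K(delta) = exp(-delta^2): non-negative and decreasing in distance.
    return np.exp(-dist ** 2)

def naive_max_kernel(X, w, Y, kernel=gaussian_kernel):
    # O(MN): evaluate every source-target pair, then take the max over sources.
    D = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=2)  # shape (M, N)
    return (w[None, :] * kernel(D)).max(axis=1)

class Node:
    """kd-tree style node over the source points (illustrative only)."""
    def __init__(self, X, w, idx, leaf_size=8):
        self.idx = idx
        self.lo = X[idx].min(axis=0)   # bounding-box corners
        self.hi = X[idx].max(axis=0)
        self.w_max = w[idx].max()      # largest weight inside the node
        self.children = []
        if len(idx) > leaf_size:
            dim = int(np.argmax(self.hi - self.lo))   # split the widest dimension
            order = idx[np.argsort(X[idx, dim])]
            mid = len(order) // 2
            self.children = [Node(X, w, order[:mid], leaf_size),
                             Node(X, w, order[mid:], leaf_size)]

    def min_dist(self, y):
        # Distance from y to the nearest point of the bounding box.
        gap = np.maximum(0.0, np.maximum(self.lo - y, y - self.hi))
        return np.linalg.norm(gap)

def pruned_max_kernel(X, w, Y, kernel=gaussian_kernel, leaf_size=8):
    # Assumes w >= 0 and a kernel that is non-increasing in distance.
    root = Node(X, w, np.arange(len(X)), leaf_size)
    out = np.empty(len(Y))
    for j, y in enumerate(Y):
        best = -np.inf
        stack = [root]
        while stack:
            node = stack.pop()
            # Upper bound on the influence of any particle in this node.
            bound = node.w_max * kernel(node.min_dist(y))
            if bound <= best:
                continue  # the node cannot beat the current best: prune it
            if not node.children:
                d = np.linalg.norm(X[node.idx] - y, axis=1)
                best = max(best, (w[node.idx] * kernel(d)).max())
            else:
                # Push the farther child first so the nearer one is expanded
                # next, which tends to tighten the bound early.
                stack.extend(sorted(node.children,
                                    key=lambda c: c.min_dist(y), reverse=True))
        out[j] = best
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # source points
w = rng.random(200)             # non-negative source weights
Y = rng.normal(size=(50, 3))    # target points
f_naive = naive_max_kernel(X, w, Y)
f_pruned = pruned_max_kernel(X, w, Y)
```

Both functions should return the same values up to floating-point error; how much work the pruned version saves depends on the data, the kernel, and the tree, as the discussion above notes.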