Occlusion Robust Pose and Model Estimation using Gaussian Process Latent Variable Model on GPU Christopher B. Choy Electrical Engineering, Stanford University Objectives • Given monocular image/video and collection of 3D models • Find : • Pose Estimation : 3D location, 3D rotation 1D scale • Model variation estimation : 3D CAD, Skeletal or Voxelized model • s.t. 3D model fits foreground segmentation Pipeline • Foreground background segmentation • Model variation using GPLVM • Energy Function and Optimization • Multiple Objects using non-intersection constraints Gaussian Process Latent Variable Model Probabilistic PCA → Dual PPCA → Non-linear Dual PPCA (GPLVM) = W x + η, Noise : η = N (η|0, β −1I) D 1 DN −1 T ln(2π) − ln|K| − tr(K Y Y ) (3) L=− 2 2 2 •y T −1 where K = W W + β I DN D 1 −1 T L=− ln(2π) − ln|K| − tr(K Y Y ) (4) 2 2 2 where K = XX T + β −1I • dual T 1 DY T PPCA X = U LV first q eigen.. of Y • PPCA W = U 0L0V 0T first q eigen.. of N1 Y Y • Arbitrary nonlinear Kernel: 2 ||xi−xj || κ(xi, xj ) = β1exp + β3 + β4δij 2β2 T • DCT Segmentation Compression 20 × 20 × 20 = 8k n×D •Y ∈ R where n is the number of model Object Detection using Hierarchical Segmentation • Edges from Garbor filters • Graph over the image using the edges Borrowed from A. Dame, et al. CVPR 2013 • n smallest eigenvectors of the graph Laplacian as new input images.   Energy Function and Newton Step mP b(p)  for ||p − p || < r Wij = exp − max i j p∈ij ρ X (1) E(Φ) = log[π(Φ(x))P (x) + (1 − π(Φ(x)))P (x)] f b X x∈Ω L = D − W where W = [Wij ]ij and Djj = Wij Where Pf foreground posterior probability i (2) X Pf (x) − Pb(x) ∂π(Φ) ∂E • n smallest eigenvalues of L as input for contour = ∂λ π(Φ)P (x) + (1 − π(Φ))P (x) ∂λ p f b p x∈Ω detection → sP b(p) Φ(X) X ∂π(Φ) e ∂Φ ∂X • linearly combine sP b(p) and mP b(p) to get = (1 − π(Φ)) Φ(X) + 1 ∂X ∂λ ∂λ e z p p global contour. • Oriented watershed : averages the edges on the ∂X • Analytic solution ∂λ exists p same line segment ∂Φ • using centered finite differences ∂X • Merges the regions with the smallest average • x pixel on image, Xcamera voxelize 3D space. weight of the edge in common • Xcamera = v MoXobject where v Mo is transformation matrix Results Multi-object Pose Estimation • Multi object : occlusion • Occlusion robust pose model estimation • Constraints: non-intersection X X o o o o min log[π(Φ (x))Pf (x) + (1 − π(Φ (x)))Pb (x)] λ s.t. o∈O x∈Ω X π(Φo) ≤ 1 o∈O • Optimization too high dimensional i.e. λ ∈ R|O|(7+dimL) • Alternative optimization while convergence for o ∈ O X o o o o min log[π(Φ (x))Pf (x) + (1 − π(Φ (x)))Pb (x)] λ x∈Ω s.t. π(Φo) ≤ X • Occluding π(Φo0) (7+dimL) ∈R , |O| times • Non-intersection→ background for other objects while convergence for o ∈ O Pbo(x) ← µπ(Φo0) Pfo(x) ← 1 − µπ(Φo0) X min λ log[π(Φ o o (x))Pf (x) object affect the pose of the occluded object • Estimate better pose through object relationship o0∈O\{o} •λ Conclusion + (1 − π(Φ o o (x)))Pb (x)] x∈Ω • Using only the non-occluded parts to estimate the pose .7 ≥ µ ≥ .5 Discussion • Evaluation metric • segmentation overlap : VOC 2012 segmentation • pose ground truth label and L2 distance as error metric • Automatic initialization • DPM-style rough pose estimation • Better segmentation for complicated scene • 3-D Results non-intersection constraint using B.B. • Dynamic segmentation using 3D model i.e. dynamic graph cut • Supporting plane → depth and scale References [1] A. Dame, V. A. Prisacariu, C. Y. Ren and I. D. Reid, "Dense Reconstruction Using 3D Object Shape Priors". CVPR, 2013 [2] Felzenszwalb, P. F. and Girshick, R. B. and McAllester, D. and Ramanan, D. "Object Detection with Discriminatively Trained Part Based Models", PAMI 2010