Occlusion Robust Pose and Model Estimation using Gaussian Process Latent
Variable Model on GPU
Christopher B. Choy
Electrical Engineering, Stanford University
Objectives
• Given: a monocular image/video and a collection of 3D models
• Find:
  • Pose estimation: 3D location, 3D rotation, 1D scale
  • Model variation estimation: 3D CAD, skeletal, or voxelized model
• s.t. the 3D model fits the foreground segmentation
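The seven pose parameters listed above (3D location, 3D rotation, 1D scale) can be packed into a single 4×4 similarity transform from object to camera coordinates. The sketch below assumes an axis-angle rotation vector, which the poster does not specify; the function names `rodrigues` and `pose_to_matrix` are hypothetical.

```python
import numpy as np

def rodrigues(rotvec):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    # Skew-symmetric cross-product matrix of the unit axis k
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose_to_matrix(translation, rotvec, scale):
    """4x4 similarity transform: X_camera = M @ X_object (homogeneous)."""
    M = np.eye(4)
    M[:3, :3] = scale * rodrigues(rotvec)
    M[:3, 3] = translation
    return M

# Rotate 90 degrees about z, double the size, then shift along x.
M = pose_to_matrix([1.0, 0.0, 0.0], [0.0, 0.0, np.pi / 2], 2.0)
X_object = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous point
X_camera = M @ X_object
```

Other rotation parameterizations (quaternions, Euler angles) would keep the same parameter count or change it by one; the choice here is only for illustration.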
Pipeline
• Foreground/background segmentation
• Model variation using GPLVM
• Energy function and optimization
• Multiple objects using non-intersection constraints
Gaussian Process Latent Variable Model
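Probabilistic PCA → Dual PPCA → Non-linear Dual PPCA (GPLVM)
• $y = W x + \eta$, noise: $\eta \sim \mathcal{N}(\eta \mid 0, \beta^{-1} I)$
• PPCA (latent variables $x$ marginalized):
$$L = -\frac{DN}{2}\ln(2\pi) - \frac{N}{2}\ln|C| - \frac{1}{2}\operatorname{tr}\left(C^{-1} Y^T Y\right) \quad (3)$$
where $C = W W^T + \beta^{-1} I$
• Dual PPCA (weights $W$ marginalized):
$$L = -\frac{DN}{2}\ln(2\pi) - \frac{D}{2}\ln|K| - \frac{1}{2}\operatorname{tr}\left(K^{-1} Y Y^T\right) \quad (4)$$
where $K = X X^T + \beta^{-1} I$
• Dual PPCA solution: $X = U L V^T$, first $q$ eigenvectors of $\frac{1}{D} Y Y^T$
• PPCA solution: $W = U' L' V'^T$, first $q$ eigenvectors of $\frac{1}{N} Y^T Y$
• Arbitrary nonlinear kernel:
$$\kappa(x_i, x_j) = \beta_1 \exp\left(-\frac{\|x_i - x_j\|^2}{2\beta_2}\right) + \beta_3 + \beta_4 \delta_{ij}$$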
• DCT compression: 20 × 20 × 20 = 8k
• $Y \in \mathbb{R}^{n \times D}$, where $n$ is the number of models
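To make the GPLVM objective concrete, the following is a minimal NumPy sketch of the nonlinear kernel and the dual log-likelihood $-\frac{DN}{2}\ln(2\pi) - \frac{D}{2}\ln|K| - \frac{1}{2}\operatorname{tr}(K^{-1} Y Y^T)$. The function names, the default hyperparameter values, and the random toy data are assumptions for illustration only, not the poster's GPU implementation.

```python
import numpy as np

def rbf_kernel(X, beta1, beta2, beta3, beta4):
    """kappa(x_i, x_j) = beta1 * exp(-||x_i - x_j||^2 / (2 beta2)) + beta3 + beta4 * delta_ij."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return beta1 * np.exp(-sq_dists / (2.0 * beta2)) + beta3 + beta4 * np.eye(X.shape[0])

def gplvm_log_likelihood(X, Y, beta1=1.0, beta2=1.0, beta3=0.0, beta4=1e-2):
    """Log-likelihood of data Y (N x D) given latent points X (N x q)."""
    N, D = Y.shape
    K = rbf_kernel(X, beta1, beta2, beta3, beta4)
    _, logdet = np.linalg.slogdet(K)      # K is positive definite here, so the sign is +1
    Kinv_Y = np.linalg.solve(K, Y)        # avoids forming K^{-1} explicitly
    # tr(K^{-1} Y Y^T) equals tr(Y^T K^{-1} Y)
    return (-0.5 * D * N * np.log(2.0 * np.pi)
            - 0.5 * D * logdet
            - 0.5 * np.trace(Y.T @ Kinv_Y))

# Toy usage: 5 models, 3 observed dimensions, 2 latent dimensions.
rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 3))
X = rng.standard_normal((5, 2))
log_lik = gplvm_log_likelihood(X, Y)
```

In practice the latent points and kernel hyperparameters would be optimized jointly by gradient ascent on this quantity; the sketch only evaluates it.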
Object Detection using Hierarchical Segmentation
• Edges from Gabor filters
• Graph over the image using the edges
• Eigenvectors of the graph Laplacian with the n smallest eigenvalues as new input images
Borrowed from A. Dame et al., CVPR 2013 [1]
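The graph-Laplacian step can be sketched with dense NumPy arrays. This is an illustration under stated assumptions: affinities are built only between 4-connected pixels (radius 1), the edge strength along the segment between two adjacent pixels is taken as the maximum of the two endpoint values, and `rho = 0.1` is an arbitrary choice. The function names are hypothetical, and a real image would need a sparse eigensolver.

```python
import numpy as np

def grid_affinity(edge_strength, rho=0.1):
    """Dense affinity matrix W_ij = exp(-max edge strength / rho) for 4-connected pixels."""
    H, Wd = edge_strength.shape
    idx = np.arange(H * Wd).reshape(H, Wd)
    W = np.zeros((H * Wd, H * Wd))
    # Index pairs selecting (pixel, right neighbour) and (pixel, lower neighbour)
    pairs = [((slice(None), slice(None, -1)), (slice(None), slice(1, None))),
             ((slice(None, -1), slice(None)), (slice(1, None), slice(None)))]
    for a, b in pairs:
        strongest = np.maximum(edge_strength[a], edge_strength[b])
        w = np.exp(-strongest / rho).ravel()
        i, j = idx[a].ravel(), idx[b].ravel()
        W[i, j] = w
        W[j, i] = w  # keep W symmetric
    return W

def smallest_laplacian_eigenvectors(W, n_vectors):
    """Eigenpairs of L = D - W with the n smallest eigenvalues."""
    D = np.diag(W.sum(axis=0))  # D_jj = sum_i W_ij
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvals[:n_vectors], eigvecs[:, :n_vectors]

# Toy 4 x 6 "image" with a strong vertical edge in column 3.
edge_strength = np.zeros((4, 6))
edge_strength[:, 3] = 1.0
W = grid_affinity(edge_strength)
eigvals, eigvecs = smallest_laplacian_eigenvectors(W, n_vectors=3)
# Each eigenvector reshaped back to image size serves as a new input image.
eigen_images = eigvecs.T.reshape(3, 4, 6)
```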


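$$W_{ij} = \exp\left(-\max_{p \in \overline{ij}} \frac{mPb(p)}{\rho}\right) \quad \text{for } \|p_i - p_j\| < r \quad (1)$$
$$L = D - W, \quad \text{where } W = [W_{ij}]_{ij} \text{ and } D_{jj} = \sum_i W_{ij} \quad (2)$$
• $n$ smallest eigenvalues of $L$ as input for contour detection → $sPb(p)$
• Linearly combine $sPb(p)$ and $mPb(p)$ to get the global contour
• Oriented watershed: averages the edges on the same line segment
• Merges the regions with the smallest average weight of the edge in common

Energy Function and Newton Step
$$E(\Phi) = \sum_{x \in \Omega} \log\left[\pi(\Phi(x)) P_f(x) + (1 - \pi(\Phi(x))) P_b(x)\right]$$
where $P_f$ is the foreground posterior probability.
$$\frac{\partial E}{\partial \lambda_p} = \sum_{x \in \Omega} \frac{P_f(x) - P_b(x)}{\pi(\Phi) P_f(x) + (1 - \pi(\Phi)) P_b(x)} \frac{\partial \pi(\Phi)}{\partial \lambda_p}$$
$$\frac{\partial \pi(\Phi)}{\partial \lambda_p} = (1 - \pi(\Phi)) \sum_{z} \frac{e^{\Phi(X)}}{e^{\Phi(X)} + 1} \frac{\partial \Phi}{\partial X} \frac{\partial X}{\partial \lambda_p}$$
• Analytic solution for $\frac{\partial X}{\partial \lambda_p}$ exists
• $\frac{\partial \Phi}{\partial X}$ using centered finite differences
• $x$: pixel on the image; $X_{\text{camera}}$: voxelized 3D space
• $X_{\text{camera}} = {}^{v}M_{o} X_{\text{object}}$, where ${}^{v}M_{o}$ is the transformation matrix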
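The chain rule behind the energy gradient can be checked numerically on a single pixel. The sketch below assumes the projection has the form $\pi = 1 - \prod_z (1 - \sigma(\Phi(X_z)))$ with $\sigma(\Phi) = e^{\Phi} / (e^{\Phi} + 1)$, taken over the voxels along the pixel's ray. That form is not stated on the poster; it is used here because its derivative gives the $(1 - \pi(\Phi))\, e^{\Phi(X)} / (e^{\Phi(X)} + 1)$ factor in the gradient. All function names are hypothetical.

```python
import numpy as np

def sigmoid(phi):
    """e^phi / (e^phi + 1), written in the equivalent 1 / (1 + e^-phi) form."""
    return 1.0 / (1.0 + np.exp(-phi))

def projection(phi_ray):
    """Assumed projection: pi = 1 - prod_z (1 - sigmoid(Phi(X_z))) along one ray."""
    return 1.0 - np.prod(1.0 - sigmoid(phi_ray))

def dpi_dphi(phi_ray):
    """d pi / d Phi(X_z) = (1 - pi) * sigmoid(Phi(X_z)) for every voxel on the ray."""
    return (1.0 - projection(phi_ray)) * sigmoid(phi_ray)

def energy(pi, Pf, Pb):
    """E = sum_x log[pi * Pf + (1 - pi) * Pb]."""
    return np.sum(np.log(pi * Pf + (1.0 - pi) * Pb))

def denergy_dpi(pi, Pf, Pb):
    """dE / d pi(x) = (Pf - Pb) / (pi * Pf + (1 - pi) * Pb)."""
    return (Pf - Pb) / (pi * Pf + (1.0 - pi) * Pb)

# One pixel whose ray crosses three voxels of the embedding function Phi.
phi_ray = np.array([-2.0, 0.5, 1.5])
pi = projection(phi_ray)
# Chain rule: dE/dPhi(X_z) = dE/dpi * dpi/dPhi(X_z)
dE_dphi_ray = denergy_dpi(pi, 0.8, 0.2) * dpi_dphi(phi_ray)
```

The remaining factors, $\partial \Phi / \partial X$ and $\partial X / \partial \lambda_p$, would multiply `dE_dphi_ray` per voxel before summing over the ray and over pixels.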
Results
Multi-object Pose Estimation
• Multi object: occlusion
• Occlusion robust pose model estimation
• Constraints: non-intersection
$$\min_{\lambda} \sum_{o \in O} \sum_{x \in \Omega} \log\left[\pi(\Phi^o(x)) P_f^o(x) + (1 - \pi(\Phi^o(x))) P_b^o(x)\right] \quad \text{s.t.} \quad \sum_{o \in O} \pi(\Phi^o) \le 1$$
• Optimization too high dimensional, i.e. $\lambda \in \mathbb{R}^{|O|(7 + \dim L)}$
• Alternating optimization
while not converged
  for $o \in O$
    $$\min_{\lambda} \sum_{x \in \Omega} \log\left[\pi(\Phi^o(x)) P_f^o(x) + (1 - \pi(\Phi^o(x))) P_b^o(x)\right] \quad \text{s.t.} \quad \pi(\Phi^o) \le 1 - \sum_{o' \in O \setminus \{o\}} \pi(\Phi^{o'})$$
• $\lambda \in \mathbb{R}^{7 + \dim L}$, $|O|$ times
• Non-intersection → background for other objects
while not converged
  for $o \in O$
    $P_b^o(x) \leftarrow \mu \pi(\Phi^{o'})$, $P_f^o(x) \leftarrow 1 - \mu \pi(\Phi^{o'})$
    $$\min_{\lambda} \sum_{x \in \Omega} \log\left[\pi(\Phi^o(x)) P_f^o(x) + (1 - \pi(\Phi^o(x))) P_b^o(x)\right]$$
• Using only the non-occluded parts to estimate the pose, $0.7 \ge \mu \ge 0.5$

Conclusion
• Occluding objects affect the pose of the occluded object
• Estimate better pose through object relationship
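The non-intersection constraint and the µ-weighted posterior update from the multi-object formulation can be sketched on per-pixel projection maps. The per-object bound is written here as $1 - \sum_{o' \ne o} \pi(\Phi^{o'})$, the form implied by the joint constraint $\sum_{o} \pi(\Phi^o) \le 1$; the function names and the toy values are hypothetical.

```python
import numpy as np

def violates_non_intersection(pis, tol=1e-9):
    """pis: (num_objects, num_pixels) projections pi(Phi^o)(x) in [0, 1].

    Returns a boolean mask of pixels where sum_o pi(Phi^o) exceeds 1.
    """
    return pis.sum(axis=0) > 1.0 + tol

def per_object_bound(pis, o):
    """Upper bound on pi(Phi^o) while the other objects are held fixed."""
    others = np.delete(pis, o, axis=0)
    return 1.0 - others.sum(axis=0)

def occlusion_aware_posteriors(pi_other, mu=0.6):
    """P_b^o <- mu * pi(Phi^o'), P_f^o <- 1 - mu * pi(Phi^o'), with 0.5 <= mu <= 0.7."""
    Pb = mu * pi_other
    Pf = 1.0 - Pb
    return Pf, Pb

# Two objects, three pixels; the objects overlap too much at the middle pixel.
pis = np.array([[0.90, 0.60, 0.00],
                [0.05, 0.70, 0.10]])
mask = violates_non_intersection(pis)        # only the middle pixel is flagged
bound_for_first = per_object_bound(pis, 0)   # 1 - pi of the second object
Pf, Pb = occlusion_aware_posteriors(pis[1], mu=0.6)
```

Inside the alternating loop, each object's pose would be re-optimized against posteriors updated this way while the other objects stay fixed.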
Discussion
• Evaluation metric
• segmentation overlap : VOC 2012 segmentation
• pose ground truth label and L2 distance as error metric
• Automatic initialization
• DPM-style rough pose estimation [2]
• Better segmentation for complicated scene
• 3-D non-intersection constraint using B.B.
• Dynamic segmentation using 3D model, i.e. dynamic graph cut
• Supporting plane → depth and scale
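The two evaluation metrics named in the Discussion can be sketched as follows. Segmentation overlap is computed as intersection over union on a single pair of binary masks, which simplifies the VOC protocol (VOC accumulates counts per class over the whole dataset). The pose error assumes poses are stored as flat parameter vectors. Both function names are hypothetical.

```python
import numpy as np

def segmentation_overlap(pred_mask, gt_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

def pose_l2_error(pose_est, pose_gt):
    """L2 distance between estimated and ground-truth pose vectors."""
    diff = np.asarray(pose_est, dtype=float) - np.asarray(pose_gt, dtype=float)
    return float(np.linalg.norm(diff))

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt = np.array([[1, 0, 0],
               [0, 1, 1]])
overlap = segmentation_overlap(pred, gt)                      # 2 shared pixels / 4 in the union
pose_err = pose_l2_error([0.0, 0.0, 0.0], [3.0, 4.0, 0.0])    # 3-4-5 triangle
```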
References
[1] A. Dame, V. A. Prisacariu, C. Y. Ren, and I. D. Reid, "Dense Reconstruction Using 3D Object Shape Priors", CVPR, 2013.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object Detection with Discriminatively Trained Part Based Models", PAMI, 2010.