3D ShapeNets: A Deep Representation for Volumetric Shapes

Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao
Code and Data: http://3DShapeNets.cs.princeton.edu

Motivation: 3D Shape Representation

Existing approaches to 3D shape representation fall short:
• Shape primitives (Biederman 1987; Binford et al. 1989): only in theory.
• Local feature matching (Rothganger et al. 2006; Philbin et al. 2007): limited to instance-level matching.
• Hand-crafted 2D features (Dalal & Triggs 2005; Felzenszwalb et al. 2010; Girshick et al. 2014): no state-of-the-art method uses full 3D shapes.

Desirable properties of a good 3D shape representation:
• Data-driven: learn from data rather than predefined shape routines.
• Generic: any complex shape rather than simple shape primitives.
• Compositional: compose complex shapes by assembling simple ones.
• Versatile: applicable to various vision tasks.

3D ShapeNets
• 3D ShapeNets (a Convolutional Deep Belief Network) learns the joint distribution of generic 3D shapes across object categories, represented on a 30x30x30 voxel grid.
• Meaningful volumetric shapes can be sampled from the model.

3D Shape Generation
[Figure: example chairs sampled from the model.]

Configuration of 3D ShapeNets
• Input: 30^3 binary voxel grid.
• L1: convolutional RBM, 48 filters of size 6, stride 2 (output 13^3).
• L2: convolutional RBM, 160 filters of size 5, stride 2 (output 5^3).
• L3: convolutional RBM, 512 filters of size 4, stride 1 (output 2^3).
• L4: fully connected RBM, 1200 hidden units.
• L5: fully connected RBM, 4000 hidden units; the multinomial label unit together with the Bernoulli feature units forms an associative memory.

Learning
The lower layers are pre-trained with standard Contrastive Divergence; the top layer is trained with Fast Persistent Contrastive Divergence.

Princeton ModelNet Dataset
A large-scale CAD dataset of more than 150,000 models in 660 categories.

3D Deep Learning (3D feature extraction)
1. Convert 3D ShapeNets to a 3D CNN (see the sketch below).
2. Feature learning by back-propagation.
3. Test for classification and retrieval.
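A minimal sketch of step 1, assuming PyTorch and the layer sizes listed in the configuration above; the class name, the ReLU activations, and random initialization are our assumptions (in the original method the weights come from the pre-trained Convolutional Deep Belief Network before fine-tuning):

    import torch
    import torch.nn as nn

    class ShapeNets3DCNN(nn.Module):
        """Hypothetical 3D CNN with the poster's layer sizes."""
        def __init__(self, num_classes=40):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(1, 48, kernel_size=6, stride=2),    # 30^3 -> 13^3
                nn.ReLU(inplace=True),
                nn.Conv3d(48, 160, kernel_size=5, stride=2),  # 13^3 -> 5^3
                nn.ReLU(inplace=True),
                nn.Conv3d(160, 512, kernel_size=4, stride=1), # 5^3 -> 2^3
                nn.ReLU(inplace=True),
                nn.Flatten(),
            )
            self.classifier = nn.Sequential(
                nn.Linear(512 * 2 ** 3, 1200), nn.ReLU(inplace=True),
                nn.Linear(1200, 4000), nn.ReLU(inplace=True),
                nn.Linear(4000, num_classes),  # logits; softmax in the loss
            )

        def forward(self, x):
            # x: (batch, 1, 30, 30, 30) binary occupancy voxels
            return self.classifier(self.features(x))

For retrieval, the 4000-unit layer can serve as the shape feature ("our 5th layer finetuned" in the curves below).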
Shape Classification and Retrieval

Table 1: Shape Classification and Retrieval Results.

10 classes    Classification   Retrieval AUC   Retrieval MAP
SPH [18]      79.79%           45.97%          44.05%
LFD [8]       79.87%           51.70%          49.82%
Ours          83.54%           69.28%          68.26%

40 classes    Classification   Retrieval AUC   Retrieval MAP
SPH [18]      68.23%           34.47%          33.26%
LFD [8]       75.47%           42.04%          40.91%
Ours          77.32%           49.94%          49.23%

[Figure: precision-recall curves for 10-class and 40-class retrieval, comparing Spherical Harmonic, Light Field, and our finetuned 5th-layer features.]

3D ShapeNets for 2.5D Depth Recognition
• Convert the depth map into a volumetric representation, labeling each voxel as free space, observed surface, or unknown space.
• Training only uses CAD models; testing is on real NYU depth data.

Table 2: Accuracy for view-based 2.5D recognition on the NYU dataset.

Class        [29] Depth   NN      ICP     3D ShapeNets   3D ShapeNets finetuned   [29] RGB   [29] RGBD
bathtub      0.000        0.429   0.571   0.142          0.857                    0.142      0.000
bed          0.729        0.446   0.608   0.500          0.703                    0.743      0.743
chair        0.806        0.395   0.194   0.685          0.919                    0.766      0.693
desk         0.100        0.176   0.375   0.100          0.300                    0.150      0.175
dresser      0.466        0.467   0.733   0.366          0.500                    0.266      0.466
monitor      0.222        0.333   0.389   0.500          0.500                    0.166      0.388
nightstand   0.343        0.188   0.438   0.719          0.625                    0.218      0.468
sofa         0.481        0.458   0.349   0.277          0.735                    0.313      0.602
table        0.415        0.455   0.052   0.377          0.247                    0.376      0.441
toilet       0.200        0.400   1.000   0.700          0.400                    0.200      0.500
all          0.376        0.374   0.471   0.437          0.579                    0.334      0.448

Shape Completion (Recognition and Reconstruction on NYU depth)
• Infer the unknown space and the object label jointly by Gibbs sampling: clamp the observed surface and free space, then alternate bottom-up and top-down passes to hallucinate the completed surface ("It is a chair!").
[Figure: inference process through layers L1-L3, the 1200- and 4000-unit layers, and the object-label units; RGB image, observed points, and completed surface.]

Next-Best-View Prediction (Deep View Planning)
A single depth map can be ambiguous; for example, a view of the back of a sofa could be a sofa, a bathtub, or a dresser. We plan the next view that best disambiguates the object label. Let y be the object label and x_o the observed voxels:
• The original entropy of the first view is H = H(p(y | x_o)).
• Given a candidate next view V_i with potentially visible voxels x_i^new, the conditional entropy is H_i = E_{p(x_i^new | x_o)}[H(p(y | x_i^new, x_o))], estimated by sampling shape completions from the model and rendering them into view V_i.
• The reduction of entropy is the mutual information between the new voxels and the label: I(y; x_i^new | x_o) = H - H_i.
• We choose the view with the maximum mutual information: V* = argmax_i I(y; x_i^new | x_o). A sketch of this selection rule follows Table 3.
[Figure: observed surface, unknown space, free space, and potentially visible voxels in the next view; three different next-view candidates with possible shapes and the predicted new free space and newly visible surface.]

Table 3: Comparison of Different Next-Best-View Selections Based on Recognition Accuracy from Two Views (experiments on synthetic depth). Based on an algorithm's choice, we obtain the actual depth map for the next view and recognize the object using those two views in our 3D ShapeNets representation.

Class        Ours    Max Visibility   Furthest Away   Random Selection
bathtub      0.80    0.85             0.65            0.60
bed          1.00    0.85             0.85            0.80
chair        0.85    0.85             0.75            0.75
desk         0.50    0.50             0.55            0.50
dresser      0.45    0.45             0.25            0.45
monitor      0.85    0.85             0.85            0.90
nightstand   0.75    0.75             0.65            0.70
sofa         0.85    0.85             0.50            0.65
table        0.95    0.90             1.00            0.90
toilet       1.00    0.95             0.85            0.90
all          0.80    0.78             0.69            0.72
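A minimal sketch of the view-selection rule above, assuming NumPy; the function names and the interface (per-view lists of sampled completions with their probabilities and label posteriors) are our assumptions about how the expectation would be approximated:

    import numpy as np

    def entropy(p, eps=1e-12):
        """Shannon entropy of a categorical distribution p(y)."""
        p = np.clip(p, eps, 1.0)
        return float(-np.sum(p * np.log(p)))

    def next_best_view(p_label_given_obs, view_candidates):
        """Pick the view maximizing I(y; x_new | x_o).

        p_label_given_obs: p(y | x_o), shape (num_classes,).
        view_candidates: for each view i, a list of (p_sample, p_label)
            pairs, where p_sample approximates p(x_i^new | x_o) for one
            hallucinated completion and p_label is p(y | x_i^new, x_o).
        """
        h0 = entropy(p_label_given_obs)  # H = H(p(y | x_o))
        best_view, best_mi = None, -np.inf
        for i, samples in enumerate(view_candidates):
            # Expected conditional entropy H_i over sampled completions.
            h_i = sum(p_s * entropy(p_y) for p_s, p_y in samples)
            mi = h0 - h_i  # mutual information for view i
            if mi > best_mi:
                best_view, best_mi = i, mi
        return best_view, best_mi

In the method itself both distributions come from the 3D ShapeNets model: completions are drawn by Gibbs sampling and rendered into each candidate view's potentially visible voxels.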