3D Conv Nets for Robotic Manipulation
Chad DeChant, Joaquín Ruales, Jake Varley

Outline
 - 3D Spatial CNNs
   - Extended 2D to 3D
 - Grasping
   - Heatmap Approach
   - Shape Completion Approach
   - Comparison

2D Conv Filters
[figure]

3D Conv Filters
[figure: x filter * y filter * z filter]

3D Filters in Theano
 - We implemented the following for the Keras (Theano) library
   - 3D convolution
     - Theano conv3d2d (BZCXY format)
   - 3D max pooling
     - Created max_pool_3d using max_pool_2d

2D Conv as a Sum of 1D Convs
[figure]

3D Spatial CNN Test Data
 - Random size geometric shapes
   - sphere
   - axis-aligned cube
   - rotated cube

[network diagram, listed from output to input]
 - Cost Function (target output: Sphere)
 - Fully connected logistic layer (prediction: Sphere)
 - Max Pooling
 - Conv layers (16 3x3x3, relu)
 - Max Pooling
 - Conv layers (8 5x5x5, relu)
 - INPUT (28x28x28)

Trained Layer 0 Filters
[figure: filters 1 to 8; 8 5x5x5 filters]

Grasping Heatmaps
[figure panels: Palm, Finger 1, Finger 2, Finger 3]
 - Patch: 32x32x32
 - Label: 1x1x1 x nheatmaps

Trained Layer 0 Filters
[figure: 1 16x16x16 filter]

Another Approach... Shape Completion
Complete the shape, then plan grasps on it

Shape Completion
● Is shape completion feasible?
● Use MNIST

Digit Completion
[network diagram, listed from output to input]
 - Cost function (against the target output)
 - Reconstruction layer
 - Hidden layer (150 or 1000 units)
 - Convolutional layers (70 3x3x3)
 - Convolutional layers (30 3x3x3)
 - INPUT

Digit Completion
[figures]

Digit Completion
[plot: Per pixel test error rate by hidden layer size; y-axis: % error; x-axis: # of epochs; series: 50 units, 150 units, 1000 units]

3D Digit Completion
MNIST in three dimensions
[network diagram, listed from output to input]
 - Cost function (against the target output)
 - Reconstruction layer
 - Hidden layer (500 or 1000 units)
 - Hidden layer (500 or 1000 units)
 - Convolutional layers (70 3x3x3)
 - Convolutional layers (30 3x3x3)
 - INPUT

3D Digit Completion
[figure panels: Model Output, Target Output]

3D Digit Completion
[plot: Per pixel test error rate by size of 2 hidden layers; y-axis: % error; x-axis: # of epochs; series: 500 units, 1000 units]

Synthetic Shape Completion
[figures]

Training Data
 - Household objects, ModelNet, BigBIRD

Shape Completion Data
 - Voxelization

3D Shape Completion
[figure panels: Model Input, Model Output, Target Output]

Grasp Comparisons
Heatmap vs Shape Completion Grasps
 1) generate grasps
 2) compare ground truth grasp qualities

Conclusion
 - 3D Spatial CNNs
   - Extended 2D to 3D
   - Keras Library
 - Grasping
   - Heatmap Approach
   - Shape Completion Approach
   - Comparison

Thank you
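
Supplementary sketch: 3D max pooling from 2D max pooling

The "3D Filters in Theano" slide says max_pool_3d was created using max_pool_2d but does not show how. One way this could work is sketched below in plain NumPy rather than Theano, so it is an illustration of the idea and not the authors' code. Because a maximum over a 3D window equals the maximum over depth of the maxima over each 2D slice, two 2D pooling passes are enough. The helper names, the non-overlapping windows, and the (..., depth, height, width) axis order are all assumptions made for this example; the order differs from the BZCXY layout mentioned on the slide.

```python
import numpy as np


def max_pool_2d(x, pool):
    """Non-overlapping max pooling over the last two axes of x.

    Hypothetical helper for this sketch (not Theano's function).
    Trailing rows/columns that do not fill a whole window are dropped.
    """
    ph, pw = pool
    h, w = x.shape[-2] // ph, x.shape[-1] // pw
    x = x[..., : h * ph, : w * pw]
    # Split each pooled axis into (blocks, window) and reduce over the windows.
    x = x.reshape(x.shape[:-2] + (h, ph, w, pw))
    return x.max(axis=(-3, -1))


def max_pool_3d(x, pool):
    """3D max pooling over the last three axes, built from two 2D poolings."""
    pd, ph, pw = pool
    # Pass 1: pool height and width; depth is carried along like a batch axis.
    x = max_pool_2d(x, (ph, pw))
    # Pass 2: swap depth into the last position and pool it with a window of
    # size 1 on the other axis, then restore the original axis order.
    x = np.swapaxes(x, -3, -1)      # (..., width, height, depth)
    x = max_pool_2d(x, (1, pd))
    return np.swapaxes(x, -3, -1)   # (..., depth, height, width)


# Usage: pool a 4x4x4 volume with 2x2x2 windows.
volume = np.arange(64).reshape(4, 4, 4)
pooled = max_pool_3d(volume, (2, 2, 2))
print(pooled.shape)  # (2, 2, 2)
```

Any leading axes (batch, channels) pass through unchanged, so the same function applies to a batch of 28x28x28 inputs like those used for the geometric shape test data.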