3D CNNs for Robotic Grasp Stability Estimation

3D Conv Nets
for Robotic Manipulation
Chad DeChant, Joaquín Ruales, Jake Varley
Outline
- 3D Spatial CNNs
- Extended 2D to 3D
- Grasping
- Heatmap Approach
- Shape Completion Approach
- Comparison
2D Conv Filters
*
*
3D Conv Filters
*
x filter
*
y filter
*
z filter
3D Filters in Theano
- We Implemented the following for the Keras
Theano library
- 3DConvolution
- Theano conv3d2d (BZCXY format)
- 3DMax pooling
- Created max_pool_3d using max_pool_2d
2D Conv
Sum of 1D Convs
3D Spatial CNN Test Data
- Random size geometric shapes
sphere
axis-aligned cube
rotated cube
Cost Function
Sphere
Fully connected logistic layer
Sphere
Max Pooling
Target output
Conv layers (16 3x3x3, relu)
Max Pooling
Conv layers (8 5x5x5, relu)
INPUT (28x28x28)
Trained Layer 0 Filters
1 2 3 4 5 6 7 8
8 5x5x5 filters
Grasping Heatmaps
Grasping Heatmaps
Palm
Finger 1
Finger 2
Finger 3
Grasping Heatmaps
Patch: 32x32x32
Label: 1x1x1x nheatmaps
Trained Layer 0 Filters
1 16x16x16
filter
Another Approach...
Shape Completion
Complete the shape, then plan grasps on it
Shape Completion
● Is shape completion feasible?
● Use MNIST
Cost function
Digit
Completion
Reconstruction layer
Hidden layer 150 or 1000 units
Convolutional layers (70 3x3x3)
Convolutional layers (30 3x3x3)
INPUT
Target output
Digit Completion
Digit Completion
Digit Completion
Digit Completion
% error
Per pixel test error rate by hidden layer size
x
x
x
x
# of epochs
50 units
150 units
1000 units
3D digit completion
MNIST in three dimensions
Cost function
Reconstruction layer
Hidden layer 500 or 1000 units
Hidden layer 500 or 1000 units
Convolutional layers (70 3x3x3)
Convolutional layers (30 3x3x3)
INPUT
Target output
3D Digit Completion
Model Output
Target Output
3D Digit Completion
Model Output
Target Output
3D Digit Completion
% error
Per pixel test error rate by size of 2 hidden layers
500 units
# of epochs
1000 units
Synthetic shape completion
Synthetic shape completion
Synthetic shape completion
Synthetic shape completion
Synthetic shape completion
Synthetic shape completion
Training Data
- Household objects, Modelnet, BigBird
Shape Completion Data
Voxelization
3D Shape completion
Model Input
Model Output
Target Output
3D Shape completion
Model Input
Model Output
Target Output
Grasp Comparisons
Heatmap vs Shape Completion Grasps
1) generate grasps
2) compare ground truth grasp qualities
Conclusion
- 3D Spatial CNNs
- Extended 2d to 3d
- Keras Library
- Grasping
- Heatmap Approach
- Shape Completion Approach
- Comparison
Thank you