Convolution Neural Network (CNN)
A tutorial
KH Wong
Convolution Neural Network CNN ver. 4.11a
Introduction
• Very popular:
– Toolboxes: cuda-convnet and Caffe (the more user-friendly of the two)
• A high-performance multi-class classifier
• Successful in handwritten optical character recognition (OCR), speech recognition, image noise removal, etc.
• Easy to implement
– Slow in learning
– Fast in classification
Overview of this note
• Part 1: Fully connected Back Propagation Neural Networks (BPNN)
– Part 1A: feed forward processing
– Part 1B: feed backward processing
• Part 2: Convolution neural networks (CNN)
– Part 2A: feed forward of CNN
– Part 2B: feed backward of CNN
Part 1
Fully Connected Back Propagation (BP) neural net
Theory
Fully connected Back Propagation Neural Net (BPNN)
• Use many samples to train the weights, so that the network can classify an unknown input into one of several classes
• Will explain
– How to use it after training: forward pass
– How to train it: how to train the weights and biases (using forward and backward passes)
Training
• How to train it: how to train the weights (W) and biases (b) (using forward and backward passes)
• Initialize W and b randomly
• For iter = 1 : all_epochs (each iteration is called an epoch)
– Forward pass for each output neuron:
• Use training samples X with target class t: feed forward to find y.
• Err = error_function(y - t)
– Backward pass:
• Find ΔW and Δb to reduce Err.
• W_new = W_old + ΔW; b_new = b_old + Δb
Part 1A
Forward pass of Back Propagation Neural Net (BPNN)
Recall: forward pass for each output neuron:
– Use training samples X with target class t: feed forward to find y.
– Err = error_function(y - t)
Feed forward of Back Propagation Neural Net (BPNN)
• Inside each neuron:
$$x^l = f(u^l) \quad \text{with} \quad u^l = W^l x^{l-1} + b^l,$$
such that $x^l \in \mathbf{x}^l$, $u^l \in \mathbf{u}^l$, $W^l \in \mathbf{W}^l$, $b^l \in \mathbf{b}^l$
• Typically f is a logistic (sigmoid) function, i.e.
$$f(u) = \frac{1}{1 + e^{-\beta u}},$$
therefore
$$x^l = f(u^l) = \frac{1}{1 + e^{-\beta (W^l x^{l-1} + b^l)}}$$
[Figure: inputs 1, 2, 3, ..., N, each multiplied by its weight $w^l(1), \dots, w^l(N)$, feed the output neurons]
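The forward pass of one neuron can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers in x_prev, w and b are hypothetical, and the slope parameter is assumed to be 1 unless given.

```python
import math

def sigmoid(u, beta=1.0):
    """Logistic function f(u) = 1 / (1 + exp(-beta * u))."""
    return 1.0 / (1.0 + math.exp(-beta * u))

def neuron_forward(weights, inputs, bias):
    """Output of one neuron: x_l = f(W . x_prev + b)."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(u)

# Hypothetical values, chosen only for illustration.
x_prev = [0.5, -1.0, 2.0]   # inputs from layer l-1
w = [0.1, 0.4, -0.2]        # one weight per input
b = 0.3                     # bias of this neuron

x_out = neuron_forward(w, x_prev, b)
print(x_out)                # a value strictly between 0 and 1
```

The output always lies between 0 and 1, which is why the largest output neuron can be read as the most likely class.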
Sigmoid function f(u) and its derivative f'(u)
$$f(u) = \frac{1}{1 + e^{-\beta u}}, \quad \beta \text{ is the parameter for the slope}$$
Hence
$$f'(u) = \frac{df}{du} = \frac{d\left(\frac{1}{1 + e^{-\beta u}}\right)}{d(1 + e^{-\beta u})} \cdot \frac{d(1 + e^{-\beta u})}{du} = \frac{-1}{(1 + e^{-\beta u})^2} \cdot \left(-\beta e^{-\beta u}\right)$$
$$f'(u) = \frac{\beta e^{-\beta u}}{(1 + e^{-\beta u})^2} = \beta \cdot \frac{1}{(1 + e^{-\beta u})} \cdot \frac{e^{-\beta u}}{(1 + e^{-\beta u})} = \beta f(u)\left(1 - f(u)\right)$$
For simplicity, take the slope parameter $\beta = 1$:
$$f'(u) = f(u)\left(1 - f(u)\right)$$
http://mathworld.wolfram.com/SigmoidFunction.html
http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1
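The closed form of the derivative can be checked numerically. The sketch below compares it with a central finite difference; the step size h and the test points are arbitrary choices made for this illustration.

```python
import math

def f(u, beta=1.0):
    """Sigmoid with slope parameter beta."""
    return 1.0 / (1.0 + math.exp(-beta * u))

def f_prime(u, beta=1.0):
    """Closed form: f'(u) = beta * f(u) * (1 - f(u))."""
    return beta * f(u, beta) * (1.0 - f(u, beta))

def numeric_derivative(u, beta=1.0, h=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(u + h, beta) - f(u - h, beta)) / (2.0 * h)

for u in (-2.0, 0.0, 0.5, 3.0):
    # The two columns should agree to several decimal places.
    print(u, f_prime(u), numeric_derivative(u))
```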
A single neuron
• The neural net can have many layers
• Between any two neighboring layers, a set of neurons can be found
• Each neuron:
$$x^l = f(u^l) \quad \text{with} \quad u^l = W^l x^{l-1} + b^l$$
– $x^{l-1}$ = inputs at layer $l-1$, with elements $x^{l-1}(1), x^{l-1}(2), \dots$
– $W^l$ = weights, with elements $W^l(1), W^l(2), \dots$
– $x^l$ = output of the neuron, i.e. the input at layer $l$
BPNN Forward pass
• The forward pass finds the output when an input is given. For example:
• Assume we have used N=60,000 images to train a network to recognize c=10 numerals.
• When an unknown image is input, the output neuron that corresponds to the correct answer will give the highest output level.
[Figure: input image → 10 output neurons for 0,1,2,..,9]
The criteria to train a network
• It is based on the overall error function
$$\text{Overall error: } E^N = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \left(t_k^n - y_k^n\right)^2$$
$$\text{Error for the } n\text{-th training sample: } E^n = \frac{1}{2} \sum_{k=1}^{c} \left(t_k^n - y_k^n\right)^2 = \frac{1}{2} \left\| t^n - y^n \right\|_2^2$$
where $\|\cdot\|_2$ is the 2-norm,
$t_k^n$ = the given true class of the n-th training sample (component k),
$y_k^n$ = the output class of the n-th training sample at the output of the feed forward network (component k)
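As a small worked example of the error function, the sketch below evaluates the per-sample and overall error for two made-up samples with c=3 output neurons. The target and output values are hypothetical.

```python
def sample_error(t, y):
    """E^n = 1/2 * sum_k (t_k - y_k)^2 for one training sample."""
    return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))

def overall_error(targets, outputs):
    """E^N = sum of the per-sample errors over all N samples."""
    return sum(sample_error(t, y) for t, y in zip(targets, outputs))

# Two hypothetical samples, three output neurons each.
targets = [[1, 0, 0], [0, 1, 0]]
outputs = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]]

E_total = overall_error(targets, outputs)
print(E_total)   # 0.03 + 0.13, i.e. about 0.16
```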
Structure of a BP neural network
[Figure: input layer → hidden layer l-1 → hidden layer l → output layer; layer l receives $x^{l-1}$, applies the weights $W^l$, the biases $b^l$ and $f()$, and outputs $x^l$]
x = set of inputs, W = set of weights, b = set of biases,
such that $x^l \in \mathbf{x}^l$, $u^l \in \mathbf{u}^l$, $W^l \in \mathbf{W}^l$, $b^l \in \mathbf{b}^l$
Architecture (exercise: write formulas for A1(i=4) and A2(k=3))
• Input: P = 9x1, indexed by j
• Hidden layer = 5 neurons, indexed by i; W1 = 9x5, b1 = 5x1
• Output neurons = 3, indexed by k; W2 = 5x3, b2 = 3x1
• Hidden neuron i=1 (inputs P(j=1),..,P(j=9), weights W1(j=1,i=1),..,W1(j=9,i=1), bias b1(i=1)):
$$A_1(i=1) = \frac{1}{1 + e^{-\left(W_1(j=1,i=1) P_1 + W_1(j=2,i=1) P_2 + \dots + b_1(i=1)\right)}}$$
• Output neuron k=1 (inputs A1(i=1),..,A1(i=5), weights W2(i=1,k=1),..,W2(i=5,k=1), bias b2(k=1)):
$$A_2(k=1) = \frac{1}{1 + e^{-\left(W_2(i=1,k=1) A_1(i=1) + W_2(i=2,k=1) A_1(i=2) + \dots + b_2(k=1)\right)}}$$
Answer (exercise: write values for A1(i=4) and A2(k=3))
• P=[ 0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]
• W1=[ 0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]
• b1= -0.1441
• %Find A1(i=4)
• A1_i_is_4=1/(1+exp(-(W1*P'+b1)))
• =0.49
$$A_1(i=4) = \frac{1}{1 + e^{-\left(W_1(j=1,i=4) P_1 + W_1(j=2,i=4) P_2 + \dots + b_1(i=4)\right)}}$$
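The value of A1(i=4) can be reproduced with the numbers given in the answer. The sketch assumes that W1 lists the nine weights into hidden neuron i=4 and that the bias is b1 = -0.1441; the variable names are chosen for this illustration.

```python
import math

P = [0.7656, 0.7344, 0.9609, 0.9961, 0.9141, 0.9063, 0.0977, 0.0938, 0.0859]
W1_i4 = [0.2112, 0.1540, -0.0687, -0.0289, 0.0720, -0.1666, 0.2938, -0.0169, -0.1127]
b1_i4 = -0.1441   # assumed sign of the bias for neuron i=4

# Weighted sum of the 9 inputs plus the bias.
u = sum(w * p for w, p in zip(W1_i4, P)) + b1_i4

# Sigmoid activation of the hidden neuron.
A1_i4 = 1.0 / (1.0 + math.exp(-u))
print(round(A1_i4, 2))   # 0.49
```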
Numerical example for the forward pass
• Feed forward
• Give numbers for x, W, b, etc.
Example: a simple BPNN
• Number of classes (no. of output neurons) = 3
• Input 9 pixels: each input is a 3x3 image
• Training samples = 3 for each class
• Number of hidden layers = 1
• Number of neurons in the hidden layer = 5
Architecture of the example
[Figure: input layer (9x1 pixels) → hidden layer (W = 5x9, b = 5x1) → output layer (3x1); $W^l$ = weights, $b^l$ = biases, f() = activation function]
Part 1B
Backward pass of Back Propagation Neural Net (BPNN)
Feedback at layer l
• Feed forward:
$$x^{l+1} = f\left(W^{l+1} x^{l} + b^{l+1}\right)$$
• Feed backward:
$$\delta^{l} = \left(W^{l+1}\right)^T \delta^{l+1} \circ f'\left(u^{l}\right)$$
Derivation
Since $u^l = W^l x^{l-1} + b^l$, so
$$\frac{\partial u}{\partial b} = 1 \quad \text{(i)}$$
$$\frac{\partial E}{\partial b} = \frac{\partial E}{\partial u} \frac{\partial u}{\partial b} = \delta \equiv \text{the sensitivities} \quad \text{(ii)}$$
For the n-th sample,
$$E^n = \frac{1}{2} \left\| t^n - y^n \right\|^2 = \frac{1}{2} \left\| t^n - f(u) \right\|^2 \quad \text{(iii)}$$
since $y^n = f(u)$ is the current output and $t^n$ is the truth or target.
Hence, from (ii) and (iii),
$$\delta = \frac{\partial E^n}{\partial b} = (t^n - y^n) \frac{\partial (t^n - y^n)}{\partial b} = (t^n - y^n) \frac{\partial (t^n - f(u))}{\partial b} = -(t^n - y^n) f'(u) \frac{\partial u}{\partial b},$$
and since $\partial u / \partial b = 1$ in (i),
$$\delta^l = \frac{\partial E^n}{\partial b} = (y^n - t^n) f'(u) \quad \text{(iv)}$$
At the output layer L:
$$\delta^L = f'(u^L) \circ (y^n - t^n)$$
Derivation
Also from (iii), $E^n = \frac{1}{2} (t^n - y^n)^2$, so
$$\frac{\partial E}{\partial W^l} = (t^n - y^n) \frac{\partial (t^n - y^n)}{\partial W^l} = (y^n - t^n) \frac{\partial y^n}{\partial W^l} = (y^n - t^n) \frac{\partial f(u)}{\partial W^l} = (y^n - t^n) f'(u) \frac{\partial (wx + b)}{\partial W^l}$$
$$= (y^n - t^n) f'(u)\, x = x^{l-1} \left(\delta^l\right)^T, \quad \text{since in (iv) } \delta^l = (y^n - t^n) f'(u)$$
For each learning phase, a new W is calculated:
$$W_{new} = W_{old} + \Delta W^l$$
If we want to decrease E in every learning cycle, make $\Delta W^l$ point along the negative gradient, so make $\Delta W^l = -\frac{\partial E}{\partial W^l}$; to do it slowly, use a learning factor $\alpha$, hence
$$\Delta W^l = -\alpha \frac{\partial E}{\partial W^l}$$
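The update rule can be tried out on the smallest possible network: one sigmoid neuron trained on one sample. The sketch below is an illustration under these assumptions, with hypothetical starting weights, input, target and learning factor; it is not the code of any toolbox.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def train_step(w, b, x, t, alpha):
    """One forward and one backward pass for a single sigmoid neuron."""
    u = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(u)                            # forward pass
    delta = (y - t) * y * (1.0 - y)           # (iv): delta = (y - t) f'(u)
    # dE/dW = delta * x and dE/db = delta; step against the gradient.
    w_new = [wi - alpha * delta * xi for wi, xi in zip(w, x)]
    b_new = b - alpha * delta
    error = 0.5 * (t - y) ** 2                # error before the update
    return w_new, b_new, error

# Hypothetical starting point and training sample.
w, b = [0.2, -0.1], 0.0
x, t = [1.0, 0.5], 1.0

errors = []
for epoch in range(200):
    w, b, err = train_step(w, b, x, t, alpha=0.5)
    errors.append(err)

print(errors[0], errors[-1])   # the error shrinks as training proceeds
```

A smaller learning factor gives slower but steadier progress, which matches the remark that the update is deliberately done slowly.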
Numerical example for the feedback pass
Procedure
• From the last layer (output), find δ from the error (y - t)
• Find δ, then find ΔW, for the whole network
• Iterate (forward pass, then backward pass) to generate a new set of W, until ΔW is small
• Takes a long time
Part 2
Convolution Neural Networks
Part 2A
Feed forward part: cnnff( )
Matlab example
http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
An example: optical character recognition (OCR)
• Example test_example_CNN.m in http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
• Based on a data base (mnist_uint8, from http://yann.lecun.com/exdb/mnist/)
• 60,000 training examples (28x28 pixels each)
• 10,000 testing samples (a different data set)
– After training, given an unknown image, it will tell whether it is 0, or 1, .., 9 etc.
– Error rate about 11% using 1 epoch (training takes about 200 seconds)
– Error rate about 1.2% using 100 epochs (hours of training)
http://andrew.gibiansky.com/blog/machine-learning/k-nearest-neighbors-simplest-machine-learning/
Overview of test_example_CNN.m
• Read data base
• Part 1:
• cnnsetup.m
– Layer 1: input layer (do nothing)
– Layer 2: convolution (conv.) layer, output maps=6, kernel size=5x5
– Layer 3: sub-sample (subs.) layer, scale=2
– Layer 4: conv. layer, output maps=12, kernel size=5x5
– Layer 5: subs. layer (output layer), scale=2
• Part 2:
• cnntrain.m % train weights using 60,000 samples
– cnnff( ) % CNN feed forward
– cnnbp( ) % CNN feed back to train the weights in the kernels
– cnnapplygrads( ) % update weights
• cnntest.m % test the system using 10,000 samples and show error rate
Architecture
[Figure: network architecture, from the input image to the 10 output neurons]
• Layer 1 (I): one input image, 1x28x28
• Layer 1→2: conv., kernel=5x5; InputMaps=1, OutputMaps=6; Fan_in=5^2=25, Fan_out=6x5^2=150
• Layer 2 (C): 6 conv. maps, 6x24x24
• Layer 2→3: subs. 2x2; InputMaps=6, OutputMaps=6
• Layer 3 (S): 6 sub-sample maps, 6x12x12
• Layer 3→4: conv., kernel=5x5; InputMaps=6, OutputMaps=12; Fan_in=6x5^2=150, Fan_out=12x5^2=300
• Layer 4 (C): 12 conv. maps, 12x8x8
• Layer 4→5: subs. 2x2; InputMaps=12, OutputMaps=12
• Layer 5 (S): 12 sub-sample maps, 12x4x4
• Output: 10 neurons; each output neuron corresponds to a character (0,1,2,..,9 etc.)
• I=input, C=Conv.=convolution, S=Subs=sub sampling
cnnff.m
Convolution neural networks feed forward
• This is the feed forward part
• Assume all the weights are initialized or calculated; we show how to get the output from the inputs.
Layer 1→2:
[Figure: Layer 1, the input image (I), 1x28x28, is convolved with the 5x5 kernels K(1),..,K(6) to give Layer 2 (C), 6x24x24; map_index=1,2,..,6; Fan_in=5^2=25, Fan_out=6x5^2=150]
• Convolve layer 1 with different kernels (map_index=1,2,..,6) and produce 6 output maps
• Inputs:
– input layer 1, a 28x28 image
– 6 different kernels: k(1),..,k(6), each k is 5x5; the kernels K act as the dendrites of neurons
• Output: 6 output maps, each 24x24
• Algorithm
for map_index = 1:6
{
layer_2(map_index) = I * k(map_index) (valid convolution)
}
• Discussion
– 'valid' means only consider fully overlapped areas, so if layer 1 is 28x28 and each kernel is 5x5, each output map is 24x24 (28-5+1=24)
– In Matlab, use convn(I,k,'valid')
– Example:
I=rand(28,28)
k=rand(5,5)
size(convn(I,k,'valid'))
> ans
> 24 24
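To make the 'valid' rule concrete, the following pure-Python sketch computes a true 2-D convolution (kernel flipped, which is assumed here to mirror what Matlab's convn does) over the fully overlapped region only. The ramp image and the averaging kernel are hypothetical test data.

```python
def conv2d_valid(image, kernel):
    """True 2-D convolution (kernel flipped), 'valid' region only."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (W - kw + 1) for _ in range(H - kh + 1)]
    for r in range(H - kh + 1):
        for c in range(W - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    # Flip the kernel in both directions, as convolution requires.
                    s += image[r + a][c + b] * kernel[kh - 1 - a][kw - 1 - b]
            out[r][c] = s
    return out

# Hypothetical 28x28 ramp image and 5x5 averaging kernel.
image = [[float(r * 28 + c) for c in range(28)] for r in range(28)]
kernel = [[1.0 / 25.0] * 5 for _ in range(5)]

layer2_map = conv2d_valid(image, kernel)
print(len(layer2_map), len(layer2_map[0]))   # 24 24
```

Running the same function with six different kernels would give the six output maps of layer 2.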
Layer 2→3:
[Figure: Layer 2 (C), 6x24x24, is sub-sampled 2x2 to give Layer 3 (S), 6x12x12; 6 sub-sample maps, map_index=1,2,..,6; InputMaps=6, OutputMaps=6]
• Sub-sample layer 2 to layer 3
• Inputs:
– 6 maps of layer 2, each is 24x24
• Output: 6 maps of layer 3, each is 12x12
• Algorithm
for map_index = 1:6
{
For each input map, calculate the average of each 2x2 pixel block and save the result in the output map.
Hence the resolution is reduced from 24x24 to 12x12
}
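The 2x2 averaging step can be sketched as below. The function name subsample_mean and the ramp-shaped input map are assumptions made for this illustration.

```python
def subsample_mean(feature_map, scale=2):
    """Replace each scale x scale block by its average value."""
    rows = len(feature_map) // scale
    cols = len(feature_map[0]) // scale
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [feature_map[r * scale + a][c * scale + b]
                     for a in range(scale) for b in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Hypothetical 24x24 map standing in for one map of layer 2.
layer2_map = [[float(r + c) for c in range(24)] for r in range(24)]

layer3_map = subsample_mean(layer2_map)
print(len(layer3_map), len(layer3_map[0]))   # 12 12
```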
Layer 3→4:
[Figure: Layer 3 (S), L3, 6x12x12, index i=1:6, is convolved with 5x5 kernels to give Layer 4 (C), net.layers{l}.a{j}, 12x8x8, index j=1:12; 12 conv. maps; InputMaps=6, OutputMaps=12; Fan_in=6x5^2=150, Fan_out=12x5^2=300]
• Convolve layer 3 with kernels to produce layer 4
• Inputs:
– 6 maps of layer3 (L3{i=1:6}), each is 12x12
– Kernel set: 6x12 kernels in total, each is 5x5, i.e. K{i=1:6}{j=1:12}, each K{i}{j} is 5x5
– 12 bias{j=1:12} in this layer, each is a scalar
• Output: 12 maps of layer4 (L4{j=1:12}), each is 8x8
• Algorithm
for j=1:12
    z=0;                                      % clear z once per output map
    for i=1:6
        z=z+convn(L3{i}, k{i}{j}, 'valid');   % z is 8x8
    end
    L4{j}=sigm(z+bias{j});                    % L4{j} is 8x8
end
function X = sigm(P)
    X = 1./(1+exp(-P));
end
• Discussion
– Normalization?
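The accumulation over input maps can also be written out in Python. This is a sketch only: conv_layer is a hypothetical helper, the kernels and inputs are constant-valued test data, and kernels[i][j] is assumed to link input map i to output map j.

```python
import math

def conv2d_valid(image, kernel):
    """True 2-D convolution (kernel flipped), 'valid' region only."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[r + a][c + b] * kernel[kh - 1 - a][kw - 1 - b]
                 for a in range(kh) for b in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]

def conv_layer(in_maps, kernels, biases):
    """Each output map j: sigmoid(sum_i conv(in_maps[i], kernels[i][j]) + bias_j)."""
    out_maps = []
    for j in range(len(biases)):
        z = None                                   # cleared once per output map
        for i, in_map in enumerate(in_maps):
            c = conv2d_valid(in_map, kernels[i][j])
            z = c if z is None else [[p + q for p, q in zip(zr, cr)]
                                     for zr, cr in zip(z, c)]
        out_maps.append([[1.0 / (1.0 + math.exp(-(v + biases[j]))) for v in row]
                         for row in z])
    return out_maps

# Hypothetical data: 6 constant 12x12 maps, 6x12 constant 5x5 kernels, zero biases.
L3 = [[[0.01 * (i + 1)] * 12 for _ in range(12)] for i in range(6)]
K = [[[[0.1] * 5 for _ in range(5)] for _ in range(12)] for _ in range(6)]
bias = [0.0] * 12

L4 = conv_layer(L3, K, bias)
print(len(L4), len(L4[0]), len(L4[0][0]))   # 12 8 8
```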
Layer 4→5:
[Figure: Layer 4 (C), 12x8x8, is sub-sampled 2x2 to give Layer 5 (S), 12x4x4; 12 sub-sample maps; InputMaps=12, OutputMaps=12; 10 output neurons follow]
• Subsample layer 4 to layer 5
• Inputs:
– 12 maps of layer4 (L4{i=1:12}), each is 8x8
• Output: 12 maps of layer5 (L5{j=1:12}), each is 4x4
• Algorithm
– Sub-sample each 2x2 pixel window in L4 to a pixel in L5
• Discussion
– Normalization?
Layer 5→output
[Figure: Layer 5 (L5{j=1:12}), 12x4x4 = 192 pixels in total, is fully connected to the 10 output neurons net.o{m=1:10}; 192 weights for each output neuron, the same for each output neuron; each output neuron corresponds to a character (0,1,2,..,9 etc.)]
• Connect layer 5 to the output neurons
• Inputs:
– 12 maps of layer5 (L5{i=1:12}), each is 4x4, so L5 has 192 pixels in total
• Output layer weights:
– Net.ffW{m=1:10}{p=1:192}, the number of weights for each output neuron is 192
• Output: 10 output neurons (net.o{m=1:10})
• Algorithm
for m=1:10 % each output neuron
{
clear net.fv
net.fv = sum(Net.ffW{m}{all 192 weights} .* L5(all corresponding 192 pixels))
net.o{m} = sigm(net.fv + bias)
}
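The output layer amounts to ten weighted sums over the 192 pixels of layer 5, each passed through the sigmoid. In the sketch below the layer-5 values, the weights ffW and the biases ffb are hypothetical, and the order in which the maps are flattened into a vector is an arbitrary choice for this illustration.

```python
import math

def output_layer(feature_vector, ffW, ffb):
    """One sigmoid output per row of ffW: o_m = sigm(ffW[m] . fv + ffb[m])."""
    outputs = []
    for w_row, b in zip(ffW, ffb):
        u = sum(w * x for w, x in zip(w_row, feature_vector)) + b
        outputs.append(1.0 / (1.0 + math.exp(-u)))
    return outputs

# Hypothetical layer 5: 12 maps of 4x4, all pixels 0.5.
L5 = [[[0.5] * 4 for _ in range(4)] for _ in range(12)]
fv = [p for m in L5 for row in m for p in row]          # 192 values

# Hypothetical weights: 10 output neurons, 192 weights each.
ffW = [[0.01 * (m - 4.5)] * 192 for m in range(10)]
ffb = [0.0] * 10

o = output_layer(fv, ffW, ffb)
predicted = o.index(max(o))     # the neuron with the highest output wins
print(len(fv), predicted)
```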
Part 2B
Back propagation part
cnnbp( )
cnnapplygrads( )
cnnbp( ) overview (output back to layer 5)
$$\frac{\partial E}{\partial w_i} = (y - t)\, y (1 - y)\, x_i$$
In cnnbp.m:
net.o = y
net.e = (y - t)
$$\frac{\partial E}{\partial x_i} = (y - t)\, y (1 - y)\, \frac{\partial (x_i w_i)}{\partial x_i} = (y - t)\, y (1 - y)\, w_i$$
$\frac{\partial E}{\partial x_i} \cdot \frac{1}{w_i}$ = net.od = net.e .* (net.o .* (1 - net.o))
$\frac{\partial E}{\partial x_i}$ = net.od * $w_i$ = net.e .* (net.o .* (1 - net.o)) * $w_i$
so in the code cnnbp.m:
$\frac{\partial E}{\partial x_i}$ = net.fvd = (net.ffW' * net.od)
Ref: See http://en.wikipedia.org/wiki/Backpropagation
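The three quantities net.e, net.od and net.fvd can be computed by hand for a tiny case. The sketch below uses two output neurons and three feature values; all numbers are hypothetical, and the Python names only mirror the Matlab fields for readability.

```python
def output_deltas(o, t, ffW):
    """Return (e, od, fvd) for a sigmoid output layer with squared error."""
    e = [oi - ti for oi, ti in zip(o, t)]                    # net.e = y - t
    od = [ei * oi * (1.0 - oi) for ei, oi in zip(e, o)]      # net.od
    n_in = len(ffW[0])
    # net.fvd = net.ffW' * net.od: sum over output neurons m for each input i.
    fvd = [sum(ffW[m][i] * od[m] for m in range(len(od))) for i in range(n_in)]
    return e, od, fvd

# Hypothetical outputs, targets and weights (2 output neurons, 3 inputs).
o = [0.8, 0.3]
t = [1.0, 0.0]
ffW = [[0.5, -1.0, 2.0],
       [1.5, 0.0, -0.5]]

e, od, fvd = output_deltas(o, t, ffW)
print(od)    # about [-0.032, 0.063]
print(fvd)   # about [0.0785, 0.032, -0.0955]
```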
Layer 5 to 4
• Expand 1x1 to 2x2
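Expanding 1x1 to 2x2 means that every delta of the sub-sample layer is copied back to the four pixels it was averaged from. The sketch below also divides by 4, which is an assumption that matches mean sub-sampling; the multiplication by the derivative of the activation of the convolution layer is left out, and the function name expand is chosen only for this illustration.

```python
def expand(delta_map, scale=2):
    """Copy each delta to a scale x scale block, sharing it equally."""
    out = []
    for row in delta_map:
        # Repeat every value 'scale' times along the row, divided by the block size.
        wide = [v / (scale * scale) for v in row for _ in range(scale)]
        # Repeat the widened row 'scale' times.
        for _ in range(scale):
            out.append(list(wide))
    return out

# Hypothetical 2x2 delta map from the sub-sample layer.
d5 = [[4.0, 8.0],
      [12.0, 16.0]]

d4 = expand(d5)
for row in d4:
    print(row)
# [1.0, 1.0, 2.0, 2.0]
# [1.0, 1.0, 2.0, 2.0]
# [3.0, 3.0, 4.0, 4.0]
# [3.0, 3.0, 4.0, 4.0]
```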
Layer 4 to 3
• Rotated convolution
• Find dE/dx at layer 3
Layer 3 to 2
• Expand 1x1 to 2x2
Calculate gradient
• From layer 2 to layer 3
• From layer 3 to layer 4
• Net.ffW
• Net.ffb found
Details of calc gradients
• % Part: reshape feature vector deltas into output map style
• L4(c) run expand only
• L3(s) run conv (rot180, full), found d
• L2(c) run expand only
• % Part: calc gradients
• L2(c) run conv (valid), found dk and db
• L3(s) not run here
• L4(c) run conv (valid), found dk and db
• Done, found these for the output layer L5:
– net.dffW = net.od * (net.fv)' / size(net.od, 2);
– net.dffb = mean(net.od, 2);
cnnapplygrads(net, opts)
• For the convolution layers, L2, L4
– From k and dk find new k (weights)
– From b and db find new b (bias)
• For the output layer L5
– net.ffW = net.ffW - opts.alpha * net.dffW;
– net.ffb = net.ffb - opts.alpha * net.dffb;
– opts.alpha adjusts the learning rate
Appendix
Architecture
[Figure: the network architecture from the earlier Architecture slide (Layer 1: 1x28x28, Layer 2: 6x24x24, Layer 3: 6x12x12, Layer 4: 12x8x8, Layer 5: 12x4x4, 10 output neurons), repeated with the index labels i, j, u and v added]
A single neuron
• The neural net has many layers
• Between any two neighboring layers, a set of neurons can be found
• Each neuron:
$$x^l = f(u^l) \quad \text{with} \quad u^l = W^l x^{l-1} + b^l$$
– $x^{l-1}$ = inputs at layer $l-1$, with elements $x^{l-1}(1), x^{l-1}(2), \dots$
– $W^l$ = weights, with elements $W^l(1), W^l(2), \dots$
– $x^l$ = output of the neuron, i.e. the input at layer $l$
Derivation
• dE/dW: changes at layer l+1 are caused by changes in layer l:
$$\delta^{l} = \left(W^{l+1}\right)^T \delta^{l+1} \circ f'\left(u^{l}\right)$$
• At output layer L:
$$\delta^{L} = f'\left(u^{L}\right) \circ \left(y^n - t^n\right)$$
• dE/db = δ
• E depends on the weights and biases through y = f(wx+b)
References
• Wiki
– http://en.wikipedia.org/wiki/Convolutional_neural_network
– http://en.wikipedia.org/wiki/Backpropagation
• Matlab programs
– Neural Network for pattern recognition - Tutorial: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
– CNN Matlab example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox