Convolution Neural Network (CNN): A tutorial
KH Wong
Convolution Neural Network CNN ver. 4.11a

Introduction
• Very popular:
  – Toolboxes: cuda-convnet and caffe (more user-friendly)
• A high-performance (multi-class) classifier
• Successful in handwritten optical character recognition (OCR), speech recognition, image noise removal, etc.
• Easy to implement
  – Slow in learning
  – Fast in classification

Overview of this note
• Part 1: Fully connected Back Propagation Neural Networks (BPNN)
  – Part 1A: feed forward processing
  – Part 1B: feed backward processing
• Part 2: Convolution neural networks (CNN)
  – Part 2A: feed forward of CNN
  – Part 2B: feed backward of CNN

Part 1
Fully Connected Back Propagation (BP) neural net

Theory
Fully connected Back Propagation Neural Net (BPNN)
• Use many samples to train the weights, so that the network can classify an unknown input into one of several classes
• Will explain
  – How to use it after training: forward pass
  – How to train it: how to train the weights and biases (using forward and backward passes)

Training
• How to train it: how to train the weights (W) and biases (b) (using forward and backward passes)
• Initialize W and b randomly
• For Iter = 1 : all_epochs (each iteration is called an epoch)
  – Forward pass, for each output neuron:
    • Use training samples X_class_t (input X with known class t): feed forward to find y
    • Err = error_function(y - t)
  – Backward pass:
    • Find ΔW and Δb to reduce Err
    • Wnew = Wold + ΔW; bnew = bold + Δb

Part 1A
Forward pass of Back Propagation Neural Net (BPNN)
Recall: forward pass for each output neuron:
– Use training samples X_class_t: feed forward to find y
– Err = error_function(y - t)
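The training procedure above (random initialization, then repeated forward and backward passes) can be sketched in Python for a single sigmoid neuron. This is a minimal sketch under assumptions that are not in the slides: the toy OR data set, the function name `train_neuron`, the learning factor `eta`, and the specific update, which is taken to be the gradient of the squared error for a sigmoid neuron.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def train_neuron(samples, epochs=2000, eta=0.5, seed=0):
    """Train one sigmoid neuron on (x, t) pairs; returns (W, b, err_history)."""
    rng = random.Random(seed)
    W = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]  # initialize W randomly
    b = rng.uniform(-0.5, 0.5)                            # initialize b randomly
    err_history = []
    for _ in range(epochs):                               # one pass over the data = one epoch
        err = 0.0
        for x, t in samples:
            # Forward pass: feed x forward to find y.
            y = sigmoid(sum(w * xi for w, xi in zip(W, x)) + b)
            err += 0.5 * (t - y) ** 2
            # Backward pass: find dW and db that reduce the error
            # (assumed update: gradient of the squared error, scaled by eta).
            delta = (y - t) * y * (1.0 - y)
            dW = [-eta * delta * xi for xi in x]
            db = -eta * delta
            # Wnew = Wold + dW; bnew = bold + db
            W = [w + d for w, d in zip(W, dW)]
            b += db
        err_history.append(err)
    return W, b, err_history


# Toy data (hypothetical): learn the logical OR of two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
W, b, err_history = train_neuron(samples)
predictions = [round(sigmoid(sum(w * xi for w, xi in zip(W, x)) + b)) for x, _ in samples]
print(err_history[0], err_history[-1], predictions)
```

The error summed over one epoch should shrink as training proceeds, which is the stopping signal the procedure relies on.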

Feed forward of Back Propagation Neural Net (BPNN)
• Inside each neuron: $x^l = f(u^l)$ with $u^l = W^l x^{l-1} + b^l$, such that $x^l \in \mathbf{x}^l$, $u^l \in \mathbf{u}^l$, $W^l \in \mathbf{W}^l$, $b^l \in \mathbf{b}^l$
• Typically f is a logistic (sigmoid) function, i.e. $f(u) = \frac{1}{1+e^{-u}}$, therefore
$$x^l = f(u^l) = \frac{1}{1 + e^{-(W^l x^{l-1} + b^l)}}$$
[Figure: inputs $x_1, x_2, x_3, \ldots, x_N$ feed the output neurons through the weights $w^l_1, w^l_2, w^l_3, \ldots, w^l_N$.]

Sigmoid function f(u) and its derivative f'(u)
http://mathworld.wolfram.com/SigmoidFunction.html
• $f(u) = \frac{1}{1+e^{-\beta u}}$, where $\beta$ is the parameter for the slope. Hence
$$f'(u) = \frac{df(u)}{du} = \frac{d}{du}\left(1+e^{-\beta u}\right)^{-1} = \frac{\beta e^{-\beta u}}{\left(1+e^{-\beta u}\right)^2} = \beta \cdot \frac{1}{1+e^{-\beta u}} \cdot \frac{e^{-\beta u}}{1+e^{-\beta u}} = \beta f(u)\left(1 - f(u)\right)$$
• For simplicity, set the slope parameter $\beta = 1$: $f'(u) = f(u)\left(1 - f(u)\right)$
http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1

A single neuron
• The neural net can have many layers
• In between any 2 neighboring layers, a set of neurons can be found
• Each neuron:
  – $x^{l-1}$ = inputs at layer l-1: $x^{l-1}(1), x^{l-1}(2), \ldots$, with $x \in \mathbf{x}$
  – $W^l$ = weights: $W^l(1), W^l(2), \ldots$, with $w \in \mathbf{W}$
  – $x^l = f(u^l)$ with $u^l = W^l x^{l-1} + b^l$
  – $x^l$ = input at layer l (the output of this neuron)

BPNN Forward pass
• The forward pass finds the output when an input is given. For example:
• Assume we have used N=60,000 images to train a network to recognize c=10 numerals.
• When an unknown image is input, the output neuron that corresponds to the correct answer will give the highest output level.
[Figure: input image → 10 output neurons for 0, 1, 2, ..., 9]

The criteria to train a network
• Training is based on the overall error function
$$E^N = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{c}\left(t_k^n - y_k^n\right)^2$$
• Error for the n-th training sample:
$$E^n = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^n - y_k^n\right)^2 = \frac{1}{2}\left\lVert \mathbf{t}^n - \mathbf{y}^n \right\rVert_2^2$$
• $\lVert \cdot \rVert_2$ is the 2-norm
• $t_k^n$ = the given true class of the n-th training sample
• $y_k^n$ = the output class of the n-th training sample at the output of the feed forward network
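The sigmoid, its derivative and the error function above can be checked numerically. The following Python sketch is illustrative only: the function names and the small target/output vectors are made up for the example, and the derivative is compared against a central finite difference rather than taken on trust.

```python
import math


def sigmoid(u, beta=1.0):
    """Logistic function f(u) = 1 / (1 + exp(-beta * u))."""
    return 1.0 / (1.0 + math.exp(-beta * u))


def sigmoid_prime(u, beta=1.0):
    """Derivative f'(u) = beta * f(u) * (1 - f(u))."""
    f = sigmoid(u, beta)
    return beta * f * (1.0 - f)


def sample_error(t, y):
    """E^n = 1/2 * sum_k (t_k - y_k)^2 for one training sample."""
    return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))


def overall_error(targets, outputs):
    """E^N = sum of E^n over all N training samples."""
    return sum(sample_error(t, y) for t, y in zip(targets, outputs))


# Compare the closed-form derivative with a central finite difference.
u, beta, h = 0.3, 2.0, 1e-6
numeric = (sigmoid(u + h, beta) - sigmoid(u - h, beta)) / (2.0 * h)
analytic = sigmoid_prime(u, beta)

# Hypothetical one-hot targets and network outputs for c = 3 classes, N = 2 samples.
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
outputs = [[0.8, 0.1, 0.2], [0.3, 0.6, 0.1]]
E = overall_error(targets, outputs)
print(numeric, analytic, E)
```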

Structure of a BP neural network
[Figure: input layer → hidden layer l-1 → hidden layer l → output layer. Layer l takes $x^{l-1}$, applies the weights $W^l$, the biases $b^l$ and f( ), and produces $x^l$.]
• x = set of inputs, W = set of weights, b = set of biases, such that $x^l \in \mathbf{x}^l$, $u^l \in \mathbf{u}^l$, $W^l \in \mathbf{W}^l$, $b^l \in \mathbf{b}^l$

Architecture (exercise: write formulas for A1(i=4) and A2(k=3))
• Input: P = 9x1, indexed by j
• Hidden layer = 5 neurons, indexed by i; W1 = 9x5, b1 = 5x1
• Output neurons = 3 neurons, indexed by k; W2 = 5x3, b2 = 3x1
• Hidden neuron i=1, with bias b1(i=1):
$$A1(i{=}1) = \frac{1}{1 + e^{-\left(W1(j{=}1,i{=}1)\,P_1 + W1(j{=}2,i{=}1)\,P_2 + \ldots + b1(i{=}1)\right)}}$$
• Output neuron k=1, with bias b2(k=1):
$$A2(k{=}1) = \frac{1}{1 + e^{-\left(W2(i{=}1,k{=}1)\,A1(i{=}1) + W2(i{=}2,k{=}1)\,A1(i{=}2) + \ldots + b2(k{=}1)\right)}}$$

Answer (exercise: write values for A1(i=4) and A2(k=3))
• P = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]
• W1 = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]
• b1 = -0.1441
• % Find A1(i=4)
• A1_i_is_4 = 1/(1+exp(-(W1*P' + b1)))
• = 0.49
$$A1(i{=}4) = \frac{1}{1 + e^{-\left(W1(j{=}1,i{=}4)\,P_1 + W1(j{=}2,i{=}4)\,P_2 + \ldots + b1(i{=}4)\right)}}$$

Numerical example for the forward path
• Feed forward
• Give numbers for x, w, b, etc.

Example: a simple BPNN
• Number of classes (no. of output neurons) = 3
• Input 9 pixels: each input is a 3x3 image
• Training samples = 3 for each class
• Number of hidden layers = 1
• Number of neurons in the hidden layer = 5

Architecture of the example
[Figure: input layer (9x1 pixels) → hidden layer (W: 5x9, b: 5x1) → output layer (3x1); each layer applies its weights $W^l$, biases $b^l$ and f( ) to produce $x^l$.]
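The value A1(i=4) = 0.49 quoted on the "Answer" slide can be reproduced with a few lines of Python. This sketch assumes the bias is read as b1 = -0.1441 (the reading under which the quoted result is obtained); the helper name `neuron_output` is hypothetical.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def neuron_output(weights, inputs, bias):
    """x = f(u) with u = sum_j w_j * p_j + b."""
    u = sum(w * p for w, p in zip(weights, inputs)) + bias
    return sigmoid(u)


# Values quoted on the "Answer" slide.
P = [0.7656, 0.7344, 0.9609, 0.9961, 0.9141, 0.9063, 0.0977, 0.0938, 0.0859]
W1_i4 = [0.2112, 0.1540, -0.0687, -0.0289, 0.0720, -0.1666, 0.2938, -0.0169, -0.1127]
b1_i4 = -0.1441  # assumption: bias read as b1 = -0.1441

A1_i4 = neuron_output(W1_i4, P, b1_i4)
print(round(A1_i4, 2))  # 0.49
```

With a bias of +0.1441 instead, the same code gives about 0.56, so the sign of the bias matters for matching the slide.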

Part 1B
Backward pass of Back Propagation Neural Net (BPNN)

Feedback at layer l
• Feed forward: $x^l = f(W^l x^{l-1} + b^l)$
• Feed backward: $\delta^l = (W^{l+1})^T \delta^{l+1} \circ f'(u^l)$

Derivation
• Since $u^l = W^l x^{l-1} + b^l$,
$$\frac{\partial u}{\partial b} = 1 \quad \text{(i)}$$
$$\frac{\partial E}{\partial b} = \frac{\partial E}{\partial u}\frac{\partial u}{\partial b} = \delta \quad \text{(the sensitivities) (ii)}$$
• For the n-th sample,
$$E^n = \frac{1}{2}\left(t^n - y^n\right)^2 = \frac{1}{2}\left(t^n - f(u)\right)^2 \quad \text{(iii)}$$
since $y^n = f(u)$ is the current output and $t^n$ is the truth or target.
• From (ii) and (iii),
$$\frac{\partial E^n}{\partial b} = -\left(t^n - y^n\right)\frac{\partial y^n}{\partial b} = -\left(t^n - y^n\right)\frac{\partial f(u)}{\partial b} = \left(y^n - t^n\right) f'(u)\,\frac{\partial u}{\partial b}$$
• Since $\frac{\partial u}{\partial b} = 1$ in (i),
$$\frac{\partial E^n}{\partial b} = \delta^l = \left(y^n - t^n\right) f'(u) \quad \text{(iv)}$$
• At the output layer L: $\delta^L = f'(u^L) \circ \left(y^n - t^n\right)$

Derivation (continued)
• Also from (iii), $E^n = \frac{1}{2}\left(t^n - y^n\right)^2$ with $y^n = f(u) = f(wx + b)$, so
$$\frac{\partial E^n}{\partial W^l} = -\left(t^n - y^n\right)\frac{\partial y^n}{\partial W^l} = \left(y^n - t^n\right) f'(u)\,\frac{\partial u}{\partial W^l} = \left(y^n - t^n\right) f'(u)\,x = \delta^l x$$
since $\delta^l = \left(y^n - t^n\right) f'(u)$ in (iv). In vector form, $\frac{\partial E}{\partial W^l} = x^{l-1}\left(\delta^l\right)^T$.
• For each learning phase, a new W is calculated: $W_{new} = W_{old} + \Delta W^l$
• To decrease E in every learning cycle, make $\Delta W^l$ proportional to $-\frac{\partial E}{\partial W^l}$; to do it slowly, use a learning factor $\eta$:
$$\Delta W^l = -\eta\,\frac{\partial E}{\partial W^l}$$

Numerical example for the feed back pass

Procedure
• From the last layer (output), find (t - y) and the sensitivity δ
• Find δ for each layer, then find ΔW for the whole network
• Iterate (forward pass, backward pass) to generate a new set of W, until ΔW is small
• Takes a long time

Part 2
Convolution Neural Networks
Part 2A
Feed forward part of cnnff( )
Matlab example http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
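Before the CNN material, the backward-pass rules of Part 1B can be recapped in code. The Python sketch below is an illustration under assumptions: it uses a 9-5-3 network like the earlier example, random weights, a made-up input and target, and hypothetical function names. Each weight matrix is stored with one row per neuron, so the gradient is laid out as δ times the transposed input. The analytic gradient is compared with a finite difference as a sanity check.

```python
import copy
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, layers):
    """Return activations [x^0, ..., x^L]; each layer is (W, b), one row of W per neuron."""
    acts = [x]
    for W, b in layers:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * p for w, p in zip(row, prev)) + bi)
                     for row, bi in zip(W, b)])
    return acts


def error(x, t, layers):
    y = forward(x, layers)[-1]
    return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))


def backward(x, t, layers):
    """Return [(dE/dW, dE/db), ...], one pair per layer."""
    acts = forward(x, layers)
    y = acts[-1]
    # Output layer: delta^L = f'(u^L) * (y - t), with f'(u) = y * (1 - y).
    delta = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    grads = [None] * len(layers)
    for l in range(len(layers) - 1, -1, -1):
        prev = acts[l]                                   # input to this layer
        grads[l] = ([[d * p for p in prev] for d in delta], list(delta))
        if l > 0:
            W = layers[l][0]
            # Propagate back: (W^T delta) .* f'(u) of the previous layer.
            delta = [sum(W[k][i] * delta[k] for k in range(len(delta)))
                     * prev[i] * (1.0 - prev[i]) for i in range(len(prev))]
    return grads


def apply_update(layers, grads, eta):
    """W_new = W_old - eta * dE/dW, and the same for b."""
    new_layers = []
    for (W, b), (dW, db) in zip(layers, grads):
        new_W = [[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(W, dW)]
        new_b = [bi - eta * g for bi, g in zip(b, db)]
        new_layers.append((new_W, new_b))
    return new_layers


rng = random.Random(1)


def rand_layer(n_out, n_in):
    return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [rng.uniform(-1, 1) for _ in range(n_out)])


layers = [rand_layer(5, 9), rand_layer(3, 5)]            # 9 inputs, 5 hidden, 3 outputs
x = [rng.random() for _ in range(9)]
t = [1.0, 0.0, 0.0]
grads = backward(x, t, layers)

# Finite-difference check on one first-layer weight.
h = 1e-6
plus, minus = copy.deepcopy(layers), copy.deepcopy(layers)
plus[0][0][0][0] += h
minus[0][0][0][0] -= h
numeric = (error(x, t, plus) - error(x, t, minus)) / (2.0 * h)
analytic = grads[0][0][0][0]

err_before = error(x, t, layers)
err_after = error(x, t, apply_update(layers, grads, eta=0.1))
print(numeric, analytic, err_before, err_after)
```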

An example: optical character recognition (OCR)
• Example test_example_CNN.m in http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
• Based on a database (mnist_uint8, from http://yann.lecun.com/exdb/mnist/)
• 60,000 training examples (28x28 pixels each)
• 10,000 testing samples (a different data set)
  – After training, given an unknown image, it will tell whether it is 0, or 1, ..., or 9
  – Error rate 11% using 1 epoch (training takes 200 seconds)
  – Error rate 1.2% using 100 epochs (hours of training)
http://andrew.gibiansky.com/blog/machine-learning/k-nearest-neighbors-simplest-machine-learning/

Overview of test_example_CNN.m
• Read database
• Part 1:
• cnnsetup.m
  – Layer 1: input layer (do nothing)
  – Layer 2: convolution (conv.) layer, output maps = 6, kernel size = 5x5
  – Layer 3: sub-sample (subs.) layer, scale = 2
  – Layer 4: conv. layer, output maps = 12, kernel size = 5x5
  – Layer 5: subs. layer (output layer), scale = 2
• Part 2:
• cnntrain.m % train weights using 60,000 samples
  – cnnff( ) % CNN feed forward
  – cnnbp( ) % CNN feed back to train weights in kernels
  – cnnapplygrads( ) % update weights
• cnntest.m % test the system using 10,000 samples and show error rate

Architecture
I = input, C = Conv. = convolution, S = Subs = sub-sampling
• Layer 1 (I): one input image, 1x28x28
• Layer 1→2: convolution, kernel = 5x5; 6 conv. maps (C); InputMaps = 1, OutputMaps = 6; Fan_in = 5^2 = 25, Fan_out = 6x5^2 = 150
• Layer 2 (C): 6x24x24
• Layer 2→3: sub-sampling 2x2; 6 sub-sample maps (S); InputMaps = 6, OutputMaps = 6
• Layer 3 (S): 6x12x12
• Layer 3→4: convolution, kernel = 5x5; 12 conv. maps (C); InputMaps = 6, OutputMaps = 12; Fan_in = 6x5^2 = 150, Fan_out = 12x5^2 = 300
• Layer 4 (C): 12x8x8
• Layer 4→5: sub-sampling 2x2; 12 sub-sample maps (S); InputMaps = 12, OutputMaps = 12
• Layer 5 (S): 12x4x4
• Output: 10 neurons; each output neuron corresponds to a character (0, 1, 2, ..., 9)
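The map sizes and fan-in/fan-out figures in the architecture above follow from two rules: a 'valid' convolution with a k x k kernel shrinks each side by k - 1, and sub-sampling by scale s divides each side by s. The Python sketch below recomputes them; the function name `cnn_shapes` and the tuple layout are assumptions made for this illustration.

```python
def cnn_shapes(input_size=28, layers=None):
    """Return a list of (kind, n_maps, map_size, fan_in, fan_out), input layer first."""
    if layers is None:
        # ("c", output maps, kernel size) or ("s", scale), as in the overview of cnnsetup.m
        layers = [("c", 6, 5), ("s", 2), ("c", 12, 5), ("s", 2)]
    n_maps, size = 1, input_size
    shapes = [("i", n_maps, size, None, None)]
    for layer in layers:
        if layer[0] == "c":
            _, out_maps, k = layer
            fan_in = n_maps * k * k       # input maps x kernel area
            fan_out = out_maps * k * k    # output maps x kernel area
            size = size - k + 1           # 'valid' convolution
            n_maps = out_maps
            shapes.append(("c", n_maps, size, fan_in, fan_out))
        else:
            _, scale = layer
            size = size // scale          # sub-sampling
            shapes.append(("s", n_maps, size, None, None))
    return shapes


shapes = cnn_shapes()
for kind, n_maps, size, fan_in, fan_out in shapes:
    print(kind, f"{n_maps}x{size}x{size}", fan_in, fan_out)

# Number of pixels in the last layer, i.e. inputs to each output neuron.
n_features = shapes[-1][1] * shapes[-1][2] ** 2
print(n_features)
```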

cnnff.m: convolution neural networks feed forward
• This is the feed forward part
• Assume all the weights are initialized or calculated; we show how to get the output from the inputs.

Layer 1→2
[Figure: layer 1, input (I), 1x28x28, is convolved with 5x5 kernels K(1), ..., K(6) to give layer 2 (C), 6x24x24, map_index = 1, 2, ..., 6. 6 conv. maps (C); InputMaps = 1, OutputMaps = 6; Fan_in = 5^2 = 25, Fan_out = 6x5^2 = 150.]
• Convolve layer 1 with different kernels (map_index = 1, 2, ..., 6) and produce 6 output maps
• Inputs:
  – input layer 1, a 28x28 image
  – 6 different kernels: k(1), ..., k(6), each k is 5x5; the kernels K are the dendrites of the neurons
• Output: 6 output maps, each 24x24
• Algorithm
    for map_index = 1:6
        layer_2(map_index) = I * k(map_index)   % '*' is a 'valid' convolution
    end
• Discussion
  – 'Valid' means only the fully overlapped area is considered, so if layer 1 is 28x28 and each kernel is 5x5, each output map is 24x24
  – In Matlab, use convn(I, k, 'valid')
  – Example:
        I = rand(28,28)
        k = rand(5,5)
        size(convn(I, k, 'valid'))
        > ans
        > 24 24

Layer 2→3
[Figure: layer 2 (C), 6x24x24, is sub-sampled 2x2 into layer 3 (S), 6x12x12, map_index = 1, 2, ..., 6. 6 sub-sample maps (S); InputMaps = 6, OutputMaps = 6.]
• Sub-sample layer 2 to layer 3
• Inputs: 6 maps of layer 2, each is 24x24
• Output: 6 maps of layer 3, each is 12x12
• Algorithm
    for map_index = 1:6
        For each input map, calculate the average of each 2x2 block of pixels and save the result in the output map.
        Hence the resolution is reduced from 24x24 to 12x12.
    end
• Discussion
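The two steps above, a 'valid' convolution followed by 2x2 averaging, can be sketched in plain Python. This is an illustration under stated assumptions: the helper names `conv2_valid` and `subsample_mean` are hypothetical, the kernel is flipped as in a true convolution (which is how I understand Matlab's convn to behave), and, following the slides' algorithm, no bias or sigmoid is applied at this stage.

```python
import random


def conv2_valid(image, kernel):
    """2-D convolution (kernel flipped), keeping only fully overlapped positions."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[r + a][c + b] * kernel[kh - 1 - a][kw - 1 - b]
            row.append(acc)
        out.append(row)
    return out


def subsample_mean(fmap, scale=2):
    """Average each non-overlapping scale x scale window into one pixel."""
    out = []
    for r in range(0, len(fmap) - scale + 1, scale):
        row = []
        for c in range(0, len(fmap[0]) - scale + 1, scale):
            window = [fmap[r + a][c + b] for a in range(scale) for b in range(scale)]
            row.append(sum(window) / (scale * scale))
        out.append(row)
    return out


rng = random.Random(0)
I = [[rng.random() for _ in range(28)] for _ in range(28)]             # 28x28 input image
kernels = [[[rng.random() for _ in range(5)] for _ in range(5)] for _ in range(6)]

layer_2 = [conv2_valid(I, k) for k in kernels]                         # 6 maps, 24x24
layer_3 = [subsample_mean(m, 2) for m in layer_2]                      # 6 maps, 12x12
print(len(layer_2), len(layer_2[0]), len(layer_2[0][0]))               # 6 24 24
print(len(layer_3), len(layer_3[0]), len(layer_3[0][0]))               # 6 12 12
```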

Layer 3→4
[Figure: layer 3 (S), 6x12x12, index i = 1:6, is convolved with 5x5 kernels into layer 4 (C), 12x8x8, index j = 1:12, stored as net.layers{l}.a{j}. 12 conv. maps (C); InputMaps = 6, OutputMaps = 12; Fan_in = 6x5^2 = 150, Fan_out = 12x5^2 = 300.]
• Convolve layer 3 with kernels to produce layer 4
• Inputs:
  – 6 maps of layer 3 (L3{i=1:6}), each is 12x12
  – Kernel set: 6x12 kernels in total, each is 5x5, i.e. K{i=1:6}{j=1:12}, each K{i}{j} is 5x5
  – 12 biases bias{j=1:12} in this layer, each is a scalar
• Output: 12 maps of layer 4 (L4{j=1:12}), each is 8x8
• Algorithm
    for j = 1:12
        z = 0;                                        % clear z once per output map
        for i = 1:6
            z = z + convn(L3{i}, k{i}{j}, 'valid');   % z is 8x8
        end
        L4{j} = sigm(z + bias{j});                    % L4{j} is 8x8
    end
    function X = sigm(P)
        X = 1./(1+exp(-P));
    end
• Discussion
  – Normalization?

Layer 4→5
[Figure: layer 4, 12x8x8, is sub-sampled 2x2 into layer 5, 12x4x4. 12 sub-sample maps (S); InputMaps = 12, OutputMaps = 12.]
• Sub-sample layer 4 to layer 5
• Inputs: 12 maps of layer 4 (L4{i=1:12}), each is 8x8
• Output: 12 maps of layer 5 (L5{j=1:12}), each is 4x4
• Algorithm
  – Sub-sample each 2x2 pixel window in L4 to a pixel in L5
• Discussion
  – Normalization?

Layer 5→output
[Figure: layer 5 (L5{j=1:12}), 12x4x4 = 192 pixels in total, is fully connected to 10 output neurons net.o{m=1:10}; there are 192 weights for each output neuron, and the same holds for every output neuron. Each output neuron corresponds to a character (0, 1, 2, ..., 9).]
• Connect layer 5 to the output neurons
• Inputs: 12 maps of layer 5 (L5{i=1:12}), each is 4x4, so L5 has 192 pixels in total
• Output layer weights: net.ffW{m=1:10}{p=1:192}; the number of weights per output neuron is 192
• Output: 10 output neurons (net.o{m=1:10})
• Algorithm
    for m = 1:10   % each output neuron
        clear net.fv
        net.fv = net.ffW{m}{all 192 weights} .* L5(all corresponding 192 pixels)
        net.o{m} = sigm(sum(net.fv) + bias)
    end
• Discussion

Part 2B
Back propagation part
cnnbp( )
cnnapplygrads( )

cnnbp( ) overview (output back to layer 5)
$$\frac{\partial E}{\partial w_i} = (y - t)\,y\,(1 - y)\,x_i$$
• In cnnbp.m: net.o = y, net.e = (y - t)
$$\frac{\partial E}{\partial w_i} = (y - t)\,y\,(1 - y)\,x_i \;\Rightarrow\; \frac{1}{x_i}\frac{\partial E}{\partial w_i} = \text{net.od} = \text{net.e .* (net.o .* (1 - net.o))}$$
$$\frac{\partial E}{\partial x_i} = \text{net.od} * w_i = \text{net.e .* (net.o .* (1 - net.o))} * w_i$$
• So in the code of cnnbp.m:
$$\frac{\partial E}{\partial x_i} = \text{net.fvd} = \text{(net.ffW' * net.od)}$$
• Ref: see http://en.wikipedia.org/wiki/Backpropagation

Layer 5 to 4
• Expand 1x1 to 2x2

Layer 4 to 3
• Rotated convolution
• Find dE/dx at layer 3

Layer 3 to 2
• Expand 1x1 to 2x2

Calculate gradient
• From layer 2 to layer 3
• From layer 3 to layer 4
• net.ffW
• net.ffb found

Details of calc gradients
• % Part 1: reshape feature vector deltas into output map style
• L4(c): run expand only
• L3(s): run conv (rot180, full), found d
• L2(c): run expand only
• % Part 2: calc gradients
• L2(c): run conv (valid), found dk and db
• L3(s): not run here
• L4(c): run conv (valid), found dk and db
• Done; found these for the output layer L5:
  – net.dffW = net.od * (net.fv)' / size(net.od, 2);
  – net.dffb = mean(net.od, 2);

cnnapplygrads(net, opts)
• For the convolution layers, L2 and L4
  – From k and dk find new k (weights)
  – From b and db find new b (bias)
• For the output layer L5
  – net.ffW = net.ffW - opts.alpha * net.dffW;
  – net.ffb = net.ffb - opts.alpha * net.dffb;
  – opts.alpha adjusts the learning rate

Appendix

Architecture
[Figure: same architecture as in Part 2A (layer 1: 1x28x28 → layer 2: 6x24x24 → layer 3: 6x12x12 → layer 4: 12x8x8 → layer 5: 12x4x4 → 10 output neurons), here annotated with the indices i, j, u and v.]
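The output-layer steps of cnnbp( ) and cnnapplygrads( ) described in Part 2B can be sketched in Python for a single training sample. This is a hedged illustration, not the toolbox code: the Python function names are hypothetical, the variable names merely mirror net.o, net.e, net.od, net.fvd, net.dffW and net.dffb, the data are random, and the `expand` helper divides by scale^2 on the assumption that the sub-sampling layer averages its 2x2 window.

```python
import math
import random


def sigm(u):
    return 1.0 / (1.0 + math.exp(-u))


def output_forward(ffW, ffb, fv):
    """o_m = sigm(sum_p ffW[m][p] * fv[p] + ffb[m]) for each output neuron m."""
    return [sigm(sum(w * x for w, x in zip(row, fv)) + bm) for row, bm in zip(ffW, ffb)]


def output_backward(ffW, ffb, fv, t):
    """Return (od, fvd, dffW, dffb) for one training sample."""
    o = output_forward(ffW, ffb, fv)
    e = [om - tm for om, tm in zip(o, t)]                   # net.e = y - t
    od = [em * om * (1.0 - om) for em, om in zip(e, o)]     # net.od = net.e .* (net.o .* (1 - net.o))
    fvd = [sum(ffW[m][p] * od[m] for m in range(len(od)))   # net.fvd = net.ffW' * net.od
           for p in range(len(fv))]
    dffW = [[odm * x for x in fv] for odm in od]            # net.od * net.fv' (one sample)
    dffb = list(od)
    return od, fvd, dffW, dffb


def apply_grads(ffW, ffb, dffW, dffb, alpha):
    """ffW = ffW - alpha * dffW; ffb = ffb - alpha * dffb."""
    new_W = [[w - alpha * g for w, g in zip(rw, rg)] for rw, rg in zip(ffW, dffW)]
    new_b = [b - alpha * g for b, g in zip(ffb, dffb)]
    return new_W, new_b


def expand(dmap, scale=2):
    """Copy each pixel into a scale x scale block, sharing its value equally."""
    out = []
    for row in dmap:
        wide = [v / (scale * scale) for v in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out


def loss(ffW, ffb, fv, t):
    return 0.5 * sum((tm - om) ** 2 for tm, om in zip(t, output_forward(ffW, ffb, fv)))


rng = random.Random(0)
fv = [rng.random() for _ in range(192)]                     # 12 maps x 4 x 4 pixels
ffW = [[rng.uniform(-0.1, 0.1) for _ in range(192)] for _ in range(10)]
ffb = [0.0] * 10
t = [0.0] * 10
t[3] = 1.0                                                  # hypothetical target: digit 3

od, fvd, dffW, dffb = output_backward(ffW, ffb, fv, t)
new_W, new_b = apply_grads(ffW, ffb, dffW, dffb, alpha=0.1)
loss_before = loss(ffW, ffb, fv, t)
loss_after = loss(new_W, new_b, fv, t)
print(loss_before, loss_after)
```

The 192 values in `fvd` are what would be reshaped back into twelve 4x4 maps and expanded to 8x8 on the way from layer 5 to layer 4.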

A single neuron
• The neural net has many layers
• In between any 2 neighboring layers, a set of neurons can be found
• Each neuron:
  – $x^{l-1}$ = inputs at layer l-1: $x^{l-1}(1), x^{l-1}(2), \ldots$, with $x \in \mathbf{x}$
  – $W^l$ = weights: $W^l(1), W^l(2), \ldots$, with $w \in \mathbf{W}$
  – $x^l = f(u^l)$ with $u^l = W^l x^{l-1} + b^l$
  – $x^l$ = input at layer l (the output of this neuron)

Derivation
• dE/dW: changes at layer l+1 caused by changes in layer l
$$\delta^l = (W^{l+1})^T \delta^{l+1} \circ f'(u^l)$$
• At output layer L:
$$\delta^L = f'(u^L) \circ \left(y^n - t^n\right)$$
• dE/db = δ at output layer L
• y = f(wx + b)
• dE/db = δ

References
• Wiki
  – http://en.wikipedia.org/wiki/Convolutional_neural_network
  – http://en.wikipedia.org/wiki/Backpropagation
• Matlab programs
  – Neural Network for pattern recognition - Tutorial: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
  – CNN Matlab example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
© Copyright 2025