COMPUTATIONALLY EFFICIENT INVARIANT PATTERN RECOGNITION WITH HIGHER ORDER PI-SIGMA NETWORKS

Yoan Shin and Joydeep Ghosh
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712

(This research was supported by DARPA/ONR contract N00014-89-C-0298, with Dr. Barbara Yoon (DARPA) and Dr. Thomas McKenna (ONR) as government cognizants.)

ABSTRACT

A class of higher-order networks called Pi-sigma networks has recently been introduced for function approximation and classification [4]. These networks combine the fast training of single-layered feedforward networks with the nonlinear mapping capability of higher-order networks, while using far fewer units. In this paper, we investigate the applicability of these networks to shift, scale and rotation invariant pattern recognition. Results obtained on a database of English and Persian characters compare favorably with other neural network based approaches [2, 3].

1. Introduction

Feedforward networks based on a single layer of linear threshold logic units (TLUs) can exhibit fast learning, but have limited capabilities. For instance, the ADALINE and the simple perceptron can only realize linearly separable dichotomies [1]. The addition of a layer of hidden units dramatically increases the power of layered feedforward networks. Indeed, networks with a single hidden layer of units with arbitrary squashing functions, such as the multilayer perceptron (MLP), can approximate any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. However, training an MLP is typically much slower than training a feedforward network with a single layer of TLUs, because the multiple layers necessitate backpropagation of error.

In an orthogonal direction, higher-order correlations among input components can be used to construct a higher-order network that performs nonlinear mappings with only a single layer of units [2]. The basic building block of such networks is the higher-order processing unit (HPU), a neural-like element whose output y is given by:

    y = \sigma\left( w_0 + \sum_j w_j x_j + \sum_{j,k} w_{jk} x_j x_k + \sum_{j,k,l} w_{jkl} x_j x_k x_l + \cdots \right),    (1)

where \sigma(\cdot) is a suitable nonlinear activation function such as the hyperbolic tangent, x_j is the j-th component of the input vector x, w_{jkl\ldots} is an adjustable weight from the product of input components x_j x_k x_l \ldots to the output unit, and w_0 is the threshold. Higher-order correlations enable HPUs to learn geometrically invariant properties more easily [2]. Unfortunately, the number of weights required to accommodate all higher-order correlations increases exponentially with the input dimension N. Consequently, typically only second order networks are considered in practice. A notable exception is when some a priori information is available about the function to be realized; such information has also been used with some success to remove "irrelevant" terms [2]. This restriction on the order of the network reduces its mapping capability, thereby limiting the use of this kind of higher-order network. The Pi-sigma network introduced in the next section attempts to combine the best of single-layered networks (quick learning) and multi-layered networks (greater capability with a small weight set).
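To make the weight explosion of equation (1) concrete, the following Python sketch (ours, not taken from [2]) evaluates a second-order HPU and counts the weights an HPU of a given order would require; the function names hpu_output_2nd_order and hpu_weight_count are purely illustrative.

import numpy as np
from math import comb

def hpu_output_2nd_order(x, w0, w1, w2):
    # Second-order HPU, eq. (1): sigma(w0 + sum_j w1[j] x_j + sum_{j,k} w2[j,k] x_j x_k),
    # with the hyperbolic tangent as the nonlinear activation.
    net = w0 + w1 @ x + x @ w2 @ x
    return np.tanh(net)

def hpu_weight_count(N, order):
    # 1 threshold plus C(N+d-1, d) weights for the d-th order terms
    # (each distinct product of d components counted once).
    return 1 + sum(comb(N + d - 1, d) for d in range(1, order + 1))

print(hpu_weight_count(64, 2))   # 2145 weights for an 8x8 binary image at order 2
print(hpu_weight_count(64, 3))   # 47905 weights at order 3

Even for an 8x8 binary template, a third-order HPU already needs tens of thousands of weights, which is why the order is usually restricted to two in practice.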
2. Pi-Sigma Networks

Figure 1 shows a Pi-sigma network (PSN) with a single output. This network is a fully connected two-layered feedforward network. However, the summing layer is not "hidden" as in the case of the multilayer perceptron (MLP), since the weights from this layer to the output are fixed at 1. This property drastically reduces training time.

Let x = (1, x_1, \ldots, x_N)^T be the (N+1)-dimensional augmented input column vector, where x_k denotes the k-th component of x. The inputs are weighted by K weight vectors w_j = (w_{0j}, w_{1j}, \ldots, w_{Nj})^T, j = 1, 2, \ldots, K, and summed by a layer of K linear "summing" units, where K is the desired order of the network. The output of the j-th summing unit, h_j, is given by:

    h_j = w_j^T x = \sum_{k=1}^{N} w_{kj} x_k + w_{0j},    j = 1, 2, \ldots, K.    (2)

The output y is given by:

    y = \sigma\left( \prod_{j=1}^{K} h_j \right),    (3)

where \sigma(\cdot) is a suitable nonlinear activation function. In the above, w_{kj} is an adjustable weight from input x_k to the j-th summing unit and w_{0j} is the threshold of the j-th summing unit. The weights can take arbitrary real values. If a specific input, say x^p, is considered, then the h_j's, y, and net are also superscripted by p. In this paper, we consider \sigma(x) = 1/(1 + e^{-x}), which corresponds to the Analog Pi-Sigma Model [4].

The network shown in Figure 1 is called a K-th order PSN since K summing units are incorporated. The total number of adjustable weight connections for a K-th order PSN with N-dimensional inputs is (N+1)K. If multiple outputs are required, an independent summing layer is needed for each output. Thus, for an M-dimensional output vector y, a total of \sum_{i=1}^{M} (N+1) K_i adjustable weight connections are needed, where K_i is the number of summing units for the i-th output. This allows great flexibility, since all outputs need not have the same complexity. Note that using product units in the output layer indirectly incorporates the capabilities of higher-order networks with a smaller number of weights and processing units. This also makes the network regular and incrementally expandable, since the order can be increased by one by adding another summing unit and its associated weights, without disturbing any previously established connection.

The learning rule is based on gradient descent on the estimated mean squared error surface in weight space, yielding:

    \Delta w_j = \eta \, (t^p - y^p) \, (y^p)' \left( \prod_{l \neq j} h_l^p \right) x^p,    (4)

where (y^p)' is the first derivative of the sigmoidal function \sigma(\cdot), that is, (y^p)' = \sigma'(\cdot) = (1 - \sigma(\cdot)) \sigma(\cdot), x^p is the (augmented) p-th input pattern and \eta is the learning rate. At each update step, all K sets of weights are updated, but in an asynchronous manner. That is, one set of weights w_j = (w_{0j}, w_{1j}, \ldots, w_{Nj})^T (corresponding to the j-th summing unit) is chosen at a time and modified according to the weight update rule. Then, for the same input pattern, the output is recomputed for the modified network, and the error is used to update a different set of weights. For every input, this procedure is performed K times so that all K sets of weights are updated once. It can be shown that this procedure is more stable than the usual scheme in which all weights are updated simultaneously. A detailed convergence analysis of the Pi-sigma learning rule is given in [4].
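The forward pass of equations (2)-(3) and the asynchronous update rule of equation (4) can be summarized by the following minimal Python sketch. It assumes a single output, the sigmoid activation of the Analog Pi-Sigma Model, and our own variable names (W holds the K weight vectors as rows); it is meant only to make the per-pattern procedure concrete, not to reproduce the exact code used in our experiments.

import numpy as np

def psn_forward(x_aug, W):
    # x_aug : augmented input (1, x_1, ..., x_N), shape (N+1,)
    # W     : weights, shape (K, N+1); row j holds (w_0j, w_1j, ..., w_Nj)
    h = W @ x_aug                      # K summing-unit outputs, eq. (2)
    net = np.prod(h)                   # product of the summing-unit outputs
    y = 1.0 / (1.0 + np.exp(-net))     # sigmoid activation, eq. (3)
    return h, y

def psn_update(x_aug, t, W, lr=0.1):
    # One asynchronous training step: the K weight sets are updated one at a
    # time, and the output is recomputed after each partial update, eq. (4).
    K = W.shape[0]
    for j in range(K):
        h, y = psn_forward(x_aug, W)
        prod_except_j = np.prod(np.delete(h, j))     # product over l != j
        delta = lr * (t - y) * y * (1.0 - y) * prod_except_j
        W[j] += delta * x_aug
    return W

Note that the output is recomputed between the K partial updates; this is what distinguishes the asynchronous rule from updating all weight sets at once.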
3. Invariant Pattern Recognition

Practical techniques for the recognition of geometric patterns must incorporate some degree of tolerance to noise in the input, and to variations brought about by (small) translation, rotation, and scaling of the patterns with respect to the prototypes. One common approach is to preprocess the input to convert it into another format that is more robust to these changes. This includes the extraction of rotation invariant features derived from complex and orthogonal Zernike moments of the image [3]. An alternative in the context of neural networks is to handcraft the weights of units such that their response shows little sensitivity to the class of transforms for which invariance is desired. The latter approach has been taken in [2], where a priori information is used to reduce the complexity of HPU networks. Often, such a priori knowledge is not available, or the preprocessing is too computationally expensive. The Pi-sigma network can thus be brought to bear fruitfully, since it incorporates higher-order correlations and is yet computationally efficient.

To test this hypothesis, we have constructed a database of English and Persian characters. For each character, there are binary templates for a "standard" exemplar, noisy versions in which a fraction 0 <= x <= 0.4 of the bits are corrupted, and scaled/rotated variants of these versions. Sample templates are depicted in Fig. 2(a), showing noisy versions of 'C', and Fig. 2(b), which shows a Persian character and some of its noisy versions. Half of the templates are chosen as the training set, and the rest are used for testing the classification and generalization properties of the network. A parallel series of experiments uses extensive cross-validation (jack-knife resampling) to study the effect of training set size on the quality of results. Each series consists of two sets of experiments. In the first, each (first order) feature vector is augmented by Zernike moments of up to order 5 (14 moments) and fed into 2nd and 3rd order PSNs. In the second set, only the feature vectors are used as inputs, and the order of the PSN is progressively increased by adding extra summing units. This also serves to test for scaling and generalization. Preliminary results for both function approximation and classification are extremely encouraging, and show a speedup of about two orders of magnitude over backpropagation for achieving a similar quality of solution. We are currently completing the experiments outlined above, and also making comparisons with HPU networks.
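For reference, the bit-corruption step used when generating noisy templates can be sketched in Python as follows. This is a schematic reconstruction rather than the exact code used to build the database, and the random 8x8 array in the usage example merely stands in for an actual character bitmap.

import numpy as np

def corrupt_template(template, fraction, rng=None):
    # Flip a given fraction of the bits of a binary character template
    # (at most 0.4 of the bits were corrupted in our experiments).
    rng = rng or np.random.default_rng()
    noisy = template.copy().ravel()
    n_flip = int(round(fraction * noisy.size))
    idx = rng.choice(noisy.size, size=n_flip, replace=False)
    noisy[idx] ^= 1                    # flip the selected bits
    return noisy.reshape(template.shape)

# Usage: corrupt 20% of the pixels of a placeholder 8x8 binary template
rng = np.random.default_rng(0)
template = rng.integers(0, 2, size=(8, 8))
noisy = corrupt_template(template, 0.2, rng)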
Concluding Remarks: In this paper, we investigate the nonlinear mapping capabilities of PSNs, with emphasis on shift- and rotation-invariant pattern recognition. Due to their ability to form higher-order correlations, we do not need to pre-compute all higher-order moments and then feed them into the network, as was done in [3]. Rather, the network provides a range of configurations with a trade-off between pre-computation and the order of the network. The structure of PSNs is highly regular in the sense that summing units can be added incrementally until an appropriate order of the network is attained, without overfitting the function. This is useful for the invariant pattern recognition problem, since the order can be gradually increased until the desired level of noise tolerance and invariance is reached. Our preliminary results on English and Persian alphabets support these observations.

References

[1] B. Widrow and M. Lehr, "30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation," Proc. IEEE, Vol. 78, No. 9, pp. 1415-1442, Sep. 1990.
[2] C. L. Giles and T. Maxwell, "Learning, Invariance, and Generalization in a High-Order Neural Network," Applied Optics, Vol. 26, No. 23, pp. 4972-4978, 1987.
[3] A. Khotanzad and J. H. Lu, "Classification of Invariant Image Representations using a Neural Network," IEEE Trans. on ASSP, Vol. 38, No. 6, pp. 1028-1039, June 1990.
[4] Y. Shin and J. Ghosh, "Efficient Higher-order Neural Networks for Function Approximation and Classification," IEEE Trans. Neural Networks, in review.
[5] Y. Shin and J. Ghosh, "The Pi-sigma Network: An Efficient Higher-order Neural Network for Pattern Classification and Function Approximation," Proceedings of the International Joint Conference on Neural Networks, Vol. I, pp. 13-18, Seattle, July 1991.
[6] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.

Figure 2: Examples of character templates and OCR environment. (a) Noisy versions of "C". (b) A Persian character and its noisy variants.