Proceedings of the IASTED International Conference on Signal and Image Processing
October 18-21, 1999, Nassau, Bahamas

Model-Based Training Set Synthesis for Vector Quantization

Dorin Comaniciu
Department of Electrical and Computer Engineering
Rutgers University, Piscataway, NJ 08854-8058, USA

Abstract

We propose an adaptive vector quantization scheme based on the statistical modeling of AC cosine coefficients with mixtures of Gaussian distributions. The model parameters are used to synthesize training vector sets whose underlying distribution resembles that of the transformed data. Since the model parameters are also sent to the decoding side, both the encoder and the decoder can derive the same training set and codebook. Experiments with several test images showed that the codebooks obtained from synthesized data are effective for the vector quantization of transformed data, the entire procedure resulting in high-quality image compression.

Keywords: Image Coding, Training Set Synthesis, Transform Vector Quantization, Expectation-Maximization.

1 Introduction

The nonstationary nature of image data often causes a significant statistical difference between the image being coded and the training set the codebook was designed for. Even with large training sets, the structure of the input might not be reflected in the current codebook, which leads to significant distortion. On the other hand, an adaptive codebook typically requires large amounts of side information for the transmission of new codewords [1].

Training set synthesis [2] has recently been described as a method to specify an adaptive codebook indirectly. The idea is to fit a statistical model to the input data, estimate the model parameters, and use them to synthesize a training vector set that approximates the input. By sending the model parameters to the decoding side, the decoder can derive the adaptive codebook. Overall, the technique yields low bit rate encoding with good reconstruction quality.

A simple model based on one-dimensional histograms was employed in [2]. In this paper we improve training set synthesis by modeling the density of AC cosine coefficients with mixtures of Gaussian distributions. We show that the codebooks obtained from synthesized data are effective for the vector quantization of transformed data. High-quality image compression is obtained since the codebook optimization takes into account the particular statistics of the input.

The organization of the paper is as follows. Section 2 defines the Vector Quantization with Training Set Synthesis (VQ-TSS) paradigm. The implementation of VQ-TSS in the transform domain is discussed in Section 3. Experimental results and comparisons are given in Section 4.

2 Vector Quantization with Training Set Synthesis

The block diagram of a vector quantizer that uses training set synthesis is shown in Figure 1.

Figure 1: Vector quantization with training set synthesis. (a) Encoding side. (b) Decoding side.

At the encoding side (Figure 1a), the input data is first fitted to a statistical model. The best-fit parameters, named training set parameters (TSP), are used to synthesize a training set (TS) with statistics similar to the input. The codebook $C$, populated according to the generalized Lloyd algorithm (GLA) [3], is then employed to vector quantize the input data. Only the set of codeword indices $I$ and the TSP are stored or transmitted.

Figure 1b presents the decoding side. The received TSP are used to synthesize the TS, which is then employed to derive the codebook. An approximate reconstruction of the original data is finally obtained from the indices $I$ and the codebook $C$.

The advantage of VQ-TSS is that very little side information, namely the TSP, has to be transmitted. Thus, complete codebook adaptation is accomplished with only a small increase in the bit rate.
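To make the encoder/decoder symmetry of Figure 1 concrete, here is a minimal one-dimensional sketch in Python. It is not the paper's implementation: as a deliberate simplification it assumes a single Gaussian model whose mean and standard deviation, rounded to two decimals, stand in for the transmitted TSP, and it uses a plain weighted Lloyd iteration with a deterministic start in place of the authors' GLA. All function names are hypothetical. The point it illustrates is that synthesis and codebook design are deterministic functions of the TSP, so both sides obtain the same codebook.

```python
import math
import random
import statistics


def fit_tsp(data):
    """Encoder side (hypothetical): fit a single Gaussian and round its
    parameters; the rounded pair stands in for the transmitted TSP."""
    return (round(statistics.fmean(data), 2), round(statistics.pstdev(data), 2))


def synthesize_training_set(tsp, n_points=400):
    """Deterministic synthesis: a uniform grid over mean +/- 4 std,
    each grid point weighted by the (unnormalized) Gaussian density."""
    mean, std = tsp
    lo, step = mean - 4.0 * std, 8.0 * std / n_points
    points = [lo + (i + 0.5) * step for i in range(n_points)]
    weights = [math.exp(-0.5 * ((x - mean) / std) ** 2) for x in points]
    return points, weights


def nearest(codebook, x):
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda c: abs(x - codebook[c]))


def weighted_lloyd(points, weights, size, iters=50):
    """Weighted Lloyd iteration (1-D GLA) with a deterministic start,
    so encoder and decoder obtain the same codebook from the same TSP."""
    stride = len(points) // size
    codebook = [points[stride // 2 + i * stride] for i in range(size)]
    for _ in range(iters):
        num, den = [0.0] * size, [0.0] * size
        for x, w in zip(points, weights):
            j = nearest(codebook, x)
            num[j] += w * x
            den[j] += w
        codebook = [num[c] / den[c] if den[c] > 0 else codebook[c]
                    for c in range(size)]
    return codebook


# Encoder: fit the model, derive the codebook, quantize the input.
rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(1000)]
tsp = fit_tsp(data)
enc_codebook = weighted_lloyd(*synthesize_training_set(tsp), size=8)
indices = [nearest(enc_codebook, x) for x in data]

# Decoder: receives only tsp and indices, and re-derives the codebook.
dec_codebook = weighted_lloyd(*synthesize_training_set(tsp), size=8)
reconstruction = [dec_codebook[i] for i in indices]
```

Rounding the parameters before the encoder uses them matters: the encoder must design its codebook from exactly the values the decoder will receive, not from the unrounded estimates, or the two codebooks can drift apart.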
3 Implementation

We describe and analyze below the DCT-domain implementation of the proposed method, called transform VQ-TSS (TVQ-TSS). Modeling is less complex in the transform domain, where the coefficients are (almost) decorrelated and typically have highly peaked histograms centered around zero. The numerical and graphical examples in this section correspond to the 512 × 512 gray-level image Lena.

3.1 Transform Block Classification and Bit Allocation Scheme

Let us consider the DCT of the $B \times B$ blocks representing the input image. Following the usual notation, we call the coefficient in the upper-left corner of a block the DC coefficient, while the remaining coefficients are called AC coefficients. To increase adaptivity, we use the procedure described in [4], which classifies the transformed blocks into $n_C$ equally populated classes according to the energy of their AC coefficients. The overhead information due to block classification, in bits/pixel, is

$$R_{BC} = \begin{cases} \left(\lfloor \log_2(n_C - 1) \rfloor + 1\right)/B^2, & \text{if } n_C > 1, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$

where $\lfloor \cdot \rfloor$ denotes the down-rounded integer.

A bit allocation scheme that gives real and non-negative bit rates is derived in [5] by assuming that each vector component is optimally encoded (in the distortion-rate sense). The overall distortion is minimized subject to the non-negative allocation restriction and an imposed bit rate $R_{AC}$. The scheme assigns to the AC coefficient in position $(u, v)$, belonging to class $c$ and having variance $\sigma_c^2(u, v)$, a number of bits equal to

$$R_c(u,v) = \begin{cases} \frac{1}{2}\log_2\frac{\sigma_c^2(u,v)}{\theta}, & \text{if } 0 < \theta < \sigma_c^2(u,v), \\ 0, & \text{otherwise}, \end{cases} \;=\; \max\left\{0,\ \frac{1}{2}\log_2\frac{\sigma_c^2(u,v)}{\theta}\right\}, \qquad (2)$$

where $\theta$ is the solution of

$$\frac{1}{n_C B^2}\sum_{c=1}^{n_C}\sum_{(u,v)\neq(0,0)} \max\left\{0,\ \frac{1}{2}\log_2\frac{\sigma_c^2(u,v)}{\theta}\right\} = R_{AC}, \qquad (3)$$

that is, $\theta$ is set so that the allocations (2), averaged over all pixels of the $n_C$ equally populated classes, meet the imposed rate. We compute the bit allocation for each vector of DCT coefficients by summing the values derived in equation (2) and rounding the result.

3.2 Vector Formation

Each energy class is treated separately after the allocation of bits. The DCT block is decomposed into the DC coefficient and 17 vectors taken in zigzag order and denoted by $v_1 = (AC_1, AC_2)^\top$, $v_2 = (AC_3, AC_4, AC_5)^\top$, ..., $v_{17} = (AC_{61}, AC_{62}, AC_{63})^\top$. The maximum vector dimension is $k_{\max} = 4$, due to efficiency constraints resulting from the GLA (we explain this limitation in Section 3.4). In addition, the first two vectors have lower dimension (2 and 3, respectively) to limit the size of the corresponding codebooks. Recall that the size of a codebook depends exponentially on the number of bits allocated to a vector, which is equal to the sum of all allocations received by the vector components.

3.3 Modeling the AC Coefficients

Most of the methods in the literature assume that the AC coefficients are statistically independent and model them with the Laplacian distribution [6], the Gaussian or Gamma distribution [7], the generalized Gaussian distribution [8], or a mixture of Gaussian distributions (MGD) [9]. We employ the MGD model, which captures the input statistics better than models relying on a single elementary distribution. According to MGD, if $x = (x_1, \ldots, x_k)^\top$ is a vector of AC coefficients resulting from the above vector formation, the PDF of its $j$th component is given by

$$f_j(x) = \sum_{m=1}^{M_j} P_{jm}\, g_{jm}(x), \qquad j = 1, \ldots, k, \qquad (4)$$

where $M_j$ is the number of Gaussians employed in modeling and $g_{jm}$ is the Gaussian distribution having a priori probability $P_{jm}$, mean $\mu_{jm}$, and variance $\sigma_{jm}^2$, with $\sum_{m=1}^{M_j} P_{jm} = 1$. The maximum-likelihood estimates of $P_{jm}, \mu_{jm}, \sigma_{jm}^2$, with $m = 1, \ldots, M_j$, are part of the training set parameters and are obtained by setting to zero the derivatives of the logarithm of the likelihood function. The iterative procedure that solves the resulting likelihood equations is the expectation-maximization (EM) algorithm [10].

The derivation of the best value for $M_j$ (in the maximum-likelihood sense) requires multiple runs of the EM algorithm, which induces additional complexity. Therefore, we selected $M_j$ off-line as the number of Gaussians that maximizes compression performance. For most of the images we tested, $M_j = M = 4$ proved to be a good choice. The estimated parameters corresponding to the first and second AC coefficients of the highest energy class of image Lena are shown in Table 1a and Table 1b, respectively.

Table 1: Estimates, after 100 EM iterations, of the a priori probabilities, means, and standard deviations (square roots of the variances) corresponding to the first two AC coefficients of the highest energy class of image Lena. (a) Coefficient AC_1. (b) Coefficient AC_2.

(a)
m    P_1m     μ_1m       σ_1m
1    0.304    -71.649    146.186
2    0.218   -112.055     60.971
3    0.133    123.358     31.939
4    0.343    104.668    163.452

(b)
m    P_2m     μ_2m       σ_2m
1    0.196     14.206    164.537
2    0.145    -22.670     33.696
3    0.088      5.819     14.701
4    0.569      3.116     90.617

Figure 2 shows the PDFs of the same coefficients derived with equation (4). In the figure, we compare the MGD result with estimates obtained through nonparametric analysis with the optimal Epanechnikov kernel [11]. The two curves are very close to each other. In addition, the PDF of the first coefficient is bimodal and asymmetric, which justifies modeling based on a mixture of distributions.

The joint PDF corresponding to the vector $v_1 = (AC_1, AC_2)^\top$ of the highest energy class is presented in Figure 3a. For comparison, Figure 3b shows the 2-dimensional Epanechnikov density estimate of the same data. The two surfaces have the same global features, each exhibiting two significant modes.
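The EM fit of the one-dimensional mixture (4) can be sketched as follows. This is a minimal, standard-library Python version of the textbook EM updates for a Gaussian mixture, not the authors' code; the quantile-based initialization, the variance floor `min_std`, and the synthetic bimodal test data are our assumptions. It returns $(P_{jm}, \mu_{jm}, \sigma_{jm})$ triples in the same form as Table 1.

```python
import math
import random


def gauss_pdf(x, mean, std):
    """Gaussian density g(x) with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_mixture(samples, m=4, iters=100, min_std=1e-3):
    """Fit an m-component 1-D Gaussian mixture by EM.
    Returns a list of (P_m, mu_m, sigma_m) triples, as in Table 1."""
    xs = sorted(samples)
    n = len(xs)
    # Initialization (our assumption): equal weights, means at evenly
    # spaced quantiles, and the overall standard deviation for all components.
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    comps = [(1.0 / m, xs[(2 * i + 1) * n // (2 * m)], sd) for i in range(m)]
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            p = [w * gauss_pdf(x, mean, std) for w, mean, std in comps]
            total = sum(p) or 1e-300  # guard against underflow
            resp.append([v / total for v in p])
        # M-step: weighted re-estimation of priors, means, std deviations.
        new = []
        for c in range(m):
            rc = [r[c] for r in resp]
            nc = sum(rc) or 1e-300
            mean = sum(r * x for r, x in zip(rc, xs)) / nc
            var = sum(r * (x - mean) ** 2 for r, x in zip(rc, xs)) / nc
            new.append((nc / n, mean, max(math.sqrt(var), min_std)))
        comps = new
    return comps


# Usage on synthetic bimodal data (illustrative values, not from the paper).
rng = random.Random(1)
data = ([rng.gauss(-100.0, 30.0) for _ in range(600)]
        + [rng.gauss(120.0, 40.0) for _ in range(400)])
params = sorted(em_mixture(data, m=2, iters=50), key=lambda t: t[1])
for w, mean, std in params:
    print(f"P={w:.3f}  mean={mean:8.2f}  std={std:7.2f}")
```

With $M_j = 4$ and 100 iterations, as used for Table 1, the same function applies unchanged; each additional candidate value of $M_j$ means another full EM run, which is the cost the text cites for fixing $M_j$ off-line.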
Figure 2: The PDFs corresponding to the parameters from Table 1 (MGD) and the PDFs derived through nonparametric analysis with the optimal Epanechnikov kernel of window width h = 30, plotted as normalized density versus coefficient value. (a) Coefficient AC_1. (b) Coefficient AC_2.

Figure 3: The joint PDF of the vector $v_1 = (AC_1, AC_2)^\top$ of the highest energy class derived from image Lena, plotted as normalized density over the (AC_1, AC_2) plane. (a) MGD model. (b) 2-dimensional Epanechnikov estimate.

3.4 Training Set Synthesis and Codebook Generation

The joint PDF of a vector $x = (x_1, \ldots, x_k)^\top$ whose components are assumed to be statistically independent is equal to the product of the marginal densities. With the marginal densities modeled according to (4), the joint PDF of $x$ is given by

$$f(x) = \prod_{j=1}^{k} f_j(x_j) = \prod_{j=1}^{k} \sum_{m=1}^{M_j} P_{jm}\, g_{jm}(x_j). \qquad (5)$$

To generate a training set whose underlying distribution is approximated by (5), we uniformly sample the space covered by $x$ using a $k$-dimensional cubic lattice $\{x_q\}_{q=1,\ldots,L}$ with minimum point separation $\Delta$. Then, we associate to each lattice point $x_q$ the weight $f(x_q)$. If $[x_{j,\min}, x_{j,\max}]$ is the range of values for the $j$th component of $x$, then the number of samples for dimension $j$ is $l_j = \lfloor (x_{j,\max} - x_{j,\min})/\Delta \rfloor$. The number of lattice points $L$ is the product of the numbers of samples for each dimension:

$$L = \prod_{j=1}^{k} \left\lfloor \frac{x_{j,\max} - x_{j,\min}}{\Delta} \right\rfloor. \qquad (6)$$

To reduce the error caused by sampling, the value of $\Delta$ should be small. However, equation (6) shows that the number of lattice points is approximately inversely proportional to the $k$th power of $\Delta$: for $k = 4$, halving $\Delta$ increases $L$ roughly sixteenfold.
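A minimal sketch of the lattice synthesis of equations (5) and (6), in standard-library Python. The helper names are hypothetical, the component ranges in the usage are assumed (read roughly off the axes of Figure 2), and placing the samples at $x_{j,\min} + i\Delta$ is our assumption, since the exact offset is not specified. The size check against a cap mirrors the $L_{\max} = 50{,}000$ limit used in the paper; the Table 1 parameters serve as the marginals.

```python
import itertools
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mgd_pdf(x, comps):
    """Marginal density (4); comps lists (P_jm, mu_jm, sigma_jm) triples."""
    return sum(p * gauss_pdf(x, mu, s) for p, mu, s in comps)


def lattice_size(ranges, delta):
    """Equation (6): L = prod_j floor((x_j,max - x_j,min) / delta)."""
    return math.prod(int((hi - lo) // delta) for lo, hi in ranges)


def synthesize(ranges, marginals, delta, l_max=50_000):
    """Weighted training set on a cubic lattice with spacing delta.
    ranges[j] = (x_j,min, x_j,max); marginals[j] = mixture of component j.
    Returns (point, weight) pairs with weight = f(x_q) from (5)."""
    if lattice_size(ranges, delta) > l_max:
        raise ValueError("too many lattice points; increase delta")
    # Assumption: samples start at x_j,min and are spaced by delta.
    axes = [[lo + i * delta for i in range(int((hi - lo) // delta))]
            for lo, hi in ranges]
    return [(point, math.prod(mgd_pdf(x, comps)
                              for x, comps in zip(point, marginals)))
            for point in itertools.product(*axes)]


# Table 1 parameters of AC_1 and AC_2 (highest energy class, Lena).
ac1 = [(0.304, -71.649, 146.186), (0.218, -112.055, 60.971),
       (0.133, 123.358, 31.939), (0.343, 104.668, 163.452)]
ac2 = [(0.196, 14.206, 164.537), (0.145, -22.670, 33.696),
       (0.088, 5.819, 14.701), (0.569, 3.116, 90.617)]
ranges = [(-600.0, 600.0), (-400.0, 400.0)]  # assumed component ranges
print(lattice_size(ranges, 10.0), lattice_size(ranges, 4.0))  # 9600 60000
training_set = synthesize(ranges, [ac1, ac2], delta=10.0)
```

For this 2-dimensional vector, going from $\Delta = 10$ to $\Delta = 4$ multiplies $L$ by $(10/4)^2 = 6.25$ and already exceeds the cap; in four dimensions the same refinement would cost a factor of about 39, which is why both $L$ and $k$ are bounded.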
Since the number of lattice points $L$ and the vector dimensionality $k$ determine the speed of the GLA for codebook generation, we limited their values to $L_{\max} = 50{,}000$ and $k_{\max} = 4$, which resulted in an overall compression/decompression time of only a few seconds. The lattice points and their weights constitute the training set used as input to the GLA.

An efficient prediction-based implementation of the GLA can be achieved by taking into account that the lattice points form an ordered and uniformly spaced set. Thus, there is a high probability that two lattice points with successive indices are allocated to the same codeword. The search for the closest codeword to the current lattice point can therefore be performed in a small neighborhood of the codeword associated with the previous lattice point.

3.5 Encoder and Decoder Schemes

Figure 4: Block diagram of TVQ-TSS compression. (a) Encoder. (b) Decoder.

Figure 4a presents the block diagram of the TVQ-TSS encoder, which performs the following operations:

- The input image is partitioned into 8 × 8 blocks, and the 2-dimensional DCT is computed for each block.
- The DC coefficient is uniformly quantized, and the resulting values are DPCM encoded and transmitted.
- The transformed blocks are classified into 4 equally populated classes, and the bits are allocated according to Section 3.1. The bit allocation map and the class indices are transmitted to the decoder.
- The TSP are estimated and the codebooks derived as described in Sections 3.3 and 3.4. The vector quantization of the transformed data yields the set of codeword indices $I$, which are transmitted together with the TSP.
- Finally, error analysis and reduction are performed. The largest $E$ errors are considered, and their positions inside the DCT blocks are coded and transmitted. The errors are reduced using two correcting values (one positive and one negative) that are shared by all errors. Each DCT block has a one-bit flag that indicates whether corrections are applied inside the block. Two additional bits are required for each correction: one indicates the sign of the correction, and the other shows whether the next correction belongs to the same block as the current one.

Figure 4b presents the block diagram of the TVQ-TSS decoder. Processing starts with training set synthesis based on the received TSP, followed by codebook derivation and decoding of the codeword indices $I$. The error information is then used to reduce the selected set of errors. The inverse transformation of the corrected data produces an approximate replica of the original image.

4 Experimental Results

We tested the new compression method on a Sun Ultra 60 workstation (C implementation). The images used for testing are available via anonymous ftp to whitechapel.media.mit.edu under /pub/testimages. They are all 512 × 512 monochrome still images with 256 gray levels.

A first set of results is presented in Table 2, which contains the peak signal-to-noise ratios (PSNRs) of the images in the test set after compression/decompression at 0.28 bits/pixel. The coding time for one image was less than 5 seconds, while decoding took about 3 seconds. Figure 5 shows the encoded image Lena at 0.25 and 0.5 bits/pixel.

Table 2: Coding performance of TVQ-TSS at a bit rate of 0.28 bits/pixel.

Image      PSNR (dB)    Image      PSNR (dB)
Al         32.87        Goldhill   30.08
Aero       29.65        Jet        30.85
Baboon     23.13        Lena       32.51
Bank       28.02        Loco       25.75
Barbara    26.59        London     32.25
Boat       30.11        Oleh       32.93
Couple     38.88        Pyramid    31.98
Einstein   34.14        Regan      32.05
Face       31.54        Wedding    30.57
Girl       33.94        Zelda      35.95

The PSNR-based comparisons presented in Figure 6 show that TVQ-TSS performs better than the JPEG standard. The improvement in PSNR is almost 1 dB for the Lena image.
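The PSNR is not defined explicitly in the text; we assume the usual definition for 256-level images, $\mathrm{PSNR} = 10 \log_{10}(255^2/\mathrm{MSE})$, with the MSE taken over all pixels. A minimal standard-library Python helper (the name `psnr` is ours) is:

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-size images given as flat sequences of
    gray levels; peak=255 assumes 8-bit (256-level) images."""
    if not original or len(original) != len(decoded):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


# An error of 10 gray levels at every pixel gives MSE = 100 and about 28.13 dB.
print(round(psnr([100] * 16, [110] * 16), 2))  # 28.13
```

Because PSNR is logarithmic in the MSE, a gain of almost 1 dB at a fixed rate corresponds to roughly a 20% lower MSE ($10^{-0.1} \approx 0.79$).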
One can also observe that, for the Lena image, our method performs better than three other recent techniques that employ vector quantization of the DCT coefficients: classified VQ in the transform domain [12], VQ with variable block-size [13], and additive vector decoding of transform coded images [14]. The computational complexity is higher for TVQ-TSS; however, as mentioned before, the processing time is only a few seconds on a standard workstation. It is worth noting that TVQ-TSS (which is based on a fixed-rate allocation scheme) performs very close to techniques based on variable-rate VQ (see, for example, the entropy-coded lattice vector quantizer reported in [15]).

Figure 5: TVQ-TSS results for image Lena. (a) 0.25 bits/pixel, 31.84 dB. (b) 0.5 bits/pixel, 34.92 dB.

Figure 6: Coding performance for image Lena, PSNR (dB) versus bit rate (bits/pixel), for TVQ-TSS, JPEG, and three other methods.

5 Conclusions

This paper introduced a new method to achieve codebook adaptation to the input statistics with a small amount of side information. We presented a transform-domain implementation of Vector Quantization with Training Set Synthesis and showed that its performance is competitive with other compression schemes based on VQ and transform coding.

References

[1] M. Lightstone and S.K. Mitra, "Image-adaptive vector quantization in an entropy-constrained framework", IEEE Trans. Image Process., Vol. 6, 1997, 441-450.
[2] D. Comaniciu, "Training set synthesis for entropy-constrained transform vector quantization", Proc. IEEE ICASSP, Atlanta, Vol. 4, 1996, 2036-2039.
[3] Y. Linde, A. Buzo, and R.M. Gray, "An algorithm for vector quantizer design", IEEE Trans. Commun., Vol. COM-28, 1980, 84-95.
[4] W.H. Chen and H. Smith, "Adaptive coding of monochrome and color images", IEEE Trans. Commun., Vol. COM-25, 1977, 1285-1292.
[5] A. Segall, "Bit allocation and encoding for vector sources", IEEE Trans. Inform. Theory, Vol. IT-22, 1976, 162-169.
[6] R.C. Reininger and J. Gibson, "Distribution of the two-dimensional DCT coefficients for images", IEEE Trans. Commun., Vol. COM-31, 1983, 835-839.
[7] A.N. Netravali and B.G. Haskell, Digital Pictures: Representation and Compression, Plenum Press, New York, 1989.
[8] K.A. Birney and T.R. Fischer, "On the modeling of DCT and subband image data for compression", IEEE Trans. Image Process., Vol. 4, 1995, 186-193.
[9] D. Comaniciu, R. Grisel, and F. Astrade, "Medical image compression using mixture distributions and optimal quantizers", Proc. IASTED ICSIP, Las Vegas, 1995, 89-92.
[10] R.A. Redner and H.F. Walker, "Mixture densities, maximum likelihood and the EM algorithm", SIAM Review, Vol. 26, 1984, 195-239.
[11] D. Comaniciu and P. Meer, "Distribution free decomposition of multivariate data", Pattern Analysis and Applications, Vol. 2, No. 1, 1999, 22-30.
[12] J.W. Kim and S.U. Lee, "A transform domain classified vector quantizer for image coding", IEEE Trans. Circuits Syst. Video Technol., No. 2, 1992, 3-14.
[13] M.H. Lee and G. Crebbin, "Classified vector quantization with variable block-size DCT models", IEE Proc. Vis. Image Signal Process., Vol. 141, 1994, 39-48.
[14] S.W. Wu and A. Gersho, "Additive vector decoding of transform coded images", IEEE Trans. Image Process., Vol. 7, 1998, 794-803.
[15] Z.M. Yusof and T.R. Fischer, "An entropy-coded lattice vector quantizer for transform and subband image coding", IEEE Trans. Image Process., Vol. 5, 1996, 289-298.