2012 International Conference on Electrical and Computer Engineering
Advances in Biomedical Engineering, Vol. 11

A Kernel-Based and Sample-Weighted Fuzzy Clustering Algorithm

Shixiong Xia, Qiang Liu, Yong Zhou, Bing Liu
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, China
[email protected]

Keywords: Clustering analysis, Kernel function, Sample weighting

Abstract. Among clustering algorithms based on objective functions, the fuzzy c-means (FCM) algorithm is the best developed and most widely used. It is an important tool for data analysis, but it is strongly influenced by outliers and performs poorly on nonlinearly separable data. To overcome these shortcomings, we propose a new clustering algorithm that weights the samples and applies a kernel function. The kernel implicitly maps the data into a high-dimensional feature space in which the data are more clearly separable, while the sample weights largely filter out the outliers. Simulations show that the proposed method achieves higher clustering accuracy than the previous algorithms.

1. Introduction

Clustering analysis is a kind of multivariate statistical analysis and a branch of unsupervised pattern recognition [1]. It divides samples without class labels into several clusters according to some criterion, so that similar samples fall into the same class and dissimilar samples fall, as far as possible, into different classes. As an unsupervised classification method, clustering analysis has been widely used in pattern recognition, data mining, computer vision, fuzzy control, and many other areas.

Traditional clustering produces a hard partition: each object is assigned strictly to one cluster, in a black-and-white fashion. In practice, however, most objects do not have such strict properties, so soft partitioning was introduced, handling the problem with fuzzy clustering. FCM, the most popular fuzzy clustering algorithm at present, was proposed by Dunn and later extended by Bezdek [2].

Because of FCM's efficiency and widespread application, many researchers have studied and improved it. The main open problems are: (1) the number of clusters must be fixed in advance; (2) the performance depends on the choice of the initial cluster centers and the initial membership matrix; (3) the algorithm is sensitive to outliers. Many studies address these problems: [3] discusses the parameter m and the cluster number, and [4] proposes a weighted fuzzy clustering algorithm that accounts for the differing importance of each feature dimension for classification. In this paper, we introduce sample weighting and apply a kernel function to filter outliers. Experiments show that the proposed method achieves higher clustering accuracy than the previous algorithms.

2. Fuzzy Clustering Algorithms

FCM is a clustering algorithm based on an objective function; it attaches to each data object memberships that represent the degrees to which the object belongs to each class. We denote the membership of sample $x_j$ in cluster $i$ by $\mu_{ij}$, the prototype of cluster $i$ by $v_i$, and the distance between $x_j$ and $v_i$ by $d_{ij}$. By the normalization rule, the memberships of a data sample sum to 1. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a data set in $M$-dimensional space and let $c$ be the number of clusters. The FCM algorithm partitions $X$ into $c$ fuzzy subsets by minimizing the objective function

$$J_{FCM}(U,V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} d_{ij}^{2} \qquad (1)$$

$$\mu_{ij} \in [0,1], \ \forall i,j; \quad \sum_{i=1}^{c} \mu_{ij} = 1, \ \forall j; \quad 0 < \sum_{j=1}^{n} \mu_{ij} < n, \ \forall i,$$

where $m$ is the fuzziness index, typically $m = 2$ [5], and $d_{ij}^{2} = (x_j - v_i)^{T}(x_j - v_i)$. Minimization of $J_{FCM}$ is performed by a fixed-point iteration scheme known as alternating optimization. Minimizing the objective function gives the update equations

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}} \qquad (2)$$

$$v_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}} \qquad (3)$$
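As an illustration, the alternating optimization of Eqs. (2) and (3) can be sketched in Python/NumPy as follows. The random initialization of U, the zero-distance guard, and the function name `fcm` are implementation choices of this sketch, not part of the FCM specification.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, r_max=100, seed=0):
    """Fuzzy c-means by alternating optimization of Eqs. (2) and (3).

    X: (n, M) data matrix; c: number of clusters; m: fuzziness index.
    Returns the membership matrix U (c x n) and the prototypes V (c x M).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)   # columns sum to 1, as required
    for _ in range(r_max):
        Um = U ** m
        # Eq. (3): prototypes as membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared distances d_ik^2 = (x_k - v_i)^T (x_k - v_i), guarded
        # against exact zeros before the division below.
        d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Eq. (2): mu_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        change = np.max(np.abs(U_new - U))
        U = U_new
        if change <= eps:
            break
    return U, V
```

A call such as `U, V = fcm(X, c=2)` returns the final membership matrix and the cluster prototypes; this routine also serves as the initializer required in Step 2 of the algorithm of Section 4.2.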
3. Kernel-Based Hybrid C-Means Clustering

A kernel function generalizes the distance metric: it measures the distance between two data points as if they had been mapped into a high-dimensional space in which they are more clearly separable. Suppose a sample $x_k \in R^N$ is mapped by a function $\phi$ into a higher-dimensional space, giving $\phi(x_1), \phi(x_2), \ldots, \phi(x_n)$. The dot product in the high-dimensional feature space can then be calculated through a kernel function $K(x_i, x_j)$ in the input space $R^N$:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \qquad (4)$$

The possibilistic fuzzy c-means (PFCM) algorithm [7], combined with a kernel function, yields the KPFCM algorithm. KPFCM minimizes the objective function

$$J_{KPFCM} = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( a \mu_{ij}^{m} + b t_{ij}^{\eta} \right) D_{ij}^{2} + \sum_{i=1}^{c} \gamma_i \sum_{j=1}^{n} (1 - t_{ij})^{\eta} \qquad (5)$$

$$0 \le \mu_{ij}, \quad \sum_{i=1}^{c} \mu_{ij} = 1, \ \forall j; \quad t_{ij} < 1; \quad a > 0, \ b > 0, \ m > 1, \ \eta > 1,$$

where the $\gamma_i$ are suitable positive numbers and

$$D_{ij}^{2} = \left\| \phi(x_j) - \phi(v_i) \right\|^{2} = K(x_j, x_j) + K(v_i, v_i) - 2 K(x_j, v_i).$$

If we adopt the Gaussian function $K(x, y) = \exp\left( -\beta \| x - y \|^{2} \right)$ as the kernel, then $D_{ij}^{2} = 2 \left( 1 - K(x_j, v_i) \right)$. The minimization of the objective function gives the update equations

$$\mu_{ik} = \frac{\left( 1 / (1 - K(x_k, v_i)) \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / (1 - K(x_k, v_j)) \right)^{1/(m-1)}} \qquad (6)$$

$$t_{ik} = \frac{1}{1 + \left( \dfrac{2 b (1 - K(x_k, v_i))}{\gamma_i} \right)^{1/(\eta-1)}} \qquad (7)$$

$$v_i = \frac{\sum_{k=1}^{n} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right) K(x_k, v_i) \, x_k}{\sum_{k=1}^{n} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right) K(x_k, v_i)} \qquad (8)$$

4. Kernel Clustering Algorithm Based on Sample Weighting

4.1 Sample Weighting

Definition: a data point is written $A = \varphi_A(x_1, x_2, \ldots, x_m)$, where $\varphi_A$ is the weight of data point $A$. The weight of each data point can be defined as follows [6]:

$$\varphi_j = \sum_{k=1}^{n} \exp\left( -\alpha \| x_j - x_k \|^{2} \right) \qquad (9)$$

where $\alpha$ is a positive number. Equation (9) shows that the weight of a sample depends on its total distance to all data points. Outliers lie farther away than ordinary data points, so they receive smaller weights, which helps the clustering to filter them out.
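Both ingredients, the Gaussian kernel of Section 3 and the sample weights of Eq. (9), can be sketched in a few lines of NumPy. The helper names `pairwise_sq_dists`, `gaussian_kernel`, and `sample_weights` are choices of this sketch; they are reused in the IKPFCM sketch of Section 4.2.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """All squared Euclidean distances ||a_i - b_j||^2 between rows of A and B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def gaussian_kernel(A, B, beta):
    """Gaussian kernel of Section 3: K(x, y) = exp(-beta * ||x - y||^2)."""
    return np.exp(-beta * pairwise_sq_dists(A, B))

def sample_weights(X, alpha):
    """Eq. (9): phi_j = sum_k exp(-alpha * ||x_j - x_k||^2).

    A point far from the bulk of the data contributes only small
    exponential terms, so outliers receive small weights.
    """
    return gaussian_kernel(X, X, alpha).sum(axis=1)
```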
4.2 Kernel Clustering Algorithm Based on Sample Weighting

Combining sample weighting with the kernel function, we propose a new clustering algorithm called IKPFCM. Its objective function is defined as

$$J_{IKPFCM} = 2 \sum_{i=1}^{c} \sum_{j=1}^{n} \left( a \mu_{ij}^{m} + b t_{ij}^{\eta} \right) \left( 1 - K(x_j, v_i) \right) + \sum_{i=1}^{c} \gamma_i \sum_{j=1}^{n} (\varphi_j - t_{ij})^{\eta} \qquad (10)$$

$$0 \le \mu_{ij}, \quad \sum_{i=1}^{c} \mu_{ij} = 1, \ \forall j; \quad t_{ij} < 1; \quad a > 0, \ b > 0, \ m > 1, \ \eta > 1.$$

Minimization of $J_{IKPFCM}$ is again performed by the alternating optimization fixed-point scheme, and it gives the update equations

$$\mu_{ik} = \frac{\left( 1 / (1 - K(x_k, v_i)) \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / (1 - K(x_k, v_j)) \right)^{1/(m-1)}} \qquad (11)$$

$$t_{ik} = \frac{\varphi_k}{1 + \left( \dfrac{2 b (1 - K(x_k, v_i))}{\gamma_i} \right)^{1/(\eta-1)}} \qquad (12)$$

$$v_i = \frac{\sum_{k=1}^{n} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right) K(x_k, v_i) \, x_k}{\sum_{k=1}^{n} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right) K(x_k, v_i)} \qquad (13)$$

It is suggested to select $\gamma_i$ as

$$\gamma_i = \frac{2 \sum_{k=1}^{n} \mu_{ik}^{m} \left( 1 - K(x_k, v_i) \right)}{\sum_{k=1}^{n} \mu_{ik}^{m}} \qquad (14)$$

The full IKPFCM algorithm is as follows; a code sketch is given after the steps.
Step 1: Fix $c, m, \alpha, \beta, a, b, \eta, \varepsilon$, and set the maximum number of iterations $r_{max}$.
Step 2: Determine the initial prototypes $v_i$ with the FCM algorithm.
Step 3: Calculate the kernel matrix.
Step 4: For $r = 1, 2, \ldots, r_{max}$, do:
(a) update all membership values $\mu_{ik}$ with Eq. (11);
(b) update all typicality values $t_{ik}$ with Eq. (12);
(c) update all prototypes $v_i$ with Eq. (13);
(d) compute $\gamma_i$ using Eq. (14);
(e) compute $E = \max_{i,k} \left| \mu_{ik}^{(r)} - \mu_{ik}^{(r-1)} \right|$; if $E \le \varepsilon$, stop; otherwise set $r = r + 1$.
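A minimal Python/NumPy sketch of the procedure follows; it reuses the `fcm`, `gaussian_kernel`, and `sample_weights` helpers from the earlier sketches. The paper does not specify an initial value for the $\gamma_i$ before the first pass through Step 4(d), so this sketch starts from $\gamma_i = 1$; the defaults $\alpha = 2.5$ ($= 1/0.4$) and $\beta = 0.005$ ($= 1/200$) mirror the IRIS settings of Section 5.

```python
import numpy as np

def ikpfcm(X, c, a=1.0, b=1.0, m=2.0, eta=2.0, alpha=2.5, beta=0.005,
           eps=1e-4, r_max=100):
    """IKPFCM: kernel-based, sample-weighted possibilistic fuzzy c-means."""
    phi = sample_weights(X, alpha)          # Eq. (9), computed once
    _, V = fcm(X, c)                        # Step 2: initial prototypes via FCM
    U = np.full((c, X.shape[0]), 1.0 / c)   # only used for the first convergence test
    gamma = np.ones(c)                      # assumed initial gamma_i (unspecified in paper)
    for _ in range(r_max):                  # Step 4
        Kxv = gaussian_kernel(V, X, beta)   # c x n matrix of K(x_k, v_i)
        D = np.maximum(1.0 - Kxv, 1e-12)    # 1 - K(x_k, v_i), guarded against zero
        # (a) Eq. (11): kernel-based memberships, normalized over clusters.
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # (b) Eq. (12): typicality values scaled by the sample weights phi_k.
        T = phi[None, :] / (1.0 + (2.0 * b * D / gamma[:, None]) ** (1.0 / (eta - 1.0)))
        # (c) Eq. (13): prototypes as kernel-weighted means of the data.
        W = (a * U_new ** m + b * T ** eta) * Kxv
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # (d) Eq. (14): refresh gamma_i from the current memberships.
        Um = U_new ** m
        gamma = 2.0 * (Um * D).sum(axis=1) / Um.sum(axis=1)
        # (e) Stop when the largest membership change falls below eps.
        if np.max(np.abs(U_new - U)) <= eps:
            return U_new, T, V
        U = U_new
    return U, T, V
```

Note that Eq. (12) caps the typicality of $x_k$ at its weight $\varphi_k$, so a low-weight outlier also receives a low typicality and contributes little to the prototype update of Eq. (13).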
5. Simulation and Results

To test the effectiveness of the IKPFCM algorithm, we ran FCM, PFCM, and IKPFCM on two data sets. Comparing and analyzing the experimental results, we conclude that the IKPFCM algorithm is better than the other algorithms.

5.1 Test on X12

The data set X12 [7] contains 12 data points. Table 1 shows the cluster centers produced by FCM, PFCM, and IKPFCM, and Fig. 1 shows the corresponding clustering results. The true centers are

$$v_{true} = \begin{pmatrix} -3.34 & 0 \\ 3.34 & 0 \end{pmatrix}.$$

Table 1. Cluster centers produced by FCM, PFCM, and IKPFCM.

Algorithm | Parameters | Centers
FCM | m=2 | (-2.98, 0.54), (2.98, 0.54)
PFCM | a=1, b=1, m=2, η=2 | (-2.83, 0.36), (2.83, 0.36)
IKPFCM | a=1, b=1, m=2, η=2, α=1.0/0.4, β=2 | (-3.34, 0.00), (3.34, 0.00)

[Fig. 1. Clustering results on X12: '+' marks the prototypes found by FCM, '□' those found by PFCM, and red 'o' those found by IKPFCM.]

To quantify the accuracy of each clustering algorithm, we compute the error $E_* = \| v_{true} - v_* \|^{2}$, where $*$ stands for FCM, PFCM, or IKPFCM. The results are $E_{FCM} = 0.4212$, $E_{PFCM} = 0.3897$, and $E_{IKPFCM} = 0$. Clearly, as Fig. 1 also shows, the proposed method is superior to the previous methods.

5.2 Test on IRIS

We also tested FCM, PFCM, and IKPFCM on the IRIS data set [8]. The experimental conditions were fixed as follows: $\varepsilon = 0.0001$, $r_{max} = 100$, $a = 1$, $b = 1$, $m = 2$, $\eta = 2$, $\alpha = 1/0.4$, $\beta = 1/200$. Running the experiments on the 150 data points of IRIS gives the following results.

Table 2. Results on IRIS without added noise.

Algorithm | Wrong scores | Iterations | Time (s) | Center offset
FCM | 16 | 23 | 0.015 | 0.075
PFCM | 15 | 26 | 0.031 | 0.051
IKPFCM | 13 | 24 | 0.094 | 0.024

Table 3. Results on IRIS with 50 added noise points.

Algorithm | Wrong scores | Time (s)
FCM | 50 | 0.031
PFCM | 50 | 0.047
IKPFCM | 11 | 0.125

Table 4. Results on IRIS with 100 added noise points.

Algorithm | Wrong scores | Time (s)
FCM | 50 | 0.046
PFCM | 50 | 0.094
IKPFCM | 15 | 0.203

Table 2 shows that IKPFCM makes the fewest wrong assignments, although it takes a little more time. Tables 3 and 4 show that under noisy conditions IKPFCM is much better than the other algorithms, so IKPFCM filters noise well.

6. Conclusions

In this paper, we have proposed a new algorithm that combines sample weighting with a kernel function. It filters noise well and increases robustness. Experiments on two data sets show that the IKPFCM algorithm indeed achieves higher clustering accuracy than the previous algorithms.

7. Acknowledgments

This work is partially supported by the National Science Foundation for Post-doctoral Scientists of China (Grant No. 20070421041), the Jiangsu Postdoctoral Sustentation Fund of China (Grant No. 0701045B), and the Scientific Research Foundation of China University of Mining and Technology (Grant No. 2007B017).

References

[1] Gao Xinbo, Xie Weixin. A research on fuzzy clustering theory and its applications. Chinese Science Bulletin, 1999, 44(21): 2241-2248.
[2] Bezdek J C. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, NY, 1981.
[3] Wu Xiaohong, Zhou Jianjiang. Allied fuzzy c-means clustering model. Transactions of Nanjing University of Aeronautics and Astronautics, 2006, 23(3): 208-213.
[4] Pal N R, Bezdek J C. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Systems, 1995, 3(3): 370-379.
[5] Krishnapuram R, Keller J M. A possibilistic approach to clustering. IEEE Trans. Fuzzy Systems, 1993, 1(2): 98-110.
[6] Pal N R, Pal K, Keller J M, Bezdek J C. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 2005, 13(4): 517-530.
[7] Pal N R, Pal K, Bezdek J C, et al. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 2005, 13(4): 517-530.
[8] Bezdek J C, Keller J M, Krishnapuram R, et al. Will the real Iris data please stand up? IEEE Trans. Fuzzy Systems, 1999, 7(3): 368-369.