2012 International Conference on Electrical and Computer Engineering
Advances in Biomedical Engineering, Vol. 11

A Kernel-Based and Sample-Weighted Fuzzy Clustering Algorithm
Shixiong Xia, Qiang Liu, Yong Zhou, Bing Liu
School of Computer Science and Technology
China University of Mining and Technology
Xuzhou, Jiangsu, China
[email protected]
Keywords: Clustering analysis, Kernel function, Sample weighting
Abstract. Among clustering algorithms based on objective functions, the fuzzy c-means (FCM) algorithm is the most mature and widely used. It is an important tool for data analysis, but it is strongly influenced by outliers and performs poorly on nonlinearly separable data. To overcome these shortcomings, we propose a new clustering algorithm that combines sample weighting with a kernel function. The kernel implicitly maps the data into a high-dimensional feature space in which the data are more clearly separable, while the sample weights largely filter out the outliers. Simulation results show that the proposed method achieves higher clustering accuracy than previous algorithms.
1. Introduction
Clustering analysis is a kind of multivariate statistical analysis and a branch of unsupervised pattern recognition [1]. It divides samples without class labels into several clusters according to certain criteria, so that similar samples fall into the same class and dissimilar samples fall into different classes as far as possible. As an unsupervised classification method, clustering analysis has been widely used in pattern recognition, data mining, computer vision, fuzzy control, and many other areas. Traditional clustering produces a hard partition: each object is strictly assigned to exactly one class, in an "either black or white" manner. In practice, however, most objects do not have such strictly defined properties, so soft partitioning was proposed, using fuzzy clustering to deal with the problem; the representative method is the fuzzy c-means (FCM) algorithm.
FCM is currently the most popular fuzzy clustering algorithm; it was proposed by Dunn and later extended by Bezdek [2]. In view of the efficiency and widespread application of FCM, many researchers have studied and improved it. Its main problems are: (1) the number of clusters must be specified in advance; (2) performance depends on the choice of the initial cluster centers and the initial membership matrix; (3) it is sensitive to outliers. Many scholars have addressed these problems: [3] discussed the parameter m and the number of clusters, and [4] proposed a weighted fuzzy clustering algorithm that accounts for the different importance of each feature dimension for classification. In this paper, we introduce sample weighting and apply a kernel function to filter the outliers. Experiments show that the proposed method achieves higher clustering accuracy than the previous algorithms.
2. Fuzzy Clustering Algorithms
FCM is a clustering algorithm based on an objective function: it assigns each data object a membership value representing the degree to which the object belongs to each class. We denote the membership of data point \( x_j \) in cluster \( i \) by \( \mu_{ij} \), the prototype of cluster \( i \) by \( v_i \), and the distance between \( x_j \) and \( v_i \) by \( d_{ij} \). By the normalization constraint, the memberships of each data sample sum to 1. Let \( X = \{x_1, x_2, \dots, x_n\} \) be a data set in M-dimensional space and let \( c \) be the number of clusters.
The FCM algorithm partitions X into c fuzzy subsets by minimizing the following objective function:
\[
J_{FCM}(U, V) = \sum_{i=1}^{c}\sum_{j=1}^{n} \mu_{ij}^{m} d_{ij}^{2}
\qquad (1)
\]
\[
\mu_{ij} \in [0, 1],\ \forall i, j; \qquad \sum_{i=1}^{c} \mu_{ij} = 1,\ \forall j; \qquad 0 < \sum_{j=1}^{n} \mu_{ij} < n,\ \forall i
\]
where m is the fuzziness index, usually m = 2 [5], and \( d_{ij}^{2} = (x_j - v_i)^{T}(x_j - v_i) \). Minimization of \( J_{FCM} \) is
performed by a fixed-point iteration scheme known as the Alternating Optimization technique. The
minimization of the objective function yields the following update equations:
\[
\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}}
\qquad (2)
\]
\[
v_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} x_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}
\qquad (3)
\]
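As an illustration (not part of the original paper), a minimal NumPy sketch of this alternating optimization might look as follows; the function name fcm, the random initialization, and the numerical guard are our assumptions:

```python
import numpy as np

def fcm(X, c, m=2, eps=1e-4, max_iter=100, seed=0):
    """Partition X (n x M) into c fuzzy subsets; returns memberships U and prototypes V."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                        # enforce the constraint sum_i mu_ij = 1
    for _ in range(max_iter):
        V = (U**m @ X) / (U**m).sum(axis=1, keepdims=True)      # Eq. (3)
        D2 = ((X[None, :, :] - V[:, None, :])**2).sum(axis=2)   # d_ij^2
        W = np.fmax(D2, 1e-12) ** (-1.0 / (m - 1))              # guard avoids division by zero
        U_new = W / W.sum(axis=0)                               # Eq. (2)
        if np.abs(U_new - U).max() <= eps:
            return U_new, V
        U = U_new
    return U, V
```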
3. Kernel Based Hybrid C-Means Clustering
A kernel function generalizes the distance metric: it measures the distance between two data points as if they had been mapped into a high-dimensional feature space in which they are more clearly separable.
Suppose the samples \( x_k \in R^N \) are mapped by a function \( \varphi \) into a higher-dimensional space as \( \varphi(x_1), \varphi(x_2), \dots, \varphi(x_n) \). The dot product in the high-dimensional feature space can then be calculated through a kernel function \( K(x_i, x_j) \) in the input space \( R^N \):
\[
K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)
\qquad (4)
\]
The PFCM algorithm was proposed in [6]; its kernel-based variant is called KPFCM. KPFCM minimizes the following objective function:
\[
J_{KPFCM} = \sum_{i=1}^{c}\sum_{j=1}^{n} \left( a\mu_{ij}^{m} + bt_{ij}^{\eta} \right) D_{ij}^{2} + \sum_{i=1}^{c} \gamma_i \sum_{j=1}^{n} \left( 1 - t_{ij} \right)^{\eta}
\qquad (5)
\]
\[
0 \le \mu_{ij}, \quad \sum_{i=1}^{c} \mu_{ij} = 1,\ \forall j; \quad t_{ij} < 1, \quad a > 0,\ b > 0,\ m > 1,\ \eta > 1.
\]
where the \( \gamma_i \) are suitable positive numbers and
\[
D_{ij}^{2} = \left\| \varphi(x_j) - \varphi(v_i) \right\|^{2}
= \varphi(x_j)^{T}\varphi(x_j) + \varphi(v_i)^{T}\varphi(v_i) - 2\varphi(x_j)^{T}\varphi(v_i)
= K(x_j, x_j) + K(v_i, v_i) - 2K(x_j, v_i).
\]
If we adopt the Gaussian function \( K(x, y) = \exp\left( -\beta \left\| x - y \right\|^{2} \right) \) as the kernel function, then \( D_{ij}^{2} = 2\left( 1 - K(x_j, v_i) \right) \).
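To make the kernel computations concrete, here is a small sketch (ours, not the paper's) of the Gaussian kernel matrix and the induced squared distance; since K(x, x) = 1 for the Gaussian kernel, the three-term expansion above collapses to 2(1 - K(x_j, v_i)):

```python
import numpy as np

def gaussian_kernel(A, B, beta):
    """K[i, j] = exp(-beta * ||A[i] - B[j]||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-beta * d2)

def kernel_dist2(V, X, beta):
    """Kernel-induced squared distance D_ij^2 = 2 * (1 - K(x_j, v_i)), shape (c, n)."""
    return 2.0 * (1.0 - gaussian_kernel(V, X, beta))
```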
Minimizing this objective function yields the following update equations:
\[
\mu_{ik} = \frac{\left( 1 / \left( 1 - K(x_k, v_i) \right) \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / \left( 1 - K(x_k, v_j) \right) \right)^{1/(m-1)}}
\qquad (6)
\]
\[
t_{ik} = \frac{1}{1 + \left( \dfrac{2b\left( 1 - K(x_k, v_i) \right)}{\gamma_i} \right)^{1/(\eta-1)}}
\qquad (7)
\]
\[
v_i = \frac{\sum_{k=1}^{n} \left( a\mu_{ik}^{m} + bt_{ik}^{\eta} \right) K(x_k, v_i)\, x_k}{\sum_{k=1}^{n} \left( a\mu_{ik}^{m} + bt_{ik}^{\eta} \right) K(x_k, v_i)}
\qquad (8)
\]
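A hedged sketch of one KPFCM update pass, reusing the gaussian_kernel helper above (the function name kpfcm_step, all variable names, and the numerical guard are ours):

```python
import numpy as np

def kpfcm_step(X, V, gamma, a=1, b=1, m=2, eta=2, beta=2.0):
    """One pass of the KPFCM updates; gamma has shape (c,)."""
    K = gaussian_kernel(V, X, beta)            # K[i, k] = K(x_k, v_i)
    one_minus_K = np.fmax(1.0 - K, 1e-12)      # guard for the case x_k == v_i
    W = one_minus_K ** (-1.0 / (m - 1))
    U = W / W.sum(axis=0)                                                  # Eq. (6)
    T = 1.0 / (1.0 + (2*b*one_minus_K / gamma[:, None]) ** (1.0/(eta-1)))  # Eq. (7)
    wk = (a * U**m + b * T**eta) * K
    V_new = (wk @ X) / wk.sum(axis=1, keepdims=True)                       # Eq. (8)
    return U, T, V_new
```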
4. Kernel Clustering Algorithm Based On Sample Weighting
4.1 Sample Weighting
Definition: a weighted data point is written \( A = \varphi_A(x_1, x_2, \dots, x_m) \), where \( \varphi_A \) is the weight of data point A. The weight of each data point can be defined as follows [6]:
\[
\varphi_j = \sum_{k=1}^{n} \exp\left( -\alpha \left\| x_j - x_k \right\|^{2} \right)
\qquad (9)
\]
where \( \alpha \) is a positive number.
From (9) we can see that the weight of a sample depends on its total distance to all the other data points. Since outliers lie farther from the bulk of the data than ordinary points, they receive smaller weights, which benefits both clustering and outlier filtering.
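Eq. (9) translates directly into a few lines of NumPy; this sketch is ours, and the optional normalization mentioned in the comment is an assumption, since the paper does not state one:

```python
import numpy as np

def sample_weights(X, alpha):
    """phi_j = sum_k exp(-alpha * ||x_j - x_k||^2), Eq. (9)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-alpha * d2).sum(axis=1)
    # Optionally scale by phi.max() (our assumption, not in the paper)
    # to keep the typicality values of Eq. (12) bounded by 1.
    return phi
```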
4.2 Kernel Clustering Algorithm Based On Sample Weighting
Combining sample weighting with a kernel function, we propose a new clustering algorithm called IKPFCM. Its objective function is defined as follows:
\[
J_{IKPFCM} = 2\sum_{i=1}^{c}\sum_{j=1}^{n} \left( a\mu_{ij}^{m} + bt_{ij}^{\eta} \right)\left( 1 - K(x_j, v_i) \right) + \sum_{i=1}^{c} \gamma_i \sum_{j=1}^{n} \left( \varphi_j - t_{ij} \right)^{\eta}
\qquad (10)
\]
\[
0 \le \mu_{ij}, \quad \sum_{i=1}^{c} \mu_{ij} = 1,\ \forall j; \quad t_{ij} < 1, \quad a > 0,\ b > 0,\ m > 1,\ \eta > 1.
\]
Minimization of \( J_{IKPFCM} \) is performed by a fixed-point iteration scheme known as the Alternating Optimization technique; it yields the following update equations:
\[
\mu_{ik} = \frac{\left( 1 / \left( 1 - K(x_k, v_i) \right) \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / \left( 1 - K(x_k, v_j) \right) \right)^{1/(m-1)}}
\qquad (11)
\]
\[
t_{ik} = \frac{\varphi_k}{1 + \left( \dfrac{2b\left( 1 - K(x_k, v_i) \right)}{\gamma_i} \right)^{1/(\eta-1)}}
\qquad (12)
\]
\[
v_i = \frac{\sum_{k=1}^{n} \left( a\mu_{ik}^{m} + bt_{ik}^{\eta} \right) K(x_k, v_i)\, x_k}{\sum_{k=1}^{n} \left( a\mu_{ik}^{m} + bt_{ik}^{\eta} \right) K(x_k, v_i)}
\qquad (13)
\]
It is suggested to select \( \gamma_i \) as
\[
\gamma_i = \frac{2\sum_{k=1}^{n} \mu_{ik}^{m} \left( 1 - K(x_k, v_i) \right)}{\sum_{k=1}^{n} \mu_{ik}^{m}}
\qquad (14)
\]
The full description of the IKPFCM algorithm is as follows:
Step 1: Fix \( c, m, \alpha, \beta, a, b, \eta, \varepsilon \) and set the maximum number of iterations \( r_{max} \).
Step 2: Determine the initial prototypes \( v_i \) with the FCM algorithm.
Step 3: Calculate the kernel matrix.
Step 4: For \( k = 1, 2, \dots, r_{max} \), do:
(a) Update all membership values \( \mu_{ik} \) with Eq. (11).
(b) Update all typicality values \( t_{ik} \) with Eq. (12).
(c) Update all prototypes \( v_i \) with Eq. (13).
(d) Compute \( \gamma_i \) using Eq. (14).
(e) Compute \( E = \max_{i,k} \left| \mu_{ik}^{(k)} - \mu_{ik}^{(k-1)} \right| \); if \( E \le \varepsilon \), stop; otherwise set \( k = k + 1 \).
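Putting Steps 1-4 together, here is a minimal end-to-end sketch of the IKPFCM loop (ours, reusing the fcm, gaussian_kernel, and sample_weights sketches above; the initial value gamma_i = 1 is an assumed starting point, not specified in the paper):

```python
import numpy as np

def ikpfcm(X, c, a=1, b=1, m=2, eta=2, alpha=1.0, beta=2.0, eps=1e-4, r_max=100):
    phi = sample_weights(X, alpha)             # sample weights, Eq. (9)
    _, V = fcm(X, c)                           # Step 2: initial prototypes from FCM
    gamma = np.ones(c)                         # assumed initial gamma_i
    U_prev = np.zeros((c, X.shape[0]))
    for _ in range(r_max):                     # Step 4
        K = gaussian_kernel(V, X, beta)        # Step 3: kernel matrix (recomputed as V moves)
        one_minus_K = np.fmax(1.0 - K, 1e-12)
        W = one_minus_K ** (-1.0 / (m - 1))
        U = W / W.sum(axis=0)                                             # (a) Eq. (11)
        T = phi[None, :] / (1.0 + (2*b*one_minus_K
                                   / gamma[:, None]) ** (1.0/(eta-1)))    # (b) Eq. (12)
        wk = (a * U**m + b * T**eta) * K
        V = (wk @ X) / wk.sum(axis=1, keepdims=True)                      # (c) Eq. (13)
        gamma = 2*(U**m * one_minus_K).sum(axis=1) / (U**m).sum(axis=1)   # (d) Eq. (14)
        if np.abs(U - U_prev).max() <= eps:                               # (e) stopping test
            break
        U_prev = U
    return U, T, V
```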
5. Simulation and Results
In order to test the effectiveness of the IKPFCM algorithm, we ran FCM, PFCM, and IKPFCM on two data sets. Comparison and analysis of the experimental results show that the IKPFCM algorithm outperforms the other algorithms.
5.1 Test on X12
Data set X12 [7] consists of 12 data points; experiments on this data set yield the following results.
Table 1 shows the cluster centers produced by FCM, PFCM, and IKPFCM, and Fig. 1 shows the corresponding clustering results. The true centers are
\[
v_{true} = \begin{bmatrix} -3.34 & 0 \\ 3.34 & 0 \end{bmatrix}.
\]
Table 1. Cluster centers produced by FCM, PFCM, and IKPFCM.

    FCM (m=2)        PFCM (a=1, b=1, m=2, η=2)    IKPFCM (a=1, b=1, m=2, η=2, α=1.0/0.4, β=2)
    (-2.98, 0.54)    (-2.83, 0.36)                (-3.34, 0.00)
    ( 2.98, 0.54)    ( 2.83, 0.36)                ( 3.34, 0.00)
Fig. 1. '+' marks the prototypes found by FCM, '□' those found by PFCM, and red 'o' those found by IKPFCM.
To quantify the accuracy of each clustering algorithm, we compute the error \( E_* = \left\| v_{true} - v_* \right\|^{2} \), where * stands for FCM, PFCM, or IKPFCM. The calculation gives \( E_{FCM} = 0.4212 \), \( E_{PFCM} = 0.3897 \), and \( E_{IKPFCM} = 0 \). Clearly, from Fig. 1 and these errors, the proposed method is superior to the previous methods.
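Reading \( E_* \) as the squared offset of one of the two (symmetric) estimated centers from its true counterpart, which is our inference from Table 1 rather than an explicit statement in the paper, reproduces the reported values:
\[
E_{FCM} = 0.36^2 + 0.54^2 = 0.4212, \qquad
E_{PFCM} = 0.51^2 + 0.36^2 = 0.3897, \qquad
E_{IKPFCM} = 0.
\]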
5.2 Test on IRIS
We tested FCM, PFCM, and IKPFCM on the IRIS data set [8]. The experimental conditions were fixed as follows: \( \varepsilon = 0.0001 \), \( r_{max} = 100 \), \( a = 1 \), \( b = 1 \), \( m = 2 \), \( \eta = 2 \), \( \alpha = 1/0.4 \), \( \beta = 1/200 \). Running the experiments on the 150 data points of IRIS gives the following results.
Table 2. Results on IRIS without added noise, for FCM, PFCM, and IKPFCM.

    Algorithm    Misclassifications    Iterations    Time (s)    Center offset
    FCM          16                    23            0.015       0.075
    PFCM         15                    26            0.031       0.051
    IKPFCM       13                    24            0.094       0.024
Table 3. Results on IRIS with 50 noise points added.

    Algorithm    Misclassifications    Time (s)
    FCM          50                    0.031
    PFCM         50                    0.047
    IKPFCM       11                    0.125
Table 4. Results on IRIS with 100 noise points added.

    Algorithm    Misclassifications    Time (s)
    FCM          50                    0.046
    PFCM         50                    0.094
    IKPFCM       15                    0.203
Table 2 shows that the IKPFCM algorithm produces the fewest misclassifications, although it takes slightly more time. Tables 3 and 4 show that under noisy conditions IKPFCM performs far better than the other algorithms, so the IKPFCM algorithm filters noise well.
6. Conclusions
In this paper, we have proposed a new algorithm that combines sample weighting with a kernel function. It filters noise well and increases robustness. Experiments on two data sets show that the IKPFCM algorithm achieves higher clustering accuracy than the previous algorithms.
7. Acknowledgments
This work is partially supported by National Science Foundation for Post-doctoral Scientists of
China (Grant No. 20070421041), Jiangsu Postdoctoral Sustentation Fund of China (Grant No.
0701045B) and Scientific Research Foundation of China University of Mining and Technology
(Grant No. 2007B017).
References
[1] Gao Xinbo, Xie Weixin. A research on fuzzy clustering theory and its applications. Chinese Science Bulletin, 1999, 44(21): 2241-2248.
[2] Bezdek J C. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York, 1981.
[3] Wu Xiaohong, Zhou Jianjiang. Allied fuzzy c-means clustering model. Transactions of Nanjing University of Aeronautics and Astronautics, 2006, 23(3): 208-213.
[4] Pal N R, Bezdek J C. On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems, 1995, 3(3): 370-379.
[5] Krishnapuram R, Keller J M. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1993, 1(2): 98-110.
[6] Pal N R, Pal K, Keller J M, Bezdek J C. A possibilistic fuzzy c-means clustering algorithm. IEEE Transactions on Fuzzy Systems, 2005, 13(4): 517-530.
[7] Pal N R, Pal K, Bezdek J C, et al. A possibilistic fuzzy c-means clustering algorithm. IEEE Transactions on Fuzzy Systems, 2005, 13(4): 517-530.
[8] Bezdek J C, Keller J M, Krishnapuram R, et al. Will the real Iris data please stand up? IEEE Transactions on Fuzzy Systems, 1999, 7(3): 368-369.