Lecture 8 Neurofuzzy

Fuzzy Inference Systems
Review Fuzzy Models
If <antecedent> then <consequent>.
Basic Configuration of a Fuzzy Logic System
Input -> Fuzzification -> Inferencing -> Defuzzification -> Output
Error = Target - Output
Types of Rules
Mamdani Assilian Model
R1: If x is A1 and y is B1 then z is C1
R2: If x is A2 and y is B2 then z is C2
$A_i$, $B_i$ and $C_i$ are fuzzy sets defined on the universes of x, y and z respectively
Takagi-Sugeno Model
R1: If x is A1 and y is B1 then z =f1(x,y)
R2: If x is A2 and y is B2 then z =f2(x,y)
For example: $f_i(x, y) = a_i x + b_i y + c_i$
Mamdani Fuzzy Models
The Reasoning Scheme
Both antecedent and consequent are fuzzy
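[Figure: membership-function plots for three rules (vertical axis 0 to 1). Each rule combines its four antecedent memberships and passes the result to its consequent set; the step is labelled "Implication (Max)".]
1: IF FeO is high & SiO2 is low & Granite is prox & Fault is prox, THEN metal is high
2: IF FeO is aver & SiO2 is high & Granite is interm & Fault is prox, THEN metal is aver
3: IF FeO is low & SiO2 is high & Granite is dist & Fault is dist, THEN metal is low
Axis ranges: FeO 30% to 70%; SiO2 40% to 70%; Granite 0 km to 20 km; Fault 0 km to 10 km; Metal 0 t to 1000 t
Inputs: FeO = 60%, SiO2 = 60%, Granite = 5 km, Fault = 1 km
Output: Metal = ?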
Defuzzifier
Since consequent is fuzzy, it has to be defuzzified
• Converts the fuzzy output of the inference engine
to crisp using membership functions analogous to
the ones used by the fuzzifier.
• Five commonly used defuzzifying methods:
– Centroid of area (COA)
– Bisector of area (BOA)
– Mean of maximum (MOM)
– Smallest of maximum (SOM)
– Largest of maximum (LOM)
Defuzzifier
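[Figure: the output fuzzy sets of Rule 1, Rule 2 and Rule 3 are combined (Rule 1 + Rule 2 + Rule 3 = Aggregate (Max)); the aggregate is then defuzzified by finding its centroid.]
Formula for centroid:
$$\text{centroid} = \frac{\sum_{i=0}^{n} x_i \, \mu(x_i)}{\sum_{i=0}^{n} \mu(x_i)}$$
Result: 125 tonnes metal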
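The aggregate-then-centroid step can be sketched in a few lines of Python. This is only an illustration: the triangular output sets, the 0 to 10 universe and the clipping levels 0.3 and 0.7 are hypothetical values chosen for the sketch, not the ones behind the 125-tonne result.

```python
def centroid(xs, mus):
    """Discrete centroid: sum(x_i * mu(x_i)) / sum(mu(x_i))."""
    total = sum(mus)
    if total == 0:
        raise ValueError("aggregated membership is zero everywhere")
    return sum(x * m for x, m in zip(xs, mus)) / total


def triangle(x, left, peak, right):
    """Triangular membership function (hypothetical shape for this sketch)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical output universe sampled every 0.5 units from 0 to 10.
xs = [i * 0.5 for i in range(21)]

# Two hypothetical rule outputs, each clipped at its rule's firing strength.
rule1 = [min(0.3, triangle(x, 0, 2, 4)) for x in xs]
rule2 = [min(0.7, triangle(x, 3, 6, 9)) for x in xs]

# Aggregate (Max): pointwise maximum of the rule outputs.
aggregated = [max(a, b) for a, b in zip(rule1, rule2)]

# Defuzzify: centroid of the aggregated set.
crisp = centroid(xs, aggregated)
print(crisp)
```

The crisp value is pulled towards the second set because that rule fires more strongly.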
Sugeno Fuzzy Models
• Also known as TSK fuzzy model
– Takagi, Sugeno & Kang, 1985
Fuzzy Rules of TSK Model
While antecedent is fuzzy, consequent is crisp
If x is A and y is B then z = f(x, y)
A and B are fuzzy sets; f(x, y) is a crisp function, very often a polynomial function w.r.t. x and y.
The order of a Takagi-Sugeno type fuzzy inference system = the order of the polynomial used.
The Reasoning Scheme
Examples
R1: if x is small and y is small then z = x + y + 1
R2: if x is small and y is large then z = y + 3
R3: if x is large and y is small then z = x + 3
R4: if x is large and y is large then z = x + y + 2
TAKAGI-SUGENO SYSTEM
1. IF x is f1x(x) AND y is f1y(y) THEN z1 = p10+p11x+p12y
2. IF x is f2x(x) AND y is f1y(y) THEN z2 = p20+p21x+p22y
3. IF x is f1x(x) AND y is f2y(y) THEN z3 = p30+p31x+p32y
4. IF x is f2x(x) AND y is f2y(y) THEN z4 = p40+p41x+p42y
The firing strength (= output of the IF part) of each rule is:
s1 = f1x(x) AND f1y(y)
s2 = f2x(x) AND f1y(y)
s3 = f1x(x) AND f2y(y)
s4 = f2x(x) AND f2y(y)
Output of each rule (= firing strength x consequent function) :
1. o1 = s1 ∙ z1
2. o2 = s2 ∙ z2
3. o3 = s3 ∙ z3
4. o4 = s4 ∙ z4
Overall output of the fuzzy inference system is:
$$z = \frac{o_1 + o_2 + o_3 + o_4}{s_1 + s_2 + s_3 + s_4}$$
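A minimal Python sketch of this weighted-average output, using the four example rules R1 to R4 given earlier. The Gaussian shapes of "small" and "large" (centres 0 and 10, spread 5) and the use of the product for AND are assumptions made for the illustration.

```python
import math


def gauss(v, centre, spread):
    """Gaussian membership value (assumed shape for this sketch)."""
    return math.exp(-((v - centre) ** 2) / spread ** 2)


# Hypothetical membership functions for the linguistic values.
def small(v):
    return gauss(v, 0.0, 5.0)


def large(v):
    return gauss(v, 10.0, 5.0)


# (membership for x, membership for y, crisp consequent f(x, y))
RULES = [
    (small, small, lambda x, y: x + y + 1),  # R1
    (small, large, lambda x, y: y + 3),      # R2
    (large, small, lambda x, y: x + 3),      # R3
    (large, large, lambda x, y: x + y + 2),  # R4
]


def ts_output(x, y):
    # Firing strength of each rule: AND taken as the product (assumption).
    strengths = [fx(x) * fy(y) for fx, fy, _ in RULES]
    # Output of each rule: firing strength times consequent function.
    outputs = [s * f(x, y) for s, (_, _, f) in zip(strengths, RULES)]
    # Overall output: sum of rule outputs over sum of firing strengths.
    return sum(outputs) / sum(strengths)


print(ts_output(5.0, 5.0))
```

At x = y = 5 all four rules fire equally under these assumed memberships, so the output is simply the mean of the four consequents.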
Sugeno system
Rule1:
IF FeO is high AND SiO2 is low AND Granite is proximal AND Fault is proximal, THEN
Gold =p1(FeO%)+q1(SiO2%) +r1(Distance2Granite)+s1(Distance2Fault)+t1
Rule 2:
IF FeO is average AND SiO2 is high AND Granite is intermediate AND Fault is proximal,
THEN Gold =p2(FeO%)+q2(SiO2%)+r2(Distance2Granite)+s2(Distance2Fault)+t2
Rule 3:
IF FeO is low AND SiO2 is high AND Granite is distal AND Fault is distal,
THEN Gold =p3(FeO%)+q3(SiO2%)+r3(Distance2Granite)+s3(Distance2Fault)+t3
Sugeno system
[Figure: membership-function plots for Rules 1 to 3 (vertical axis 0 to 1). The antecedent memberships of each rule are combined (shown as X, i.e. product) to give the firing strengths s1, s2 and s3; the consequents Gold(R1), Gold(R2) and Gold(R3) are the linear functions of the three rules.]
Axis ranges: FeO 30% to 70%; SiO2 40% to 70%; Granite 0 km to 20 km; Fault 0 km to 10 km
Inputs: FeO = 60%, SiO2 = 60%, Granite = 5 km, Fault = 1 km
Output: Metal = ?
Sugeno system: Output
Each rule k contributes a firing strength (s1, s2, s3) and a rule output Gold(Rk), the linear consequent of that rule, e.g.
Gold(R1) = p1(FeO%) + q1(SiO2%) + r1(Distance2Granite) + s1(Distance2Fault) + t1
$$\text{Output} = \frac{s_1 \cdot \text{Gold}(R1) + s_2 \cdot \text{Gold}(R2) + s_3 \cdot \text{Gold}(R3)}{s_1 + s_2 + s_3}$$
In the output formula s1, s2 and s3 are the firing strengths, not the Distance2Fault coefficients of the consequents.
A neural fuzzy system
Implements FIS in the framework of NNs
Layers, from input to output: inputs x, y -> Fuzzification Nodes -> Antecedent Nodes -> Output Nodes
Fuzzification Nodes
Represent the term sets of the features.
If we have two features x and y, and two linguistic values, say BIG and SMALL, defined on each of them, then we have 4 fuzzification nodes: BIG and SMALL for x, and BIG and SMALL for y.
We use Gaussian membership functions for fuzzification because they are differentiable everywhere; triangular and trapezoidal membership functions are NOT differentiable at their corner points.
Fuzzification Nodes (Contd.)
$$z = \exp\left(-\frac{(x - \mu)^2}{\sigma^2}\right)$$
μ and σ are two free parameters of the membership functions which need to be determined.
How to determine μ and σ
Two strategies:
1) Fixed μ and σ
2) Update μ and σ through any tuning algorithm
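Differentiability is what makes the second strategy practical. The sketch below assumes the membership function has the form exp(-(x - μ)² / σ²) and writes out its partial derivatives with respect to μ and σ, which a gradient-based tuning algorithm would need. The function names and the sample values are illustrative only.

```python
import math


def membership(x, mu, sigma):
    """Gaussian membership value, assumed form exp(-(x - mu)^2 / sigma^2)."""
    return math.exp(-((x - mu) ** 2) / sigma ** 2)


def d_membership(x, mu, sigma):
    """Analytic partial derivatives of the membership value."""
    z = membership(x, mu, sigma)
    dz_dmu = z * 2.0 * (x - mu) / sigma ** 2
    dz_dsigma = z * 2.0 * (x - mu) ** 2 / sigma ** 3
    return dz_dmu, dz_dsigma


# Hypothetical values: input 1.5, centre 1.0, spread 2.0.
print(membership(1.5, 1.0, 2.0))
print(d_membership(1.5, 1.0, 2.0))
```

A quick way to gain confidence in such derivatives is to compare them with central finite differences before using them in a training loop.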
Consequent nodes
$$z = px + qy + k$$
p, q and k are three free
parameters of the consequent
polynomial function
How to determine p, q, k
Two strategies:
1) Fixed
2) Update through any tuning algorithm
[Figure: neural fuzzy network, listed here from output to input]
Target (t); Error = ½(t - o)²
Output node: O = (w1z1 + w2z2 + w3z3 + w4z4) / (w1 + w2 + w3 + w4)
Consequent nodes: e.g. z4 = p4x + q4y + k4
Antecedent nodes: e.g. If x is Small & y is Small
Fuzzification nodes: BIG and SMALL for x (outputs μx1, μx2); BIG and SMALL for y (outputs μy1, μy2)
ANFIS Architecture
Squares: Adaptive nodes
Circles: Fixed nodes
ANFIS Architecture
Layer 1 (Adaptive)
Contains adaptive nodes, each with a Gaussian membership function:
$$f(x) = \exp\left(-\frac{(c - x)^2}{\sigma^2}\right)$$
Number of nodes = number of variables x number of linguistic values
In the previous example there are 4 nodes (2 variables x 2 linguistic
values for each)
Two parameters to be estimated per node: mean (centre) and standard
deviation (spread)
These are called premise parameters
Number of premise parameters = 2 x number of nodes = 8 in the
example
ANFIS Architecture
Layer 2 (Fixed)
Contains fixed nodes, each with product operator (T-norm operator).
Returns the firing strength of each If-Then Rule.
$$s_1 = f_{1x} \times f_{1y}; \quad s_2 = f_{2x} \times f_{1y}$$
$$s_3 = f_{1x} \times f_{2y}; \quad s_4 = f_{2x} \times f_{2y}$$
The firing strength can be normalized. In ANFIS, each node returns a normalized firing strength:
$$\bar{s}_1 = \frac{s_1}{s_1 + s_2 + s_3 + s_4}$$
Fixed nodes – no parameter to be estimated.
ANFIS Architecture
Layer 3 (Adaptive)
Each node contains an adaptive polynomial, and returns the output for each fuzzy If-Then rule:
$$z_1 = s_1 (p_{10} + p_{11} x + p_{12} y)$$
$$z_2 = s_2 (p_{20} + p_{21} x + p_{22} y)$$
$$z_3 = s_3 (p_{30} + p_{31} x + p_{32} y)$$
$$z_4 = s_4 (p_{40} + p_{41} x + p_{42} y)$$
Here $s_k$ denotes the normalized firing strength returned by Layer 2.
Number of nodes = number of If-Then Rules.
The parameters $p_{ki}$ are called consequent parameters.
ANFIS Architecture
Layer 4 (Fixed)
Sums up the output of each node in the previous layer:
$$z = z_1 + z_2 + z_3 + z_4$$
A single node in this layer.
No parameter to be estimated.
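The four layers can be traced end to end with a short Python sketch. The premise values in PREMISE and the consequent triples in CONSEQUENT are hypothetical; the membership function is assumed to be exp(-(c - x)² / σ²), and the rule ordering follows s1 to s4 as listed for Layer 2.

```python
import math


def gauss(v, c, sigma):
    """Layer 1 node: Gaussian membership, assumed form exp(-(c - v)^2 / sigma^2)."""
    return math.exp(-((c - v) ** 2) / sigma ** 2)


# Hypothetical premise parameters: (centre, spread) for each linguistic value.
PREMISE = {"x": [(0.0, 4.0), (10.0, 4.0)], "y": [(0.0, 4.0), (10.0, 4.0)]}
# Hypothetical consequent parameters (p_k0, p_k1, p_k2) for rules 1..4.
CONSEQUENT = [(1.0, 1.0, 1.0), (3.0, 1.0, 0.0), (3.0, 0.0, 1.0), (2.0, 1.0, 1.0)]


def anfis_forward(x, y, premise, consequent):
    # Layer 1 (adaptive): membership values of x and y.
    f1x, f2x = [gauss(x, c, sg) for c, sg in premise["x"]]
    f1y, f2y = [gauss(y, c, sg) for c, sg in premise["y"]]
    # Layer 2 (fixed): product T-norm, then normalisation.
    s = [f1x * f1y, f2x * f1y, f1x * f2y, f2x * f2y]
    total = sum(s)
    s_bar = [sk / total for sk in s]
    # Layer 3 (adaptive): normalised strength times first-order polynomial.
    z = [sb * (p0 + p1 * x + p2 * y) for sb, (p0, p1, p2) in zip(s_bar, consequent)]
    # Layer 4 (fixed): sum of the rule outputs.
    return s_bar, z, sum(z)


n_premise = 2 * sum(len(v) for v in PREMISE.values())   # 2 parameters per node
n_consequent = sum(len(p) for p in CONSEQUENT)           # 3 parameters per rule
s_bar, z, out = anfis_forward(2.0, 3.0, PREMISE, CONSEQUENT)
print(n_premise, n_consequent, out)
```

Because the firing strengths are normalised in Layer 2, the plain sum in Layer 4 equals the weighted average of the rule consequents.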
ANFIS Training
$$z = z_1 + z_2 + z_3 + z_4$$
$$z_1 = s_1 (p_{10} + p_{11} x + p_{12} y)$$
$$z_2 = s_2 (p_{20} + p_{21} x + p_{22} y)$$
$$z_3 = s_3 (p_{30} + p_{31} x + p_{32} y)$$
$$z_4 = s_4 (p_{40} + p_{41} x + p_{42} y)$$
The output is linear in the consequent parameters $p_{ki}$ if the premise parameters and, therefore, the firing strengths $s_k$ of the fuzzy if-then rules are fixed.
ANFIS uses a hybrid learning procedure (Jang and Sun, 1995) for
estimation of the premise and consequent parameters.
The hybrid learning procedure estimates the consequent parameters
(keeping the premise parameters fixed) in a forward pass and the
premise parameters (keeping the consequent parameters fixed) in a
backward pass.
ANFIS Training
The forward pass:
Propagate information forward until Layer 3.
Estimate the consequent parameters by the least squares estimator.
The backward pass:
Propagate the error signals backwards and update the premise parameters by gradient descent.
[Figure: ANFIS network. Squares: Adaptive nodes; Circles: Fixed nodes]
ANFIS Training : Least Square Estimation
1. Data assembled in the form of $(x_n, y_n)$
2. We assume that there is a linear relation between x and y:
y = ax + b
3. Can be extended to n dimensions:
$y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + b$
The problem: given the form of the function, find values of the coefficients $a_i$ such that the linear combination best fits the data
ANFIS Training : Least Square Estimation
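Given data $\{(x_1, y_1), \dots, (x_N, y_N)\}$, we may define the error associated with saying y = ax + b by:
$$E(a, b) = \sum_{n=1}^{N} \left(y_n - (a x_n + b)\right)^2$$
This is just N times the variance of the data $\{y_1 - (a x_1 + b), \dots, y_N - (a x_N + b)\}$ when their mean is zero.
The goal is to find values of a and b that minimize the error. In other words, set the partial derivatives of the error with respect to a and b to zero:
$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0$$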
ANFIS Training : Least Square Estimation
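Which gives us:
$$\sum_{n=1}^{N} 2\left(y_n - (a x_n + b)\right)(-x_n) = 0, \qquad \sum_{n=1}^{N} 2\left(y_n - (a x_n + b)\right)(-1) = 0$$
We may rewrite them as:
$$a \sum x_n^2 + b \sum x_n = \sum x_n y_n, \qquad a \sum x_n + b \sum 1 = \sum y_n$$
The values of a and b which minimize the error satisfy the following matrix equation:
$$\begin{pmatrix} \sum x_n^2 & \sum x_n \\ \sum x_n & \sum 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x_n y_n \\ \sum y_n \end{pmatrix}$$
Hence a and b are estimated using:
$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x_n^2 & \sum x_n \\ \sum x_n & \sum 1 \end{pmatrix}^{-1} \begin{pmatrix} \sum x_n y_n \\ \sum y_n \end{pmatrix}$$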
ANFIS Training : Least Square Estimation
For the following data find the least squares estimator
$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x^2 & \sum x \\ \sum x & \sum 1 \end{pmatrix}^{-1} \begin{pmatrix} \sum x y \\ \sum y \end{pmatrix} = \frac{1}{\sum x^2 \sum 1 - \sum x \sum x} \begin{pmatrix} \sum 1 & -\sum x \\ -\sum x & \sum x^2 \end{pmatrix} \begin{pmatrix} \sum x y \\ \sum y \end{pmatrix}$$
SNo      X      Y      X2     XY
1        2      9      4      18
2        3      11     9      33
3        4      13     16     52
4        6      17     36     102
5        8      21     64     168
6        1      7      1      7
7        2      9      4      18
8        11     27     121    297
9        14     33     196    462
TOTAL    51     147    451    1157
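The exercise can be checked with a few lines of Python that build the sums in the TOTAL row and solve the 2 x 2 system in closed form. The function name least_squares_line is just a label chosen for this sketch.

```python
# Data from the table above.
xs = [2, 3, 4, 6, 8, 1, 2, 11, 14]
ys = [9, 11, 13, 17, 21, 7, 9, 27, 33]


def least_squares_line(xs, ys):
    """Fit y = a*x + b by solving the 2x2 normal equations in closed form."""
    n = len(xs)                                  # sum of 1
    sx = sum(xs)                                 # sum of x
    sy = sum(ys)                                 # sum of y
    sxx = sum(x * x for x in xs)                 # sum of x^2
    sxy = sum(x * y for x, y in zip(xs, ys))     # sum of x*y
    det = sxx * n - sx * sx                      # determinant of the 2x2 matrix
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


a, b = least_squares_line(xs, ys)
print(a, b)  # prints 2.0 5.0
```

The data lie exactly on y = 2x + 5, so the fit has zero residual.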
ANFIS Training : Least Square Estimation
$$z_1 = s_1 (p_{10} + p_{11} x + p_{12} y); \quad z_2 = s_2 (p_{20} + p_{21} x + p_{22} y); \quad z_3 = s_3 (p_{30} + p_{31} x + p_{32} y); \quad z_4 = s_4 (p_{40} + p_{41} x + p_{42} y)$$
$$\text{Output} = o = z_1 + z_2 + z_3 + z_4 = s_1 (p_{10} + p_{11} x + p_{12} y) + s_2 (p_{20} + p_{21} x + p_{22} y) + s_3 (p_{30} + p_{31} x + p_{32} y) + s_4 (p_{40} + p_{41} x + p_{42} y)$$
Let the training data be:
$$\begin{pmatrix} x_1 & y_1 & t_1 \\ x_2 & y_2 & t_2 \\ x_3 & y_3 & t_3 \\ x_4 & y_4 & t_4 \\ x_5 & y_5 & t_5 \\ x_6 & y_6 & t_6 \end{pmatrix}$$
The squared error for each training sample, with the firing strengths $s_k$ evaluated at that sample, is:
$$E_1 = \left(t_1 - [s_1 (p_{10} + p_{11} x_1 + p_{12} y_1) + s_2 (p_{20} + p_{21} x_1 + p_{22} y_1) + s_3 (p_{30} + p_{31} x_1 + p_{32} y_1) + s_4 (p_{40} + p_{41} x_1 + p_{42} y_1)]\right)^2$$
$$\vdots$$
$$E_6 = \left(t_6 - [s_1 (p_{10} + p_{11} x_6 + p_{12} y_6) + s_2 (p_{20} + p_{21} x_6 + p_{22} y_6) + s_3 (p_{30} + p_{31} x_6 + p_{32} y_6) + s_4 (p_{40} + p_{41} x_6 + p_{42} y_6)]\right)^2$$
The total error is $E = E_1 + \dots + E_6$. Writing $o_n$ for the bracketed output of sample n, set the partial derivative of E with respect to each consequent parameter to zero:
$$\frac{\partial E}{\partial p_{10}} = \sum_{n=1}^{6} -2\,(t_n - o_n)\, s_1 = 0$$
$$\vdots$$
$$\frac{\partial E}{\partial p_{42}} = \sum_{n=1}^{6} -2\,(t_n - o_n)\, s_4\, y_n = 0$$
Simplify and use LSE.
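As a rough sketch of this step, the Python below fixes the premise parameters, builds one row [s1, s1·x, s1·y, ..., s4, s4·x, s4·y] per training sample, and solves the normal equations for the 12 consequent parameters. Everything concrete here is an assumption for illustration: the premise values (centres 0 and 10, spread 4), the synthetic 6 x 6 grid of samples (12 unknowns need more than the 6 samples shown above), the TRUE parameter vector used to generate targets, and the plain Gaussian-elimination solver.

```python
import math


def gauss(v, c, sigma):
    return math.exp(-((c - v) ** 2) / sigma ** 2)


def strengths(x, y):
    """Normalised firing strengths for fixed, hypothetical premise parameters."""
    f1x, f2x = gauss(x, 0.0, 4.0), gauss(x, 10.0, 4.0)
    f1y, f2y = gauss(y, 0.0, 4.0), gauss(y, 10.0, 4.0)
    s = [f1x * f1y, f2x * f1y, f1x * f2y, f2x * f2y]
    total = sum(s)
    return [sk / total for sk in s]


def design_row(x, y):
    """Coefficients of p10, p11, p12, ..., p40, p41, p42 in the output o."""
    row = []
    for sk in strengths(x, y):
        row += [sk, sk * x, sk * y]
    return row


def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_consequents(samples):
    """Least squares estimate via the normal equations (A^T A) p = A^T t."""
    rows = [design_row(x, y) for x, y, _ in samples]
    targets = [t for _, _, t in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def predict(x, y, params):
    return sum(f * p for f, p in zip(design_row(x, y), params))


# Synthetic training data generated from hypothetical "true" parameters.
TRUE = [1.0, 1.0, 1.0, 3.0, 0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 1.0, 1.0]
grid = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
samples = [(x, y, predict(x, y, TRUE)) for x in grid for y in grid]
params = fit_consequents(samples)
print([round(p, 4) for p in params])
```

Because the output is linear in the consequent parameters once the firing strengths are fixed, this is an ordinary linear least squares problem and no iteration is needed.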
ANFIS Training : Gradient descent
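After the least squares estimate of all consequent parameters, plug in:
- the parameter values,
- firing strength values and
- variable (x, y) values
$$\text{Error at layer 1} = E = \frac{(\text{Target Output} - \text{Actual Output})^2}{2}$$
Let the centre and spread parameters in Layer 1 be represented by $c_i$ (centre) and $\sigma_i$ (spread).
$$\frac{\partial E}{\partial c_i} = \frac{\partial E}{\partial O} \, \frac{\partial O}{\partial O_i} \, \frac{\partial O_i}{\partial s_i} \, \frac{\partial s_i}{\partial \mu_i} \, \frac{\partial \mu_i}{\partial c_i}$$
where O is the output, $O_i$ is the output of rule i, $s_i$ is its firing strength and $\mu_i$ is the fuzzy membership function ($f_i$ in Layer 1). Similarly,
$$\frac{\partial E}{\partial \sigma_i} = \frac{\partial E}{\partial O} \, \frac{\partial O}{\partial O_i} \, \frac{\partial O_i}{\partial s_i} \, \frac{\partial s_i}{\partial \mu_i} \, \frac{\partial \mu_i}{\partial \sigma_i}$$
The above expressions are used to update the centre and spread parameters.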
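A deliberately simplified sketch of the backward pass is shown below. It assumes a single input x, two rules with constant (zero-order) consequents held fixed, Gaussian memberships of the form exp(-(x - c)² / σ²), one training sample, and a learning rate of 0.1; all of these are illustrative choices. With a single antecedent the firing strength equals the membership value, and the two middle factors of the chain are combined into ∂O/∂s_i = (z_i - O) / Σs so that the normalisation is accounted for.

```python
import math


def forward(x, centres, spreads, consequents):
    """Membership values, their sum, and the normalised weighted output."""
    mus = [math.exp(-((x - c) ** 2) / sg ** 2) for c, sg in zip(centres, spreads)]
    total = sum(mus)
    out = sum(m * z for m, z in zip(mus, consequents)) / total
    return mus, total, out


def error(x, t, centres, spreads, consequents):
    """E = (target - output)^2 / 2 for one sample."""
    return 0.5 * (t - forward(x, centres, spreads, consequents)[2]) ** 2


def gradients(x, t, centres, spreads, consequents):
    """Chain-rule gradients of E with respect to each centre and spread."""
    mus, total, out = forward(x, centres, spreads, consequents)
    dE_dO = -(t - out)
    grad_c, grad_sigma = [], []
    for mu, c, sg, z in zip(mus, centres, spreads, consequents):
        dO_ds = (z - out) / total          # output w.r.t. firing strength
        ds_dmu = 1.0                       # single antecedent: s_i = mu_i
        dmu_dc = mu * 2.0 * (x - c) / sg ** 2
        dmu_dsigma = mu * 2.0 * (x - c) ** 2 / sg ** 3
        grad_c.append(dE_dO * dO_ds * ds_dmu * dmu_dc)
        grad_sigma.append(dE_dO * dO_ds * ds_dmu * dmu_dsigma)
    return grad_c, grad_sigma


# Hypothetical starting point, fixed consequents and one training sample.
centres, spreads = [0.0, 10.0], [4.0, 4.0]
CONSEQUENTS = [1.0, 5.0]
X_SAMPLE, TARGET, RATE = 3.0, 2.0, 0.1

initial_error = error(X_SAMPLE, TARGET, centres, spreads, CONSEQUENTS)
for _ in range(100):
    g_c, g_s = gradients(X_SAMPLE, TARGET, centres, spreads, CONSEQUENTS)
    centres = [c - RATE * g for c, g in zip(centres, g_c)]
    spreads = [sg - RATE * g for sg, g in zip(spreads, g_s)]
final_error = error(X_SAMPLE, TARGET, centres, spreads, CONSEQUENTS)
print(initial_error, final_error)
```

In the hybrid scheme this update would alternate with the least squares estimate of the consequent parameters, one forward pass and one backward pass per epoch.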