
Expert Systems and Knowledge Engineering
Hassan Saneifar, Ph.D.
Introduction to Artificial Intelligence
• The subfield of computer science concerned with symbolic
reasoning and non-algorithmic methods of problem solving
• How to make computers do things at which people are better
• Let's discuss it a little more…
What is Intelligence?
• Ability to understand and learn things.
• Ability to think and understand instead of doing things by instinct or automatically.
• Ability to learn and understand, to solve problems and to make decisions.
What is Artificial Intelligence?
• The goal of AI as a science is to make machines do things that would require intelligence if done by humans.
• To develop more powerful, versatile programs that can handle problems currently handled efficiently only by the human mind [Balci 1996].
What is Artificial Intelligence?
How do we determine whether a particular computer
has demonstrated intelligence?
• From a philosophical perspective, "one considers questions regarding intelligence itself and whether machines can possess actual intelligence or merely simulate its presence."
• From an applied perspective, the question is "how technology can be applied to produce machines that behave in intelligent ways" [Brookshear 1997].
Turing Imitation Game
Alan Turing's questions:
• Is there thought without experience?
• Is there mind without communication?
• Is there language without living?
• Is there intelligence without life?
Turing Imitation Game:
• Invented by the British mathematician Alan Turing
• Proposed in 1950
Turing Imitation Game
Phase 1
In the first phase, the interrogator, a
man and a woman are each placed in
separate rooms. The interrogator’s
objective is to work out who is the man
and who is the woman by questioning
them. The man should attempt to
deceive the interrogator that he is the
woman, while the woman has to
convince the interrogator that she is
the woman.
Turing Imitation Game
Phase 2
In the second phase of the game, the
man is replaced by a computer
programmed to deceive the
interrogator as the man did. It would
even be programmed to make
mistakes and provide fuzzy answers in
the way a human would. If the
computer can fool the interrogator as
often as the man did, we may say this
computer has passed the intelligent
behavior test.
Turing Remarks
• By maintaining communication between the human and the machine via terminals, the test gives us an objective standard view on intelligence.
• A program thought intelligent in some narrow area of expertise is evaluated by comparing its performance with the performance of a human expert.
• To build an intelligent computer system, we have to capture, organize and use human expert knowledge in some narrow area of expertise.
AI Examples
– http://www.generation5.org/jdk/demos.asp
– http://www.aridolan.com/ofiles/eFloys.html
– http://www.aridolan.com/ofiles/iFloys.html
– http://www.arch.usyd.edu.au/~rob/#applets
– http://www.softrise.co.uk/srl/old/caworld.html
– http://people.clarkson.edu/~esazonov/neural_fuzzy/loadsway/LoadSway.htm
– http://www.iit.nrc.ca/IR_public/fuzzy/FuzzyTruck.html
– http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aa1
Expert Systems
• Expert Systems (ES) are computer programs that try to replicate the knowledge and skills of human experts in some area, and then solve problems in this area (the way human experts would).
• ES have their roots in Cognitive Science, the study of the human mind using a combination of AI and psychology.
• Expert systems embody non-algorithmic expertise for solving certain types of problems.
• ES were the first successful applications of AI to real-world problems, solving problems in medicine, chemistry, finance and even in space (Space Shuttle, robots on other planets).
Expert System’s Background
1943: Post, E. L. proved that any computable problem can be solved using a set of IF–THEN
rules (Production Systems)
1961: GENERAL PROBLEM SOLVER (GPS) by A. Newell and H. Simon.
1969: DENDRAL (Feigenbaum, Buchanan, Lederberg) was the first system that showed the
importance of domain–specific knowledge (expertise) - (Knowledge-Based Systems).
1972 to 1980: MYCIN: Separation of the reasoning method from the knowledge (expert system shell)
Production Systems
Production systems (or rule–based systems) are programs that
instead of conventional algorithms use sets of IF–THEN rules
(production rules). Unlike in algorithms, the order in which these rules
should be used is not specified. It is decided by the program itself with
respect to a problem state.
• In 1943, Post proved that any computable problem can be
implemented in a production system.
• Cognitive scientists became interested in production systems because
they seemed to represent better the way humans think and solve
problems.
Introduction to Post and Markov, also in production systems.
More about conflict resolution: cycles
Early Expert Systems:
General Problem Solver
• In 1961, A. Newell and H. Simon wrote a program called General Problem Solver (GPS) that could solve many different problems using only a small set of rules
• GPS used a strategy known as means-ends analysis
• GPS produced solutions very similar to those people came up with
• Methods that can be applied to a broad range of problems are called weak methods (because they use weak information about the problem domain). Their performance, however, is also usually weak
Knowledge-Based Systems
• DENDRAL (Feigenbaum et al., 1969) was a program that used rules to infer molecular structure from spectral information. The challenge was that the number of possible molecules was so large that it was impossible to check all of them using simple rules (weak method).
• The researchers consulted experts in chemistry and added several more specific rules to their program. The number of combinations the program had to test was reduced dramatically by the knowledge added to the system.
• DENDRAL demonstrated the importance of domain-specific knowledge.
• Today, expert systems are knowledge-based systems
Basic Concepts of ES
• How to determine who experts are.
• How expertise can be transferred from a person to a computer.
• How the system works.
Basic Concepts of ES
• Expert: A human being who has developed a high level of proficiency in making judgments in a specific, usually narrow, domain.
Basic Concepts of ES
• Expertise:
– A specialized type of knowledge and skill that experts have.
– The implicit knowledge and skills of the expert that must be extracted and made explicit so that they can be encoded in an expert system
Features of Expert Systems
• Expertise: possesses expertise for expert-level decisions
• Symbolic reasoning: knowledge is represented symbolically
• Deep knowledge: complex knowledge not easily found among non-experts
• Self-knowledge: can examine its own reasoning and provide explanations
Application of ES
Instances of ES
ES Components
Major components
• Knowledge base
• Inference engine
• User interface
• Blackboard (Working memory)
• Explanation subsystem (justifier)
ES may also contain:
• Knowledge acquisition subsystem
• Knowledge refining system
Major Components of ES
Knowledge Base:
A collection of facts, rules, and procedures organized into schemas.
The assembly of all the information and knowledge about a specific
field of interest
Inference engine:
The part of an expert system that actually performs the reasoning
function.
User interfaces:
The parts of computer systems that interact with users, accepting
commands from the computer keyboard and displaying the results
generated by other parts of the systems.
Major Components of ES
• Blackboard (Working memory): An area of working memory set aside for the description of a current problem (facts) and for recording intermediate results in an expert system.
• Explanation subsystem (justifier): The component of an expert system that can explain the system's reasoning and justify its conclusions.
Architecture of an Expert System
Explain in more detail how an expert system works: the role of working memory (WM), reasoning, etc.
Knowledge Representation
• A representation is a set of conventions about how to describe a class of things. A description makes use of the conventions of a representation to describe some particular thing [Winston 1992].
Knowledge
Two special types of knowledge:
• a priori
• a posteriori
A priori knowledge:
• comes before and is independent of knowledge from the senses
• is considered to be universally true and cannot be denied without contradiction
• examples of a priori knowledge: logic statements, mathematical laws, and the knowledge possessed by teenagers
A posteriori knowledge:
• is knowledge derived from the senses
• since sensory experience may not always be reliable, a posteriori knowledge can be denied on the basis of new knowledge without the necessity of contradictions
Knowledge Representation
Knowledge engineer
• An AI specialist responsible for the technical side of developing an expert system. The knowledge engineer works closely with the domain expert to capture the expert's knowledge in a knowledge base
Knowledge engineering (KE)
• The engineering discipline in which knowledge is integrated into computer systems to solve complex problems normally requiring a high level of human expertise
Knowledge Representation Schemes
• Representing the knowledge of humans in a systematic manner
• This knowledge is represented in a knowledge base such that it can be retrieved for solving problems.
• Some knowledge representation schemes:
– Production Rules
– Semantic Networks
– Frames
– Logic:
  – Propositional Logic
  – First-order logic
– XML / RDF
– …
Semantic Networks
• Concepts as hierarchical networks [R. Quillian (1966, 1968)]
• Amended with some additional psychological assumptions to characterize the structure of human semantic memory.
• A semantic network is a structure for representing knowledge as a pattern of interconnected nodes and arcs:
– Nodes: concepts of entities, attributes, events, values.
– Arcs: relationships that hold between the concepts.
• Used for propositional information
• A proposition: a statement that is either true or false
• A labeled, directed graph
Semantic Networks
• Semantic Networks [Collins and Quillian 1969]:
– Concepts can be represented as hierarchies of inter-connected concept nodes (e.g. animal, bird, canary)
– Any concept has a number of associated attributes at a given level (e.g. animal --> has skin; eats etc.)
– Some concept nodes are superordinates of other nodes (e.g. animal > bird) and some are subordinates (canary < bird)
– Subordinates inherit all the attributes of their superordinate concepts (we will talk about penguins and ostriches!)
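Attribute inheritance along the hierarchy can be sketched in a few lines of Python. This is only an illustrative toy, not Collins and Quillian's model: the `NETWORK` dictionary, the `lookup` helper, and every attribute other than "has skin" and "eats" are assumptions made for the example. The penguin entry hints at why exceptions matter: a value stored on a subordinate node overrides the inherited one.

```python
# Hypothetical toy network: each node stores its superordinate and its own attributes.
NETWORK = {
    "animal": {"parent": None, "attributes": {"has_skin": True, "eats": True}},
    "bird": {"parent": "animal", "attributes": {"has_wings": True, "can_fly": True}},
    "canary": {"parent": "bird", "attributes": {"can_sing": True}},
    "penguin": {"parent": "bird", "attributes": {"can_fly": False}},  # local exception
}


def lookup(node, attribute):
    """Return the attribute value, searching the node first and then its superordinates."""
    current = node
    while current is not None:
        entry = NETWORK[current]
        if attribute in entry["attributes"]:
            return entry["attributes"][attribute]
        current = entry["parent"]  # climb one level in the hierarchy
    return None  # attribute is not known anywhere on the chain


print(lookup("canary", "has_skin"))   # inherited from animal
print(lookup("penguin", "can_fly"))   # overridden at the penguin node
```

Storing each attribute once, at the highest node where it applies, is what makes the representation economical in space.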
General Net. vs Semantic Net
Network Relationships
Semantic Network representation
of properties of snow and ice
Semantic Networks
Two types of commonly used links:
• IS-A means 'is an instance of' and refers to a specific member of a class (group of objects)
• A-KIND-OF (AKO) relates one class to another
• AKO relates generic nodes to generic nodes, while IS-A relates an instance or individual to a generic class
• The objects in a class have one or more attributes in common
• Each attribute has a value
• The combination of attribute and value is a property
Semantic Networks
Exercises
• Represent the following statements in an appropriate semantic network, all in one graph:
– is_a(person, mammal)
– instance_of(N. Hejazi, person)
– team(N. Hejazi, Esteghlal)
– score(Tractor, Piroozi, 3-1)
– Ali gave Reza the book
Solution 1
• is_a(person, mammal)
• instance_of(N. Hejazi, person)
• team(N. Hejazi, Esteghlal)
[Figure: semantic network with the nodes mammal, person, head, N. Hejazi and Esteghlal, connected by the links is_a, has_part, instance_of and team]
Solution 2
• score(Tractor, Piroozi, 3-1)
[Figure: semantic network with the nodes Game, Fixture 5, Tractor, Piroozi and 3-1, connected by the links is_a, away_team, home_team and score]
Solution 3
• Ali gave Reza the ES book
[Figure: semantic network with the nodes Gave, Book, Ali, Event 1, Reza and ES Book, connected by the links action, agent, patient, instance and object]
Advantages of Semantic Networks
• Easy to visualize and understand.
• The knowledge engineer can arbitrarily define the relationships.
• Related knowledge is easily categorized.
• Efficient in space requirements.
• Node objects represented only once.
• Standard definitions of semantic networks have been developed.
Limitations of Semantic Networks
• The limitations of conventional semantic networks were studied extensively by a number of workers in AI.
• Many believe that the basic notion is a powerful one and has to be complemented by, for example, logic to improve the notion's expressive power and robustness.
• Others believe that the notion of semantic networks can be improved by incorporating reasoning used to describe events.
Limitations of Semantic Networks
• Binary relations are usually easy to represent, but sometimes this is difficult:
- John caused a problem for the party when he left.
• Other problematic statements:
- negation: John does not go fishing
- disjunction: John eats pizza or fish and chips
- …
• Quantified statements are very hard for semantic nets:
- Every dog has bitten a postman
- Every dog has bitten every postman
- Solution: Partitioned semantic networks
Partitioned Semantic Networks
• To represent the difference between the description of an individual object or process and the description of a set of objects. The set description involves quantification [Hendrix (1976, 1979)]
• Hendrix partitioned a semantic network whereby a semantic network, loosely speaking, can be divided into one or more networks for the description of an individual.
Partitioned Semantic Networks
• The central idea of partitioning is to allow groups of nodes and arcs to be bundled together into units called spaces: fundamental entities in partitioned networks, on the same level as nodes and arcs (Hendrix 1979:59).
• Every node and every arc of a network belongs to one or more spaces.
• Some spaces are used to encode 'background information' or generic relations; others, called 'scratch' spaces, are used to deal with specifics.
Partitioned Semantic Networks
• Suppose that we wish to make a specific statement about a dog, Danny,
who has bitten a postman, Peter:
– " Danny the dog bit Peter the postman"
• Hendrix's partitioned network would express this statement as an ordinary semantic network:
[Figure: space S1 with the nodes dog, bite, postman, Danny, B and Peter, connected by the links is_a, agent and patient]
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
– "Every dog has bitten a postman"
• Hendrix's partitioned semantic network now comprises two partitions, SA and S1. Node G is an instance of the special class of general statements about the world comprising link statement, form, and one universal quantifier (∀)
[Figure: partitioned network with the spaces SA and S1; nodes General Statement, dog, bite, postman, G, D, B and P; links is_a, form, ∀, agent and patient]
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
– "Every dog has bitten every postman"
[Figure: partitioned network with the spaces SA and S1; nodes General Statement, dog, bite, postman, G, D, B and P; links is_a, form, agent and patient, and two ∀ links]
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
– "Every dog in town has bitten the postman"
[Figure: partitioned network with the spaces SA and S1; nodes General Statement, dog, town dog, bite, postman, G, D, B and P; links is_a, ako, form, ∀, agent and patient, where 'ako' = 'A Kind Of']
Exercises
• Try to represent each of the following two sentences in an appropriate semantic network:
– "Ali believes that pizza is tasty"
– "Every student loves to have an exam" !!! ;-)
Solution 1: "Ali believes that pizza is tasty"
[Figure: partitioned semantic network with the labels believes, is_a, Ali, agent, event, object, space, tasty, pizza, has and property]
Frames
• Represents related knowledge about a narrow subject that has much default knowledge
• A frame system would be a good choice for describing a mechanical device, for example a car
• The frame contrasts with the semantic net, which is generally used for broad knowledge representation
• Just as with semantic nets, there are no standards for defining frame-based systems
• A frame is analogous to a record structure: the slots and slot fillers of a frame correspond to the fields and values of a record
• A frame is basically a group of slots and fillers that defines a stereotypical object
• The car is the object, the slot name is the attribute, and the filler is the value
Frames
• Frame-based expert systems are very useful for representing causal knowledge because their information is organized by cause and effect
• The slots may also contain procedures attached to the slots, called procedural attachments
• The if-needed type is executed when a filler value is needed but none is initially present or the default value is not suitable
• Defaults are often used to represent commonsense knowledge
• The if-added type is run when a value is to be added to a slot
• The if-removal type is run whenever a value is to be removed from a slot
• Slot fillers may also contain relations, e.g. a-kind-of and is-a relations
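The slot, default and if-needed ideas can be illustrated with a small Python sketch. It is a minimal toy under stated assumptions, not a standard frame language: the `Frame` class, the slot names (`wheels`, `tank_litres`, `km_per_litre`, `range_km`) and all values are hypothetical. A slot is looked up on the frame itself, then on its a-kind-of parent; if no filler exists, an if-needed procedure computes one on demand.

```python
class Frame:
    """A toy frame: named slots, an optional parent frame, and if-needed procedures."""

    def __init__(self, name, parent=None, slots=None, if_needed=None):
        self.name = name
        self.parent = parent                    # a-kind-of / is-a link to a more general frame
        self.slots = dict(slots or {})          # slot name -> filler (may act as a default)
        self.if_needed = dict(if_needed or {})  # slot name -> procedure computing a filler

    def get(self, slot):
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            if slot in frame.if_needed:
                # Run the attached procedure against the frame that was queried,
                # so that it sees that frame's own fillers.
                return frame.if_needed[slot](self)
            frame = frame.parent
        raise KeyError(f"no filler for slot {slot!r} in frame {self.name!r}")


# Generic frame with a default filler and an if-needed procedure (all values hypothetical).
car = Frame(
    "car",
    slots={"wheels": 4},
    if_needed={"range_km": lambda f: f.get("tank_litres") * f.get("km_per_litre")},
)

# A specific car inherits the default and supplies its own fillers.
my_car = Frame("my_car", parent=car, slots={"tank_litres": 50, "km_per_litre": 12})

print(my_car.get("wheels"))    # default inherited from the generic frame
print(my_car.get("range_km"))  # computed on demand by the if-needed procedure
```

If-added and if-removal attachments could be handled the same way, by calling a stored procedure whenever a filler is written or deleted.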
Logic
• Knowledge can also be represented by the symbols of logic, which is the study of the rules of exact reasoning.
• Logic is also of primary importance in expert systems in which the inference engine reasons from facts to conclusions.
• A descriptive term for logic programming and expert systems is automated reasoning systems.
Formal logic
Formal logic is concerned with the syntax of statements, not their semantics
• As an example of formal logic, consider the following clauses with the nonsense words squeeg and moof:
Premise: All squeegs are moofs
Premise: John is a squeeg
Conclusion: John is a moof
• The argument is valid no matter what words are used:
Premise: All X are Y
Premise: Z is an X
Conclusion: Z is a Y
(valid no matter what is substituted for X, Y, and Z)
• By separating the form from the semantics, the validity of an argument can be considered objectively, without prejudice caused by the semantics
Propositional logic
• Propositional logic is used to assert propositions, which are statements that are either true or false. It deals only with the truth value of complete statements and does not consider relationships or dependencies between objects.
• Propositional logic is concerned with the subset of declarative sentences that can be classified as either true or false
Propositional logic
• A sentence whose truth value can be determined is called a statement or proposition
• A statement is also called a closed sentence because its truth value is not open to question
• Statements that cannot be answered absolutely are called open sentences
• A compound statement is formed by using logical connectives on individual statements
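How connectives determine the truth value of a compound statement can be shown by enumerating a truth table. The sketch below is an illustration only; the `implies` helper and the choice of the connectives ∧, ∨ and → for two propositions p and q are assumptions made for the example.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# One row per assignment of truth values to the propositions p and q.
rows = []
for p, q in product([True, False], repeat=2):
    rows.append((p, q, p and q, p or q, implies(p, q)))

print("p      q      p∧q    p∨q    p→q")
for row in rows:
    print("  ".join(f"{str(value):5}" for value in row))
```

With n propositions the table has 2^n rows, which is why exhaustive enumeration only suits small compound statements.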
First-Order Logic
• First-order logic (FOL) is an extension and generalization of propositional logic.
• Its formulas contain variables which can be quantified. Two common quantifiers are the existential ∃ and universal ∀ quantifiers.
• The variables could be elements in the universe, or perhaps relations or functions over the universe.
First-order Logic
Variable symbols: x, y, z, ...
Function symbols: f, g, h, f(x), g(x,y), ...
Predicate symbols: P, Q, R, P(x), Q(x,y), ...
Logic symbols: “¬”, “∧”, “∨”, “∃”, “∀”, “=”, “→”
Punctuation symbols: “(”, “)”, and “.”
First-order Logic
• ∀x∀y is the same as ∀y∀x
• ∃x∃y is the same as ∃y∃x
• ∃x∀y is not the same as ∀y∃x
• ∃x ∀y Loves(x,y)
“There is a person who loves everyone in the world”
• ∀y ∃x Loves(x,y)
“Everyone in the world is loved by at least one person”
• Quantifier duality: each can be expressed using the other
∀x Likes(x, iceCream) is equivalent to ¬∃x ¬Likes(x, iceCream)
∃x Likes(x, broccoli) is equivalent to ¬∀x ¬Likes(x, broccoli)
Rules
• A Production Rule System emulates human reasoning using a set of
‘productions’
• Productions have two parts
– Sensory precondition (“IF” part)
– Action (“THEN” part)
• When the state of the ‘world’ matches the IF part, the production is fired,
meaning the action is executed
– The ‘world’ is the set of data values in the system’s working memory
– For a clinical expert system, this is usually data about a patient, which, ideally, has come from (and may go back to) an electronic medical record, or it may be entered interactively (or usually a little of each)
• So production rules link facts (“IF” parts, also called antecedents) to
conclusions (“THEN” parts, also called consequents)
Rules
MYCIN Example
• MYCIN
– Developed at Stanford from 1972 to 1980
– Helped physicians diagnose a range of infectious blood diseases
• Separated the methods of reasoning on productions (the ‘inference engine’) from the rules themselves (the ‘knowledge base’)
– Became the first expert system shell when distributed as ‘empty MYCIN’ (EMYCIN)
Rules
MYCIN Example
Example MYCIN rule:
IF the stain of the organism is gram negative
AND the morphology of the organism is rod
AND the aerobicity of the organism is anaerobic
THEN there is strongly suggestive evidence (0.8) that the class of the organism is Enterobacteriaceae.
• This rule has three predicates (yes/no, or Boolean, values that determine if it should fire)
• In this case each predicate involves the equality of a data field about a patient to a specific qualitative value (e.g., [stain of the organism] = ‘gram negative’)
• Note that human expertise is still needed, e.g., to decide that the morphology of the organism is ‘rod’ (not to mention to understand its vocabulary!)
• Notice it produces a new fact (regarding [class of the organism])
• Note this is ‘symbolic reasoning’: working with concepts as compared to numbers (it's not like y = x1 + 4.6 x2)
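To make the structure of such a rule concrete, here is a minimal Python sketch that stores the rule above as data and tests it against a set of patient facts. It is not how MYCIN itself was implemented; the dictionary layout, the field names and the `fire` helper are assumptions made for illustration.

```python
# The example rule as data: three equality predicates and one conclusion
# carrying the certainty value 0.8 quoted in the rule text.
RULE = {
    "if": {
        "stain": "gram negative",
        "morphology": "rod",
        "aerobicity": "anaerobic",
    },
    "then": ("class", "Enterobacteriaceae", 0.8),
}


def fire(rule, facts):
    """Return the rule's conclusion if every predicate matches the facts, else None."""
    if all(facts.get(field) == value for field, value in rule["if"].items()):
        return rule["then"]
    return None


# Hypothetical working memory for one organism.
facts = {"stain": "gram negative", "morphology": "rod", "aerobicity": "anaerobic"}
print(fire(RULE, facts))
```

Keeping the rule as plain data, separate from the `fire` procedure, mirrors the separation of knowledge base and inference engine described for MYCIN.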
Different Types of Rules
Relationship Rules:
IF the battery is dead
THEN the car will not start
Recommendation Rules:
IF the car will not start
THEN take a cab
Different Types of Rules
Directive Rules:
IF the car will not start
AND the fuel system is ok
THEN check out the electrical system
Strategy Rules:
IF the car will not start
THEN first check out the fuel system
THEN check out the electrical system
Different Types of Rules
Heuristic Rules:
IF the car will not start
AND the car is a 1957 Ford
THEN check the float
Meta Rules:
IF the car will not start
AND the electrical system is operating normally
THEN use rules concerning the fuel system
Reasoning
Deduction
Deductive reasoning, also deductive logic or logical deduction or, informally, "top-down" logic is the process of
reasoning from one or more statements (premises) to reach a logically certain conclusion. (reasoning from
the general to the specific)
- All men are mortal.
- Socrates is a man.
- Therefore, Socrates is mortal.
Induction
Inductive reasoning is reasoning in which the premises seek to supply strong evidence for (not absolute proof of) the truth of the conclusion. In other words, it is the process of reasoning in which a conclusion about all members of a class is drawn from examination of only a few members of the class; reasoning from the particular to the general.
- 100% of biological life forms that we know of depend on liquid water to exist.
- Therefore, if we discover a new biological life form it will probably depend on liquid water to exist.
Abduction
Abductive reasoning is a form of logical inference that goes from an observation to a hypothesis that
accounts for the observation, ideally seeking to find the simplest and most likely explanation. In abductive
reasoning, unlike in deductive reasoning, the premises do not guarantee the conclusion.
- The lawn is wet.
- If it rained last night, then it would be unsurprising that the lawn is wet.
- Therefore, by abductive reasoning, the possibility that it rained last night is reasonable
Forward-chaining Inference
Starts with some facts and applies rules to find all possible conclusions
Steps:
1. Consider the initial facts and store them in working memory
2. Check the antecedent part of the rules.
3. If all the conditions are matched, fire the rule.
4. If there is only one rule, do the following:
A. Perform necessary actions
B. Modify working memory and update facts.
C. Check for new conditions
5. If more than one rule is selected, use the conflict resolution strategy to
select the most appropriate rule and go to Step 4
6. Continue until an appropriate rule is found and executed.
Forward-chaining Example
Knowledge Base Rules:
Rule 1:
IF the patient has a sore throat AND we suspect a bacterial infection THEN we believe the patient has strep throat
Rule 2:
IF the patient’s temperature is > 37 THEN the patient has a fever
Rule 3:
IF the patient has been sick over a month AND the patient has a fever THEN we suspect a
bacterial infection
Rule 4:
IF the patient has a fever THEN the patient can’t go out on a date
Rule 5:
IF the patient can’t go out on a date THEN the patient should stay home and read a book
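The five rules above can be run through a small forward-chaining loop. The sketch below is one possible reading, with several assumptions: the initial facts are invented for the example, each condition is treated as an opaque string, and conflict resolution is simply "first applicable rule in listing order".

```python
# Each rule: (name, set of antecedent facts, consequent fact).
RULES = [
    ("R1", {"sore throat", "suspect bacterial infection"}, "strep throat"),
    ("R2", {"temperature > 37"}, "fever"),
    ("R3", {"sick over a month", "fever"}, "suspect bacterial infection"),
    ("R4", {"fever"}, "cannot go out on a date"),
    ("R5", {"cannot go out on a date"}, "stay home and read a book"),
]


def forward_chain(initial_facts, rules):
    """Fire rules until no rule can add a new fact; return the facts and the firing order."""
    facts = set(initial_facts)  # working memory
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule is applicable when all antecedents are in working memory
            # and firing it would add something new.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                fired.append(name)
                changed = True
                break  # restart the cycle so rule order acts as conflict resolution
    return facts, fired


# Hypothetical initial facts about a patient.
facts, fired = forward_chain(
    {"sore throat", "temperature > 37", "sick over a month"}, RULES
)
print(fired)
print(sorted(facts))
```

Starting from these facts the rules fire in the order R2, R3, R1, R4, R5: the fever must be derived before the bacterial infection can be suspected, and only then can strep throat be concluded.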
Backward-chaining Inference
Backward-chaining Reasoning
Starts with the desired conclusion(s) and works backward to find supporting facts
Steps:
1. Start with a possible hypothesis, H.
2. Store the hypothesis H in working memory, along with the available facts.
3. If H is in the initial facts, the hypothesis is proven. Go to Step 7.
4. If H is not in the initial facts, find a rule R that has a consequent (action) part mentioning the hypothesis.
5. Store R in the working memory.
6. Check conditions of R and match with the existing facts.
7. If matched, then fire the rule R and stop. Otherwise, continue to Step 4.
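A recursive version of this procedure can be sketched as follows. It is a simplified illustration rather than a literal transcription of the numbered steps: a goal is proven if it is a known fact, or if some rule concludes it and all of that rule's conditions can themselves be proven. The rule set (borrowed from the strep-throat example), the fact strings and the `backward_chain` function are assumptions made for the example.

```python
# Each rule: (tuple of antecedent facts, consequent fact).
RULES = [
    (("sore throat", "suspect bacterial infection"), "strep throat"),
    (("temperature > 37",), "fever"),
    (("sick over a month", "fever"), "suspect bacterial infection"),
]


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Return True if the goal can be established from the facts using the rules."""
    if goal in facts:
        return True            # the hypothesis is already a known fact
    if goal in seen:
        return False           # avoid looping on circular rule chains
    seen = seen | {goal}
    for conditions, conclusion in rules:
        if conclusion != goal:
            continue           # only rules whose consequent mentions the goal are relevant
        # Each condition becomes a sub-goal to prove in turn.
        if all(backward_chain(c, facts, rules, seen) for c in conditions):
            return True
    return False


# Hypothetical initial facts about a patient.
known = {"sore throat", "temperature > 37", "sick over a month"}
print(backward_chain("strep throat", known, RULES))
print(backward_chain("strep throat", {"sore throat"}, RULES))
```

Unlike forward chaining, only rules relevant to the hypothesis are examined, which is why backward chaining suits diagnostic questions of the form "does the patient have X?".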
Backward-chaining Example
Backward-chaining Example 2
Forward-Chaining vs. Backward-Chaining
• Forward chaining is reasoning from facts to the conclusions resulting from those facts
– E.g., if you see that it is raining before leaving home (the fact), then you should decide to take an umbrella (the conclusion)
• Backward chaining involves reasoning in reverse from a hypothesis
• From a potential conclusion to be proved, to the facts that support the hypothesis
– E.g., if you have not looked outside and someone enters with wet shoes and an umbrella, your hypothesis is that it is raining
– In order to support this hypothesis you could ask the person if it was, in fact, raining
– If the response is yes, then the hypothesis is proved true and becomes a fact
Introduction to Uncertainty
Defining Uncertainty
• Uncertainty is defined as the lack of the exact knowledge that would enable us to reach a perfectly reliable conclusion.
• Information can be incomplete, inconsistent, uncertain, or all three. In other words, information is often unsuitable for solving a problem.
• Classical logic permits only exact reasoning. It assumes that perfect knowledge always exists and we deal with the exact facts.
Sources of Uncertain Knowledge
• Ambiguity
• Incompleteness
• Incorrectness
• False positive (Type 1 error)
• False negative (Type 2 error)
• Human errors
• Machine errors
• Measurement errors
• Precision
• Accuracy
• Etc.
Dealing with Uncertainty
• Classic Probability (Fermat & Pascal 1654)
• Bayesian Probability
• Hartley Theory (Hartley 1928)
• Shannon Theory (Shannon 1948)
• Dempster-Shafer Theory (Shafer 1976)
• Fuzzy Theory (Zadeh 1965)
Probability Theory
• The concept of probability has a long history that goes back thousands of
years when words like “probably”, “likely”, “maybe”, “perhaps” and
“possibly” were introduced into spoken languages. However, the
mathematical theory of probability was formulated only in the 17th
century.
• The probability of an event is the proportion of cases in which the event occurs.
• Probability can also be defined as a scientific measure of chance.
Probability Theory
• Probability can be expressed mathematically as a numerical index with a
range between zero (an absolute impossibility) to unity (an absolute
certainty).
• Most events have a probability index strictly between 0 and 1, which
means that each event has at least two possible outcomes:
- favorable outcome or success
- unfavorable outcome or failure
$$P(\text{success}) = p = \frac{s}{s + f}$$
$$P(\text{failure}) = q = \frac{f}{s + f}$$
Conditional Probability
• Let A be an event in the world and B be another event. Suppose that events A and B are not mutually exclusive, but occur conditionally on the occurrence of the other.
• The probability that event A will occur if event B occurs is called the conditional probability.
• Conditional probability is denoted mathematically as p(A|B)
– “Conditional probability of event A occurring given that event B has occurred”.
$$p(A \mid B) = \frac{\text{the number of times } A \text{ and } B \text{ can occur}}{\text{the number of times } B \text{ can occur}}$$
Conditional Probability
• Joint probability: the probability that both A and B will occur is called the joint probability of A and B, written p(A ∩ B). In terms of the joint probability,
$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$
• Similarly, the conditional probability of event B occurring given that event A has occurred equals
$$p(B \mid A) = \frac{p(B \cap A)}{p(A)}$$
Conditional Probability
Hence
$$p(B \cap A) = p(B \mid A) \times p(A)$$
and
$$p(A \cap B) = p(B \mid A) \times p(A)$$
Substituting the last equation into the equation
$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$
yields the Bayesian rule.
Bayesian Rule
$$p(A \mid B) = \frac{p(B \mid A) \times p(A)}{p(B)}$$
where:
p(A|B) is the conditional probability that event A occurs given that event B
has occurred;
p(B|A) is the conditional probability of event B occurring given that event A
has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.
The Joint Probability
$$\sum_{i=1}^{n} p(A \cap B_i) = \sum_{i=1}^{n} p(A \mid B_i) \times p(B_i)$$
[Figure: event A intersecting the events B1, B2, B3 and B4]
The Joint Probability
• If the occurrence of event A depends on only two mutually exclusive events, B and NOT B, we obtain:
p(A) = p(A|B) × p(B) + p(A|¬B) × p(¬B)
where ¬ is the logical function NOT.
• Similarly,
p(B) = p(B|A) × p(A) + p(B|¬A) × p(¬A)
• Substituting this equation into the Bayesian rule yields:
$$p(A \mid B) = \frac{p(B \mid A) \times p(A)}{p(B \mid A) \times p(A) + p(B \mid \neg A) \times p(\neg A)}$$
Bayesian Reasoning
• Suppose all rules in the knowledge base are represented in the following form:
IF E is true
THEN H is true {with probability p}
• This rule implies that if event E occurs, then the probability that event H will occur is p.
• In expert systems, H usually represents a hypothesis and E denotes evidence to support this hypothesis.
Bayesian Reasoning
The Bayesian rule expressed in terms of hypotheses and evidence looks like
this:
$$p(H \mid E) = \frac{p(E \mid H) \times p(H)}{p(E \mid H) \times p(H) + p(E \mid \neg H) \times p(\neg H)}$$
where:
p(H) is the prior probability of hypothesis H being true;
p(E|H) is the probability that hypothesis H being true will result in evidence
E;
p(¬H) is the prior probability of hypothesis H being false;
p(E|¬H) is the probability of finding evidence E even when hypothesis H is
false.
Bayesian Reasoning
• In expert systems, the probabilities required to solve a problem are provided by experts.
• An expert determines the prior probabilities for possible hypotheses p(H) and p(¬H), and also the conditional probabilities for observing evidence E if hypothesis H is true, p(E|H), and if hypothesis H is false, p(E|¬H).
• Users provide information about the evidence observed and the expert system computes p(H|E) for hypothesis H in light of the user-supplied evidence E.
• Probability p(H|E) is called the posterior probability of hypothesis H upon observing evidence E.
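The computation of the posterior from expert-supplied values can be sketched in Python. The function name `posterior` and the numbers in the example call are hypothetical; the formula is the two-hypothesis form of the Bayesian rule, with p(¬H) taken as 1 − p(H).

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Posterior p(H|E) from the prior p(H) and the likelihoods p(E|H) and p(E|¬H)."""
    p_not_h = 1.0 - p_h
    # Total probability of the evidence over the two cases H and ¬H.
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    if p_e == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return p_e_given_h * p_h / p_e


# Hypothetical expert estimates: prior 0.3, p(E|H) = 0.8, p(E|¬H) = 0.1.
print(round(posterior(0.3, 0.8, 0.1), 3))
```

Evidence that is much more likely under H than under ¬H raises the belief well above the prior; evidence that is equally likely under both leaves the prior unchanged.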
Bayesian Reasoning
• We can take into account both multiple hypotheses H1, H2,..., Hm and multiple evidences E1, E2,..., En. (The hypotheses as well as the evidences must be mutually exclusive and exhaustive)
• Single evidence E and multiple hypotheses follow:
$$p(H_i \mid E) = \frac{p(E \mid H_i) \times p(H_i)}{\sum_{k=1}^{m} p(E \mid H_k) \times p(H_k)}$$
• Multiple evidences and multiple hypotheses follow:
$$p(H_i \mid E_1 E_2 \ldots E_n) = \frac{p(E_1 E_2 \ldots E_n \mid H_i) \times p(H_i)}{\sum_{k=1}^{m} p(E_1 E_2 \ldots E_n \mid H_k) \times p(H_k)}$$
Bayesian Reasoning
• This requires obtaining the conditional probabilities of all possible
combinations of evidences for all hypotheses, and thus places an
enormous burden on the expert.
• Therefore, in expert systems, conditional independence among different
evidences is assumed. Thus, instead of the unworkable equation, we obtain:
$$p(H_i \mid E_1 E_2 \ldots E_n) = \frac{p(E_1 \mid H_i) \times p(E_2 \mid H_i) \times \ldots \times p(E_n \mid H_i) \times p(H_i)}{\sum_{k=1}^{m} p(E_1 \mid H_k) \times p(E_2 \mid H_k) \times \ldots \times p(E_n \mid H_k) \times p(H_k)}$$
Ranking Potentially True Hypotheses
• Let us consider a simple example:
– Suppose an expert, given three conditionally independent evidences
E1, E2 and E3, creates three mutually exclusive and exhaustive
hypotheses H1, H2 and H3, and provides prior probabilities for these
hypotheses – p(H1), p(H2) and p(H3), respectively. The expert also
determines the conditional probabilities of observing each evidence
for all possible hypotheses.
The Prior and Conditional Probabilities
Probability     Hypothesis
                i = 1     i = 2     i = 3
p(Hi)           0.40      0.35      0.25
p(E1|Hi)        0.3       0.8       0.5
p(E2|Hi)        0.9       0.0       0.7
p(E3|Hi)        0.6       0.7       0.9
Assume that we first observe evidence E3. The expert system computes the
posterior probabilities for all hypotheses as:
The Prior and Conditional Probabilities
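$$p(H_i \mid E_3) = \frac{p(E_3 \mid H_i) \times p(H_i)}{\sum_{k=1}^{3} p(E_3 \mid H_k) \times p(H_k)}, \quad i = 1, 2, 3$$
thus
$$p(H_1 \mid E_3) = \frac{0.6 \cdot 0.40}{0.6 \cdot 0.40 + 0.7 \cdot 0.35 + 0.9 \cdot 0.25} = 0.34$$
$$p(H_2 \mid E_3) = \frac{0.7 \cdot 0.35}{0.6 \cdot 0.40 + 0.7 \cdot 0.35 + 0.9 \cdot 0.25} = 0.34$$
$$p(H_3 \mid E_3) = \frac{0.9 \cdot 0.25}{0.6 \cdot 0.40 + 0.7 \cdot 0.35 + 0.9 \cdot 0.25} = 0.32$$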
After evidence E3 is observed, belief in hypothesis H1 decreases and
becomes equal to belief in hypothesis H2. Belief in hypothesis H3
increases and even nearly reaches beliefs in hypotheses H1 and H2.
The Prior and Conditional Probabilities
Suppose now that we observe evidence E1. The posterior probabilities are
calculated as
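$$p(H_i \mid E_1 E_3) = \frac{p(E_1 \mid H_i) \times p(E_3 \mid H_i) \times p(H_i)}{\sum_{k=1}^{3} p(E_1 \mid H_k) \times p(E_3 \mid H_k) \times p(H_k)}, \quad i = 1, 2, 3$$
hence
$$p(H_1 \mid E_1 E_3) = \frac{0.3 \cdot 0.6 \cdot 0.40}{0.3 \cdot 0.6 \cdot 0.40 + 0.8 \cdot 0.7 \cdot 0.35 + 0.5 \cdot 0.9 \cdot 0.25} = 0.19$$
$$p(H_2 \mid E_1 E_3) = \frac{0.8 \cdot 0.7 \cdot 0.35}{0.3 \cdot 0.6 \cdot 0.40 + 0.8 \cdot 0.7 \cdot 0.35 + 0.5 \cdot 0.9 \cdot 0.25} = 0.52$$
$$p(H_3 \mid E_1 E_3) = \frac{0.5 \cdot 0.9 \cdot 0.25}{0.3 \cdot 0.6 \cdot 0.40 + 0.8 \cdot 0.7 \cdot 0.35 + 0.5 \cdot 0.9 \cdot 0.25} = 0.29$$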
Hypothesis H2 has now become the most likely one.
The Prior and Conditional Probabilities
After observing evidence E2, the final posterior probabilities for all hypotheses are
calculated:
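$$p(H_i \mid E_1 E_2 E_3) = \frac{p(E_1 \mid H_i) \times p(E_2 \mid H_i) \times p(E_3 \mid H_i) \times p(H_i)}{\sum_{k=1}^{3} p(E_1 \mid H_k) \times p(E_2 \mid H_k) \times p(E_3 \mid H_k) \times p(H_k)}, \quad i = 1, 2, 3$$
hence
$$p(H_1 \mid E_1 E_2 E_3) = \frac{0.3 \cdot 0.9 \cdot 0.6 \cdot 0.40}{0.3 \cdot 0.9 \cdot 0.6 \cdot 0.40 + 0.8 \cdot 0.0 \cdot 0.7 \cdot 0.35 + 0.5 \cdot 0.7 \cdot 0.9 \cdot 0.25} = 0.45$$
$$p(H_2 \mid E_1 E_2 E_3) = \frac{0.8 \cdot 0.0 \cdot 0.7 \cdot 0.35}{0.3 \cdot 0.9 \cdot 0.6 \cdot 0.40 + 0.8 \cdot 0.0 \cdot 0.7 \cdot 0.35 + 0.5 \cdot 0.7 \cdot 0.9 \cdot 0.25} = 0$$
$$p(H_3 \mid E_1 E_2 E_3) = \frac{0.5 \cdot 0.7 \cdot 0.9 \cdot 0.25}{0.3 \cdot 0.9 \cdot 0.6 \cdot 0.40 + 0.8 \cdot 0.0 \cdot 0.7 \cdot 0.35 + 0.5 \cdot 0.7 \cdot 0.9 \cdot 0.25} = 0.55$$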
Although the initial ranking was H1, H2 and H3, only hypotheses H1 and H3 remain under
consideration after all evidences (E1, E2 and E3) were observed.
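The three-step ranking can be replayed in a few lines. This is a sketch under the conditional-independence assumption stated earlier; the names `PRIORS`, `LIKELIHOODS` and `posteriors` are illustrative, while the numbers are the ones from the table of prior and conditional probabilities.

```python
PRIORS = [0.40, 0.35, 0.25]          # p(H1), p(H2), p(H3)
LIKELIHOODS = {                      # p(Ej | Hi) for i = 1, 2, 3
    "E1": [0.3, 0.8, 0.5],
    "E2": [0.9, 0.0, 0.7],
    "E3": [0.6, 0.7, 0.9],
}


def posteriors(observed):
    """Return p(Hi | observed evidence), assuming conditional independence."""
    scores = list(PRIORS)
    for name in observed:
        # Multiply each hypothesis score by the likelihood of this evidence.
        scores = [s * l for s, l in zip(scores, LIKELIHOODS[name])]
    total = sum(scores)              # the denominator shared by all hypotheses
    return [s / total for s in scores]


for observed in (["E3"], ["E1", "E3"], ["E1", "E2", "E3"]):
    print(observed, [f"{p:.3f}" for p in posteriors(observed)])
```

Because the sketch prints unrounded posteriors to three decimals, a value may differ from the two-decimal slide figures in the last digit.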
Exercise
• From which bowl is the cookie?
To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10
chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our
friend Fred picks a bowl at random, and then picks a cookie at random.
We may assume there is no reason to believe Fred treats one bowl
differently from another, likewise for the cookies. The cookie turns out to
be a plain one. How probable is it that Fred picked it out of bowl #1?
(taken from wikipedia.org)
Solution
• Let H1 correspond to bowl #1, and H2 to bowl #2. It is given that the bowls are
identical from Fred's point of view, thus P(H1) = P(H2), and the two must add up to 1,
so both are equal to 0.5.
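• D is the observation of a plain cookie. From the contents of the bowls, we know that
P(D | H1) = 30/40 = 0.75 and P(D | H2) = 20/40 = 0.5. Bayes' formula then yields
$$P(H_1 \mid D) = \frac{P(D \mid H_1) \times P(H_1)}{P(D \mid H_1) \times P(H_1) + P(D \mid H_2) \times P(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6$$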
• Before observing the cookie, the probability that Fred chose bowl #1 is the prior
probability, P(H1), which is 0.5. After observing the cookie, we revise the probability
to P(H1|D), which is 0.6.
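The cookie solution can be checked with exact fractions. This is only a verification sketch; the variable names are illustrative.

```python
from fractions import Fraction

prior_h1 = Fraction(1, 2)            # P(H1): Fred picks bowl #1
prior_h2 = Fraction(1, 2)            # P(H2): Fred picks bowl #2
plain_given_h1 = Fraction(30, 40)    # P(D | H1): 30 plain cookies out of 40
plain_given_h2 = Fraction(20, 40)    # P(D | H2): 20 plain cookies out of 40

evidence = plain_given_h1 * prior_h1 + plain_given_h2 * prior_h2
posterior_h1 = plain_given_h1 * prior_h1 / evidence

print(posterior_h1, float(posterior_h1))
```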
Certainty Factor
•
A certainty factor (cf) is a number that measures the expert’s belief.
•
The maximum value of the certainty factor is, say, +1.0 (definitely true)
and the minimum –1.0 (definitely false). For example, if the expert states
that some evidence is almost certainly true, a cf value of 0.8 would be
assigned to this evidence.
• Certainty factors theory is a popular alternative to Bayesian reasoning.
Uncertain Terms
Term                    Certainty Factor
Definitely not          -1.0
Almost certainly not    -0.8
Probably not            -0.6
Maybe not               -0.4
Unknown                 -0.2 to +0.2
Maybe                   +0.4
Probably                +0.6
Almost certainly        +0.8
Definitely              +1.0
Certainty Factor
• In expert systems with certainty factors, the knowledge base consists of a
set of rules that have the following syntax:
IF   <evidence>
THEN <hypothesis> {cf }
where cf represents belief in hypothesis H, given that evidence E has
occurred.
Certainty Factor
•
The certainty factors theory is based on two functions:
• Measure of belief MB(H,E)
• Measure of disbelief MD(H,E)
$$MB(H, E) = \begin{cases} 1 & \text{if } p(H) = 1 \\ \dfrac{\max[p(H \mid E),\, p(H)] - p(H)}{\max[1, 0] - p(H)} & \text{otherwise} \end{cases}$$
$$MD(H, E) = \begin{cases} 1 & \text{if } p(H) = 0 \\ \dfrac{\min[p(H \mid E),\, p(H)] - p(H)}{\min[1, 0] - p(H)} & \text{otherwise} \end{cases}$$
Certainty Factor
• The values of MB(H, E) and MD(H, E) range between 0 and 1.
• The strength of belief or disbelief in hypothesis H depends on the kind of
evidence E observed.
•
Some facts may increase the strength of belief, but some increase the
strength of disbelief.
• The total strength of belief or disbelief in a hypothesis:
$$cf = \frac{MB(H, E) - MD(H, E)}{1 - \min[MB(H, E),\, MD(H, E)]}$$
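As a rough sketch, the measures of belief and disbelief and the resulting certainty factor can be written as three small functions. It uses max[1, 0] = 1 and min[1, 0] = 0; the function names and the example probabilities are assumptions made for illustration.

```python
def measure_of_belief(p_h: float, p_h_given_e: float) -> float:
    """MB(H, E): relative increase in belief in H after observing E."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def measure_of_disbelief(p_h: float, p_h_given_e: float) -> float:
    """MD(H, E): relative decrease in belief in H after observing E."""
    if p_h == 0.0:
        return 1.0
    return (p_h - min(p_h_given_e, p_h)) / p_h


def certainty_factor(mb: float, md: float) -> float:
    """Total strength of belief or disbelief: (MB - MD) / (1 - min[MB, MD])."""
    return (mb - md) / (1.0 - min(mb, md))


# Hypothetical numbers: prior p(H) = 0.4, posterior p(H|E) = 0.7.
mb = measure_of_belief(0.4, 0.7)
md = measure_of_disbelief(0.4, 0.7)
print(round(mb, 2), round(md, 2), round(certainty_factor(mb, md), 2))
```

Here the evidence raises the probability of H, so only MB is non-zero and the certainty factor is positive.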
Certainty Factor
Example: Consider a simple rule:
IF   A is X
THEN B is Y
• An expert may not be absolutely certain that this rule holds.
• Also, suppose it has been observed that in some cases, even when the
IF part of the rule is satisfied and object A takes on value X, object B
can acquire some different value like Z.
IF   A is X
THEN B is Y {cf 0.7};
     B is Z {cf 0.2}
Certainty Factor
• The certainty factor assigned by a rule is propagated through the
reasoning chain.
• This involves establishing the net certainty of the rule consequent
when the evidence in the rule antecedent is uncertain:
cf (H, E) = cf (E) * cf
where cf (E) is the certainty of the evidence and cf is the certainty
factor attached to the rule. For example, given the rule:
IF   sky is clear
THEN the forecast is sunny {cf 0.8}
if the current certainty factor of sky is clear is 0.5, then
cf (H, E) = 0.5 * 0.8 = 0.4
This result can be interpreted as "It may be sunny".
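The propagation step is a single multiplication. A minimal sketch, with `propagate_cf` as an illustrative name:

```python
def propagate_cf(evidence_cf: float, rule_cf: float) -> float:
    """Net certainty of the consequent when the antecedent is uncertain."""
    return evidence_cf * rule_cf


# "sky is clear" holds with cf 0.5; the rule carries cf 0.8.
print(round(propagate_cf(evidence_cf=0.5, rule_cf=0.8), 2))
```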
Certainty Factor
•
For conjunctive rules such as:
IF   <evidence E1>
...
AND  <evidence En>
THEN <hypothesis H> {cf }
the certainty of hypothesis H is established as follows:
cf (H, E1 ∩ E2 ∩... ∩ En) = min [cf (E1), cf (E2),..., cf (En)] * cf
• For example:
IF   sky is clear
AND  the forecast is sunny
THEN the action is 'wear sunglasses' {cf 0.8}
and the certainty of sky is clear is 0.9 and the certainty of the forecast of
sunny is 0.7, then
cf (H, E1 ∩ E2) = min [0.9, 0.7] * 0.8 = 0.7 * 0.8 = 0.56
Certainty Factor
•
For disjunctive rules such as:
IF   <evidence E1>
...
OR   <evidence En>
THEN <hypothesis H> {cf }
the certainty of hypothesis H is established as follows:
cf (H, E1 ∪ E2 ∪... ∪ En) = max [cf (E1), cf (E2),..., cf (En)] * cf
• For example:
IF   sky is overcast
OR   the forecast is rain
THEN the action is 'take an umbrella' {cf 0.9}
and the certainty of sky is overcast is 0.6 and the certainty of the forecast
of rain is 0.8, then
cf (H, E1 ∪ E2) = max [0.6, 0.8] * 0.9 = 0.8 * 0.9 = 0.72
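The conjunctive and disjunctive cases differ only in whether the weakest or the strongest piece of evidence is used. The sketch below reproduces the sunglasses and umbrella examples; the function names are illustrative.

```python
def cf_conjunction(evidence_cfs, rule_cf):
    """AND rule: the weakest antecedent limits the certainty of H."""
    return min(evidence_cfs) * rule_cf


def cf_disjunction(evidence_cfs, rule_cf):
    """OR rule: the strongest antecedent determines the certainty of H."""
    return max(evidence_cfs) * rule_cf


# 'wear sunglasses': sky is clear (0.9) AND forecast is sunny (0.7), rule cf 0.8
print(round(cf_conjunction([0.9, 0.7], 0.8), 2))
# 'take an umbrella': sky is overcast (0.6) OR forecast is rain (0.8), rule cf 0.9
print(round(cf_disjunction([0.6, 0.8], 0.9), 2))
```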
Certainty Factor
• When the same consequent is obtained as a result of the execution of
two or more rules, the individual certainty factors of these rules must be
merged to give a combined certainty factor for a hypothesis.
• Suppose the knowledge base consists of the following rules:
Rule 1:
IF   A is X
THEN C is Z {cf 0.8}
Rule 2:
IF   B is Y
THEN C is Z {cf 0.6}
What certainty should be assigned to object C having value Z if both
Rule 1 and Rule 2 are fired?
Certainty Factor
• Common sense suggests that, if we have two pieces of evidence (A is X
and B is Y) from different sources (Rule 1 and Rule 2) supporting the
same hypothesis (C is Z), then the confidence in this hypothesis should
increase and become stronger than if only one piece of evidence had been
obtained.
Certainty Factor
• To calculate a combined certainty factor we can use the following
equation:
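$$cf(cf_1, cf_2) = \begin{cases} cf_1 + cf_2 \times (1 - cf_1) & \text{if } cf_1 > 0 \text{ and } cf_2 > 0 \\ \dfrac{cf_1 + cf_2}{1 - \min[|cf_1|, |cf_2|]} & \text{if } cf_1 < 0 \text{ or } cf_2 < 0 \\ cf_1 + cf_2 \times (1 + cf_1) & \text{if } cf_1 < 0 \text{ and } cf_2 < 0 \end{cases}$$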
where:
cf1 is the confidence in hypothesis H established by Rule 1;
cf2 is the confidence in hypothesis H established by Rule 2;
|cf1| and |cf2| are absolute magnitudes of cf1 and cf2, respectively.
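A possible implementation of the combination equation is sketched below. It reads the middle case as "the two factors are not both positive and not both negative", which is an interpretation rather than something the slide states, and it does not guard against the degenerate pair +1 and -1, where the denominator is zero.

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Merge the certainty factors of two rules that reach the same hypothesis."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    # Remaining case: mixed signs (or a zero factor).
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


# Rule 1 gives C is Z {cf 0.8}, Rule 2 gives C is Z {cf 0.6}.
print(round(combine_cf(0.8, 0.6), 2))
```

With both rules fired at full evidence certainty, the combined value is larger than either individual factor, which matches the common-sense expectation that two supporting sources strengthen the hypothesis.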
Certainty Factor
•
The certainty factors theory provides a practical alternative to
Bayesian reasoning.
•
The heuristic manner of combining certainty factors is different from the
manner in which they would be combined if they were probabilities.
•
The certainty theory is not “mathematically pure” but does mimic the
thinking process of a human expert.
Bayesian Reasoning Vs Certainty Factors
• Probability theory is the oldest and best-established technique to deal
with inexact knowledge and random data.
•
It works well in such areas as forecasting and planning, where statistical
data is usually available and accurate probability statements can be
made.
•
However, in many areas of possible applications of expert systems,
reliable statistical information is not available or we cannot assume the
conditional independence of evidence.
As a result, many researchers have found the Bayesian method
unsuitable for their work. This dissatisfaction motivated the development
of the certainty factors theory.
Bayesian Reasoning Vs Certainty Factors
• Although the certainty factors approach lacks the mathematical
correctness of the probability theory, it outperforms subjective Bayesian
reasoning in such areas as diagnostics.
•
Certainty factors are used in cases where the probabilities are not
known or are too difficult or expensive to obtain.
•
The certainty factors approach also provides better explanations of the
control flow through a rule-based expert system.
Bayesian Reasoning Vs Certainty Factors
• The Bayesian method is likely to be the most appropriate if reliable
statistical data exists, the knowledge engineer is able to lead, and the
expert is available for serious decision-analytical conversations.
• In the absence of any of the specified conditions, the Bayesian
approach might be too arbitrary and even biased to produce meaningful
results.
• The Bayesian belief propagation is of exponential complexity, and thus is
impractical for large knowledge bases.
Introduction to Fuzzy Logic
Definition
•
Experts rely on common sense when they solve problems.
• How can we represent expert knowledge that uses vague and
ambiguous terms in a computer?
• Fuzzy logic is not logic that is fuzzy, but logic that is used to describe
fuzziness. Fuzzy logic is the theory of fuzzy sets, sets that calibrate
vagueness.
•
Fuzzy logic is based on the idea that all things admit of degrees.
Temperature, height, speed, distance, beauty – all come on a sliding
scale.
– The motor is running really hot.
– Tom is a very tall guy.
Definition
• How is one to represent notions like:
– large profit
– high pressure
– tall man
– moderate temperature
• The principal notion underlying set theory, that an element can
(exclusively) either belong to a set or not belong to a set, makes it well
nigh impossible to represent much of human discourse.
Definition
•
People succeed by using knowledge that is imprecise rather than
precise in many decision-making and problem-solving tasks that are too
complex to be understood quantitatively
•
Fuzzy set theory resembles human reasoning in its use of approximate
information and uncertainty to generate decisions.
•
Fuzzy set theory was specifically designed to mathematically
represent uncertainty and vagueness.
Definition
• Boolean logic uses sharp distinctions. It forces us to draw lines
between members of a class and non-members.
• For instance, we may say, Tom is tall because his height is 181 cm. If we
drew a line at 180 cm, we would find that David, who is 179 cm, is small.
• Is David really a small man, or have we just drawn an arbitrary line in the
sand?
Bit of History
• Fuzzy, or multi-valued logic, was introduced in the 1930s by Jan
Lukasiewicz, a Polish philosopher. While classical logic operates with
only two values 1 (true) and 0 (false), Lukasiewicz introduced logic that
extended the range of truth values to all real numbers in the interval
between 0 and 1.
•
For example, the possibility that a man 181 cm tall is really tall might be
set to a value of 0.86. It is likely that the man is tall. This work led to an
inexact reasoning technique often called possibility theory.
•
In 1965 Lotfi Zadeh published his famous paper “Fuzzy sets”. Zadeh
extended the work on possibility theory into a formal system of
mathematical logic, and introduced a new concept for applying natural
language terms. This new logic for representing and manipulating fuzzy
terms was called fuzzy logic.
The Term “Fuzzy Logic”
• The term fuzzy logic is used in two senses:
– Narrow sense: Fuzzy logic is a branch of fuzzy set theory, which
deals (as logical systems do) with the representation and inference
from knowledge. Fuzzy logic, unlike other logical systems, deals with
imprecise or uncertain knowledge. In this narrow, and perhaps correct
sense, fuzzy logic is just one of the branches of fuzzy set theory.
– Broad sense: fuzzy logic is used synonymously with fuzzy set theory.
Fuzzy Applications
• Theory of fuzzy sets and fuzzy logic has been applied to problems in a
variety of fields:
– taxonomy; topology; linguistics; logic; automata theory; game theory; pattern
recognition; medicine; law; decision support; Information retrieval; etc.
• And more recently fuzzy machines have been developed including:
– automatic train control; tunnel digging machinery; washing machines; rice
cookers; vacuum cleaners; air conditioners, etc.
Fuzzy Applications
• Extraklasse Washing Machine - 1200 rpm. The Extraklasse machine has a number of features
which will make life easier for you.
• Fuzzy Logic detects the type and amount of laundry in the drum and allows only as much
water to enter the machine as is really needed for the loaded amount. And less water will heat
up quicker - which means less energy consumption.
• Foam detection: Too much foam is compensated by an additional rinse cycle. If Fuzzy Logic detects the
formation of too much foam in the rinsing spin cycle, it simply activates an additional rinse
cycle. Fantastic!
• Imbalance compensation: In the event of imbalance, Fuzzy Logic immediately calculates the maximum possible speed,
sets this speed and starts spinning. This provides optimum utilization of the spinning time at
full speed […]
• Washing without wasting - with automatic water level adjustment
More Definitions
•
Fuzzy logic is a set of mathematical principles for knowledge
representation based on degrees of membership.
•
Unlike two-valued Boolean logic, fuzzy logic is multi-valued. It deals
with degrees of membership and degrees of truth.
•
Fuzzy logic uses the continuum of logical values between 0
(completely false) and 1 (completely true). Instead of just black and
white, it employs the spectrum of colors, accepting that things can be
partly true and partly false at the same time.
[Figure: range of logical values in (a) Boolean logic, 0 or 1, and (b) multi-valued logic, 0, 0.2, 0.4, 0.6, 0.8, 1]
Fuzzy Sets
•
The concept of a set is fundamental to mathematics.
•
However, our own language is also the supreme expression of sets. For
example, car indicates the set of cars. When we say a car, we mean one
out of the set of cars.
•
The classical example in fuzzy sets is tall men. The elements of the
fuzzy set “tall men” are all men, but their degrees of membership depend
on their height.
Fuzzy Sets
Name      Height, cm    Degree of Membership
                        Crisp     Fuzzy
Chris     208           1         1.00
Mark      205           1         1.00
John      198           1         0.98
Tom       181           1         0.82
David     179           0         0.78
Mike      172           0         0.24
Bob       167           0         0.15
Steven    158           0         0.06
Bill      155           0         0.01
Peter     152           0         0.00
Crisp Vs Fuzzy Sets
The x-axis represents the
universe of discourse – the
range of all possible values
applicable to a chosen variable.
In our case, the variable is the
man height. According to this
representation, the universe of
men’s heights consists of all
tall men.
The y-axis represents the
membership value of the
fuzzy set. In our case, the
fuzzy set of “tall men” maps
height values into
corresponding membership
values.
[Figure: Crisp Vs Fuzzy Sets. Degree of membership (0.0 to 1.0) versus height (150 to 210 cm) for “Tall Men”, drawn once as a crisp set and once as a fuzzy set]
A Fuzzy Set has Fuzzy Boundaries
•
Let X be the universe of discourse and its elements be denoted as x. In
the classical set theory, crisp set A of X is defined as function fA(x)
called the characteristic function of A:
fA(x) : X à {0, 1}, where
#1, if x ∈ A
f A ( x) = "
!0, if x ∉ A
• This set maps universe X to a set of two elements.
• For any element x of universe X, characteristic function fA(x) is equal to 1
if x is an element of set A, and is equal to 0 if x is not an element of A.
A Fuzzy Set has Fuzzy Boundaries
•
In the fuzzy theory, fuzzy set A of universe X is defined by function
µA(x) called the membership function of set A
µA(x) : X à {0, 1}, where
µA(x) = 1
if x is totally in A;
µA(x) = 0
if x is not in A;
0 < µA(x) < 1 if x is partly in A.
• This set allows a continuum of possible choices. For any element x of
universe X, membership function µA(x) equals the degree to which x is
an element of set A.
• This degree, a value between 0 and 1, represents the degree of
membership, also called membership value, of element x in set A.
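The contrast between a characteristic function and a membership function can be sketched for the “tall men” example. The 180 cm cut-off comes from the earlier Tom and David discussion (treating exactly 180 cm as tall is an assumption); the linear ramp between 160 cm and 200 cm is a hypothetical shape chosen only for illustration and does not reproduce the membership values in the table.

```python
def crisp_tall(height_cm: float) -> int:
    """Characteristic function: 1 if the man is in the set, 0 otherwise."""
    return 1 if height_cm >= 180 else 0


def fuzzy_tall(height_cm: float) -> float:
    """Membership function: a degree in [0, 1] (hypothetical linear ramp)."""
    low, high = 160.0, 200.0  # assumed breakpoints, not taken from the slides
    return min(1.0, max(0.0, (height_cm - low) / (high - low)))


for name, height in [("Tom", 181), ("David", 179)]:
    print(name, crisp_tall(height), round(fuzzy_tall(height), 3))
```

The crisp function separates Tom and David completely, while the fuzzy degrees differ only slightly.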
Fuzzy Set Representation
• First, we determine the membership functions. In our “tall men”
example, we can obtain fuzzy sets of tall, short and average men.
• The universe of discourse – the men’s heights – consists of three sets:
short, average and tall men. As you will see, a man who is 184 cm tall is
a member of the average men set with a degree of membership of 0.1,
and at the same time, he is also a member of the tall men set with a
degree of 0.4.
[Figure: degree of membership (0.0 to 1.0) versus height (150 to 210 cm) for the sets Short, Average and Tall, drawn as crisp sets and as fuzzy sets]
Fuzzy Set Representation
•
Typical functions that can be used to represent a fuzzy set:
• sigmoid
• gaussian
• pi.
• However, these functions increase the time of computation. Therefore,
in practice, most applications use linear fit functions.
[Figure: membership µ(x) over X for crisp subset A and fuzzy subset A; the sloped edges of the fuzzy subset are labelled “Fuzziness”]
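Linear fit functions are usually built from straight segments. The sketch below shows two common shapes, triangular and trapezoidal; the slides do not name these shapes, so both functions and their parameter names are assumptions for illustration.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Rise linearly from a to the peak at b, then fall linearly to c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Rise from a to b, stay at 1 between b and c, fall from c to d (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical "average height" set peaking at 175 cm.
print(triangular(167.5, 160, 175, 190), trapezoidal(175, 170, 180, 190, 200))
```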
Linguistic Variables and Hedges
• A linguistic variable is a fuzzy variable. For example, the statement
“John is tall” implies that the linguistic variable John takes the linguistic
value tall.
• In fuzzy expert systems, linguistic variables are used in fuzzy rules. For
example:
IF   wind is strong
THEN sailing is good
IF   project_duration is long
THEN completion_risk is high
IF   speed is slow
THEN stopping_distance is short
Linguistic Variables and Hedges
•
The range of possible values of a linguistic variable represents the
universe of discourse of that variable. For example, the universe of
discourse of the linguistic variable speed might have the range between 0
and 220 km/h and may include such fuzzy subsets as very slow, slow,
medium, fast, and very fast.
•
A linguistic variable carries with it the concept of fuzzy set qualifiers,
called hedges.
•
Hedges are terms that modify the shape of fuzzy sets. They include
adverbs such as very, somewhat, quite, more or less and slightly.
Linguistic Variables and Hedges
[Figure: degree of membership (0.0 to 1.0) versus height (150 to 210 cm) for Short, Average and Tall, together with the hedged sets Very Short and Very Tall]
Linguistic Variables and Hedges
Hedge        Mathematical Expression
A little     [µA(x)]^1.3
Slightly     [µA(x)]^1.7
Very         [µA(x)]^2
Extremely    [µA(x)]^3
[Figure column: graphical representation of each hedge]
Linguistic Variables and Hedges
Hedge          Mathematical Expression
Very very      [µA(x)]^4
More or less   √µA(x)
Somewhat       √µA(x)
Indeed         2 [µA(x)]^2              if 0 ≤ µA ≤ 0.5
               1 − 2 [1 − µA(x)]^2      if 0.5 < µA ≤ 1
[Figure column: graphical representation of each hedge]
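Each hedge is a function applied to a membership value. The sketch below collects the expressions from the two hedge tables in a dictionary; it assumes that “more or less” and “somewhat” are the square root of the membership value, and the dictionary name `HEDGES` is illustrative.

```python
import math


def indeed(mu: float) -> float:
    """Intensify: push low memberships down and high memberships up."""
    if mu <= 0.5:
        return 2 * mu ** 2
    return 1 - 2 * (1 - mu) ** 2


HEDGES = {
    "a little": lambda mu: mu ** 1.3,
    "slightly": lambda mu: mu ** 1.7,
    "very": lambda mu: mu ** 2,
    "extremely": lambda mu: mu ** 3,
    "very very": lambda mu: mu ** 4,
    "more or less": math.sqrt,   # assumed square root (dilation)
    "somewhat": math.sqrt,       # assumed square root (dilation)
    "indeed": indeed,
}

mu_tall = 0.8  # hypothetical degree of membership in "tall men"
for name, hedge in HEDGES.items():
    print(f"{name:12s} {hedge(mu_tall):.3f}")
```

Powers above 1 concentrate the set (a man who is tall to degree 0.8 is very tall only to degree 0.64), while the square root dilates it.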
Characteristics of Fuzzy Sets
•
The classical set theory developed in the late 19th century by Georg
Cantor describes how crisp sets can interact. These interactions are
called operations.
•
Fuzzy sets also have well-defined properties.
•
These properties and operations are the basis on which the fuzzy sets are
used to deal with uncertainty on the one hand and to represent
knowledge on the other.
Operations
[Figure: diagrams of the four set operations: Complement (A, Not A), Containment (A, B), Intersection (A, B) and Union (A, B)]
Complement
• Crisp Sets: Who does not belong to the set?
• Fuzzy Sets: How much do elements not belong to the set?
• The complement of a set is an opposite of this set. For example, if we
have the set of tall men, its complement is the set of NOT tall men. When
we remove the tall men set from the universe of discourse, we obtain the
complement.
•
If A is the fuzzy set, its complement ¬A can be found as follows:
µ¬A(x) = 1 − µA(x)
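A small sketch of the complement, representing a fuzzy set as a dictionary from element to membership value (a representation chosen here for illustration). The sample memberships are taken from the “tall men” table.

```python
def complement(fuzzy_set: dict) -> dict:
    """NOT A: every element keeps its place, with membership 1 - µA(x)."""
    return {x: 1.0 - mu for x, mu in fuzzy_set.items()}


tall = {"Chris": 1.00, "Tom": 0.82, "Mike": 0.24, "Peter": 0.00}
not_tall = complement(tall)
print({name: round(mu, 2) for name, mu in not_tall.items()})
```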
Containment
• Crisp Sets: Which sets belong to which other sets?
• Fuzzy Sets: Which sets belong to other sets?
•
Similar to a Chinese box, a set can contain other sets. The smaller set is
called the subset. For example, the set of tall men contains all tall men;
very tall men is a subset of tall men. However, the tall men set is just a
subset of the set of men. In crisp sets, all elements of a subset entirely
belong to a larger set. In fuzzy sets, however, each element can belong
less to the subset than to the larger set. Elements of the fuzzy subset
have smaller memberships in it than in the larger set.
Intersection
•
• Crisp Sets: Which element belongs to both sets?
• Fuzzy Sets: How much of the element is in both sets?
•
In classical set theory, an intersection between two sets contains the
elements shared by these sets. For example, the intersection of the set of
tall men and the set of fat men is the area where these sets overlap. In
fuzzy sets, an element may partly belong to both sets with different
memberships.
•
A fuzzy intersection is the lower membership in both sets of each
element. The fuzzy intersection of two fuzzy sets A and B on universe of
discourse X:
µA∩B(x) = min [µA(x), µB(x)] = µA(x) ∩ µB(x),
where x∈X
Union
•
• Crisp Sets: Which element belongs to either set?
• Fuzzy Sets: How much of the element is in either set?
•
The union of two crisp sets consists of every element that falls into either
set. For example, the union of tall men and fat men contains all men
who are tall OR fat.
•
In fuzzy sets, the union is the reverse of the intersection. That is, the
union is the largest membership value of the element in either set.
The fuzzy operation for forming the union of two fuzzy sets A and B on
universe X can be given as:
µA∪B(x) = max [µA(x), µB(x)] = µA(x) ∪ µB(x),
where x∈X
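Intersection and union reduce to an element-wise min and max. The sketch below assumes both sets are dictionaries over the same elements; the “tall” values come from the tall men table and the “fat” values are hypothetical.

```python
def fuzzy_intersection(a: dict, b: dict) -> dict:
    """A ∩ B: the lower of the two memberships for each element."""
    return {x: min(a[x], b[x]) for x in a}


def fuzzy_union(a: dict, b: dict) -> dict:
    """A ∪ B: the higher of the two memberships for each element."""
    return {x: max(a[x], b[x]) for x in a}


tall = {"Tom": 0.82, "David": 0.78, "Mike": 0.24}
fat = {"Tom": 0.50, "David": 0.90, "Mike": 0.10}  # hypothetical memberships

print(fuzzy_intersection(tall, fat))
print(fuzzy_union(tall, fat))
```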
Operations of Fuzzy Sets
[Figure: membership plots µ(x) illustrating the fuzzy set operations Complement (A, Not A), Containment (A, B), Intersection (A∩B) and Union (A∪B)]
Equality
•
Fuzzy set A is considered equal to a fuzzy set B, IF AND ONLY IF (iff):
µA(x) = µB(x), ∀x∈X
A = 0.3/1 + 0.5/2 + 1/3
B = 0.3/1 + 0.5/2 + 1/3
therefore A = B
Inclusion
•
Inclusion of one fuzzy set into another fuzzy set. Fuzzy set A ⊆ X is
included in (is a subset of) another fuzzy set, B ⊆ X:
µA(x) ≤ µB(x), ∀x∈X
Consider X = {1, 2, 3} and sets A and B
A = 0.3/1 + 0.5/2 + 1/3;
B = 0.5/1 + 0.55/2 + 1/3
then A is a subset of B, or A ⊆ B
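Equality and inclusion are both element-wise comparisons of membership values. A minimal sketch using the sets from the slides, with fuzzy sets stored as dictionaries (an illustrative representation):

```python
def is_equal(a: dict, b: dict) -> bool:
    """A = B iff µA(x) = µB(x) for every x in X."""
    return a.keys() == b.keys() and all(a[x] == b[x] for x in a)


def is_subset(a: dict, b: dict) -> bool:
    """A ⊆ B iff µA(x) <= µB(x) for every x in X."""
    return all(a[x] <= b[x] for x in a)


A = {1: 0.3, 2: 0.5, 3: 1.0}
B = {1: 0.5, 2: 0.55, 3: 1.0}

print(is_equal(A, dict(A)), is_subset(A, B), is_subset(B, A))
```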
Cardinality
•
Cardinality of a non-fuzzy set, Z, is the number of elements in Z. BUT
the cardinality of a fuzzy set A, the so-called SIGMA COUNT, is
expressed as a SUM of the values of the membership function of A,
µA(x):
cardA = µA(x1) + µA(x2) + … + µA(xn) = ΣµA(xi), for i = 1..n
Consider X = {1, 2, 3} and sets A and B
A = 0.3/1 + 0.5/2 + 1/3;
B = 0.5/1 + 0.55/2 + 1/3
cardA = 1.8
cardB = 2.05
Empty Fuzzy Set
•
A fuzzy set A is empty, IF AND ONLY IF:
µA(x) = 0, ∀x∈X
Consider X = {1, 2, 3} and set A
A = 0/1 + 0/2 + 0/3
then A is empty
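The sigma count and the emptiness test can be sketched in the same dictionary representation (an illustrative choice); the sets are the ones used in the cardinality and empty-set examples.

```python
def cardinality(fuzzy_set: dict) -> float:
    """Sigma count: the sum of all membership values."""
    return sum(fuzzy_set.values())


def is_empty(fuzzy_set: dict) -> bool:
    """A fuzzy set is empty iff every membership value is 0."""
    return all(mu == 0 for mu in fuzzy_set.values())


A = {1: 0.3, 2: 0.5, 3: 1.0}
B = {1: 0.5, 2: 0.55, 3: 1.0}
EMPTY = {1: 0.0, 2: 0.0, 3: 0.0}

print(round(cardinality(A), 2), round(cardinality(B), 2), is_empty(EMPTY))
```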
Fuzzy Rules
•
In 1973, Lotfi Zadeh published his second most influential paper. This
paper outlined a new approach to analysis of complex systems, in which
Zadeh suggested capturing human knowledge in fuzzy rules.
• A fuzzy rule can be defined as a conditional statement in the form:
IF   x is A
THEN y is B
where x and y are linguistic variables; and A and B are linguistic values
determined by fuzzy sets on the universes of discourse X and Y,
respectively.
Classical Vs Fuzzy Rules
• A classical IF-THEN rule uses binary logic, for example,
Rule: 1
IF   speed is > 100
THEN stopping_distance is long
Rule: 2
IF   speed is < 40
THEN stopping_distance is short
• The variable speed can have any numerical value between 0 and 220 km/h,
but the linguistic variable stopping_distance can take either value long
or short. In other words, classical rules are expressed in the black-and-white
language of Boolean logic.
Classical Vs Fuzzy Rules
• We can also represent the stopping distance rules in a fuzzy form:
Rule: 1
IF   speed is fast
THEN stopping_distance is long
Rule: 2
IF   speed is slow
THEN stopping_distance is short
• In fuzzy rules, the linguistic variable speed also has the range (the
universe of discourse) between 0 and 220 km/h, but this range includes
fuzzy sets, such as slow, medium and fast. The universe of discourse of
the linguistic variable stopping_distance can be between 0 and 300 m and
may include such fuzzy sets as short, medium and long.
Classical Vs Fuzzy Rules
• Fuzzy rules relate fuzzy sets.
• In a fuzzy system, all rules fire to some extent, or in other words they
fire partially.
If the antecedent is true to some degree of
membership, then the consequent is also true to that same degree.
Firing Fuzzy Rules
• These fuzzy sets provide the basis for a weight estimation model. The
model is based on a relationship between a man’s height and his weight:
IF   height is tall
THEN weight is heavy
[Figure: membership functions of the fuzzy sets “Tall men” (height, 160 to 200 cm) and “Heavy men” (weight, 70 to 120 kg)]
Firing Fuzzy Rules
•
The value of the output or a truth membership grade of the rule
consequent can be estimated directly from a corresponding truth
membership grade in the antecedent. This form of fuzzy inference uses a
method called monotonic selection.
[Figure: monotonic selection from the “Tall men” set (height, 160 to 200 cm) to the “Heavy men” set (weight, 70 to 120 kg)]
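Monotonic selection can be sketched as two steps: find the degree to which the height is tall, then read off the weight that is heavy to the same degree. The slides give the membership functions only as plots, so the linear ramps below (tall from 160 to 200 cm, heavy from 70 to 120 kg) are assumptions loosely based on the axis ranges.

```python
def mu_tall(height_cm: float) -> float:
    """Assumed linear membership in "tall men": 0 at 160 cm, 1 at 200 cm."""
    return min(1.0, max(0.0, (height_cm - 160.0) / 40.0))


def weight_for_degree(mu: float) -> float:
    """Inverse of an assumed linear "heavy men" ramp: 70 kg at 0, 120 kg at 1."""
    return 70.0 + mu * 50.0


def estimate_weight(height_cm: float) -> float:
    """IF height is tall THEN weight is heavy, fired by monotonic selection."""
    return weight_for_degree(mu_tall(height_cm))


print(estimate_weight(180))
```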
Firing Fuzzy Rules
• A fuzzy rule can have multiple antecedents, for example:
IF   project_duration is long
AND  project_staffing is large
AND  project_funding is inadequate
THEN risk is high
IF
service is excellent
OR
food is delicious
THEN tip is generous
• The consequent of a fuzzy rule can also include multiple parts, for
instance:
IF
temperature is hot
THEN hot_water is reduced;
cold_water is increased
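One common way to evaluate such rules, assumed here rather than stated on this slide, is to combine the antecedent degrees with the fuzzy intersection (min) for AND and the fuzzy union (max) for OR, as defined for fuzzy set operations. All membership degrees below are hypothetical.

```python
def rule_and(*degrees: float) -> float:
    """Firing strength of an AND rule: the weakest antecedent (fuzzy intersection)."""
    return min(degrees)


def rule_or(*degrees: float) -> float:
    """Firing strength of an OR rule: the strongest antecedent (fuzzy union)."""
    return max(degrees)


# Hypothetical degrees: duration is long 0.7, staffing is large 0.4, funding is inadequate 0.9
risk_high = rule_and(0.7, 0.4, 0.9)
# Hypothetical degrees: service is excellent 0.3, food is delicious 0.8
tip_generous = rule_or(0.3, 0.8)

print(risk_high, tip_generous)
```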
Fuzzy Sets Example
• Air-conditioning involves the delivery of air which can be warmed or
cooled and have its humidity raised or lowered.
• An air-conditioner is an apparatus for controlling, especially lowering,
the temperature and humidity of an enclosed space. An air-conditioner
typically has a fan which blows/cools/circulates fresh air and a cooler,
which is under thermostatic control. Generally, the amount of air
being compressed is proportional to the ambient temperature.
• Consider Johnny’s air-conditioner which has five control switches:
COLD, COOL, PLEASANT, WARM and HOT. The corresponding
speeds of the motor controlling the fan on the air-conditioner have the
graduations: MINIMAL, SLOW, MEDIUM, FAST and BLAST.
Fuzzy Sets Example
• The rules governing the air-conditioner are as follows:
RULE 1: IF TEMP is COLD      THEN SPEED is MINIMAL
RULE 2: IF TEMP is COOL      THEN SPEED is SLOW
RULE 3: IF TEMP is PLEASANT  THEN SPEED is MEDIUM
RULE 4: IF TEMP is WARM      THEN SPEED is FAST
RULE 5: IF TEMP is HOT       THEN SPEED is BLAST
Fuzzy Sets Example
The temperature graduations are related to Johnny’s perception of ambient
temperatures, where:
Y : temp value belongs to the set (0<µA(x)<1)
Y* : temp value is the ideal member to the set (µA(x)=1)
N : temp value is not a member of the set (µA(x)=0)
Temp (°C)   COLD   COOL   PLEASANT   WARM   HOT
0           Y*     N      N          N      N
5           Y      Y      N          N      N
10          N      Y      N          N      N
12.5        N      Y*     N          N      N
15          N      Y      N          N      N
17.5        N      N      Y*         N      N
20          N      N      N          Y      N
22.5        N      N      N          Y*     N
25          N      N      N          Y      N
27.5        N      N      N          N      Y
30          N      N      N          N      Y*
Fuzzy Sets Example
Johnny’s perception of the speed of the motor is as follows, where:
Y : speed value belongs to the set (0<µA(x)<1)
Y* : speed value is the ideal member to the set (µA(x)=1)
N : speed value is not a member of the set (µA(x)=0)
Rev/sec (RPM)   MINIMAL   SLOW   MEDIUM   FAST   BLAST
0               Y*        N      N        N      N
10              Y         N      N        N      N
20              Y         Y      N        N      N
30              N         Y*     N        N      N
40              N         Y      N        N      N
50              N         N      Y*       N      N
60              N         N      N        Y      N
70              N         N      N        Y*     N
80              N         N      N        Y      Y
90              N         N      N        N      Y
100             N         N      N        N      Y*
Fuzzy Sets Example
• The analytically expressed memberships for the reference fuzzy subsets
for the temperature are:
• COLD:
for 0 ≤ t ≤ 10          µCOLD(t) = – t / 10 + 1
• COOL:
for 0 ≤ t ≤ 12.5        µCOOL(t) = t / 12.5
for 12.5 ≤ t ≤ 17.5     µCOOL(t) = – t / 5 + 3.5
• etc… all based on the linear equation:
y = ax + b
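The two temperature subsets given analytically can be written as ordinary functions. In this sketch the second subset is named after the COOL column of the temperature table, and membership outside the stated intervals is taken to be 0, which is an assumption consistent with the N entries in that table.

```python
def mu_cold(t: float) -> float:
    """COLD: falls linearly from 1 at 0 °C to 0 at 10 °C."""
    if 0 <= t <= 10:
        return -t / 10 + 1
    return 0.0


def mu_cool(t: float) -> float:
    """COOL: rises to 1 at 12.5 °C, then falls to 0 at 17.5 °C."""
    if 0 <= t <= 12.5:
        return t / 12.5
    if 12.5 < t <= 17.5:
        return -t / 5 + 3.5
    return 0.0


for t in (0, 5, 10, 12.5, 15, 17.5):
    print(t, round(mu_cold(t), 2), round(mu_cool(t), 2))
```

At 5 °C both subsets have a non-zero membership, matching the row of the table where COLD and COOL are both marked Y.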
Fuzzy Sets Example
• The analytically expressed memberships for the reference fuzzy subsets
for the speed are:
• MINIMAL:
for 0 ≤ v ≤ 30          µMINIMAL(v) = – v / 30 + 1
• SLOW:
for 10 ≤ v ≤ 30         µSLOW(v) = v / 20 – 0.5
for 30 ≤ v ≤ 50         µSLOW(v) = – v / 20 + 2.5
• etc… all based on the linear equation:
y = ax + b
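Since every segment is a straight line y = ax + b, a membership function can also be built from its breakpoints. The helper `piecewise_linear` below is a hypothetical utility, not something defined in the slides; the breakpoints are those implied by the MINIMAL and SLOW speed expressions, and membership outside the listed range is assumed to be 0.

```python
def piecewise_linear(points):
    """Build a membership function from (x, membership) breakpoints sorted by x."""
    def mu(x: float) -> float:
        if x < points[0][0] or x > points[-1][0]:
            return 0.0  # assumed zero membership outside the defined range
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                # Linear interpolation along the segment y = ax + b.
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return 0.0
    return mu


mu_minimal = piecewise_linear([(0, 1.0), (30, 0.0)])
mu_slow = piecewise_linear([(10, 0.0), (30, 1.0), (50, 0.0)])

for v in (0, 10, 20, 30, 40, 50):
    print(v, round(mu_minimal(v), 2), round(mu_slow(v), 2))
```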
Fuzzy Sets Example
[Figure: Speed Fuzzy Sets. Truth value (0 to 1) versus speed (0 to 100) for MINIMAL, SLOW, MEDIUM, FAST and BLAST]
Exercises
For
A = {0.2/a, 0.4/b, 1/c, 0.8/d, 0/e}
B = {0/a, 0.9/b, 0.3/c, 0.2/d, 0.1/e}
Draw the Fuzzy Graph of A and B
Then, calculate the following:
- Support, Core, Cardinality, and Complement for A and B independently
- Union and Intersection of A and B
- the new set C, if C = A²
- the new set D, if D = 0.5 × B
- the new set E, for an alpha cut of A at 0.5
Solutions
A = {0.2/a, 0.4/b, 1/c, 0.8/d, 0/e}
B = {0/a, 0.9/b, 0.3/c, 0.2/d, 0.1/e}
Support
Supp(A) = {a, b, c, d}
Supp(B) = {b, c, d, e}
Core
Core(A) = {c}
Core(B) = {}
Cardinality
Card(A) = 0.2 + 0.4 + 1 + 0.8 + 0 = 2.4
Card(B) = 0 + 0.9 + 0.3 + 0.2 + 0.1 = 1.5
Complement
Comp(A) = {0.8/a, 0.6/b, 0/c, 0.2/d, 1/e}
Comp(B) = {1/a, 0.1/b, 0.7/c, 0.8/d, 0.9/e}
Solutions
A = {0.2/a, 0.4/b, 1/c, 0.8/d, 0/e}
B = {0/a, 0.9/b, 0.3/c, 0.2/d, 0.1/e}
Union
A∪B = {0.2/a, 0.9/b, 1/c, 0.8/d, 0.1/e}
Intersection
A∩B = {0/a, 0.4/b, 0.3/c, 0.2/d, 0/e}
C = A²
C = {0.04/a, 0.16/b, 1/c, 0.64/d, 0/e}
D = 0.5 × B
D = {0/a, 0.45/b, 0.15/c, 0.1/d, 0.05/e}
E = alpha cut of A at 0.5
E = {c, d}
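The solutions can be checked with a short script. It is a sketch that stores each fuzzy set as a dictionary and rounds results to avoid floating-point noise; the function names are illustrative, and the alpha cut is taken to include elements whose membership is greater than or equal to the threshold.

```python
A = {"a": 0.2, "b": 0.4, "c": 1.0, "d": 0.8, "e": 0.0}
B = {"a": 0.0, "b": 0.9, "c": 0.3, "d": 0.2, "e": 0.1}


def support(s):
    """Elements with non-zero membership."""
    return {x for x, mu in s.items() if mu > 0}


def core(s):
    """Elements with full membership."""
    return {x for x, mu in s.items() if mu == 1}


def cardinality(s):
    return round(sum(s.values()), 10)


def complement(s):
    return {x: round(1 - mu, 10) for x, mu in s.items()}


def union(s, t):
    return {x: max(s[x], t[x]) for x in s}


def intersection(s, t):
    return {x: min(s[x], t[x]) for x in s}


def power(s, n):
    return {x: round(mu ** n, 10) for x, mu in s.items()}


def scale(s, k):
    return {x: round(k * mu, 10) for x, mu in s.items()}


def alpha_cut(s, alpha):
    """Crisp set of elements whose membership is at least alpha."""
    return {x for x, mu in s.items() if mu >= alpha}


print(sorted(support(A)), sorted(core(A)), cardinality(A), cardinality(B))
print(union(A, B), intersection(A, B))
print(power(A, 2), scale(B, 0.5), sorted(alpha_cut(A, 0.5)))
```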