
CSE 473/573
Computer Vision and Image
Processing (CVIP)
Ifeoma Nwogu
Lecture 27 – Overview of probability concepts
Schedule
• Last class
– Object recognition and bag-of-words
• Today
– Overview of probability models
• Readings for today:
– Lecture notes
Random variables
• A random variable x denotes a quantity that is
uncertain
• May be the result of an experiment (flipping a coin) or a real-world measurement (measuring temperature)
• If we observe several instances of x, we get different values
• Some values occur more than others and this
information is captured by a probability distribution
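• For example, for a fair coin Pr(x = heads) = Pr(x = tails) = 0.5; a biased coin assigns different probabilities to the two outcomes, but they still sum to one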
Discrete Random Variables
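• A discrete random variable takes values from a fixed set (e.g. the outcome of a coin flip or die roll); its distribution assigns a non-negative probability to each value, and these probabilities sum to one: Σ_k Pr(x = k) = 1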
Continuous Random Variable
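• A continuous random variable takes values in a continuous range (e.g. a temperature measurement); its probability density function Pr(x) is non-negative and integrates to one: ∫ Pr(x) dx = 1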
Joint Probability
• Consider two random variables x and y
• If we observe multiple paired instances, then some
combinations of outcomes are more likely than
others
• This is captured in the joint probability distribution
• Written as Pr(x,y)
• Can read Pr(x,y) as “the probability of x and y”
– where x and y are scalars
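• The joint probabilities again sum (or integrate) to one over all combinations of x and y: Σ_x Σ_y Pr(x,y) = 1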
Joint Probability (2)
• We can have more than two random variables, in which case we would write Pr(x,y,z)
• We may also write Pr(x) – the joint probability of all of the elements of the multidimensional variable x = [x1, x2, …, xn]; if y = [y1, y2, …, yn], then we can write Pr(x,y) to represent the joint distribution of all of the elements of the multidimensional variables x and y.
Joint Probability (3)
[Figures: a 2D Hinton diagram of a discrete joint distribution, and joint distributions of one discrete and one continuous variable]
Marginalization
We can recover the probability distribution of any variable in a joint distribution
by integrating (or summing) over the other variables
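In symbols, for continuous and discrete variables respectively:
Pr(x) = ∫ Pr(x,y) dy
Pr(x) = Σ_y Pr(x,y)
and Pr(y) is recovered in the same way by integrating or summing over x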
Works in higher dimensions as well – leaves the joint distribution of whatever variables are left
Conditional Probability
• The conditional probability of x given that y = y1 is the relative propensity of variable x to take different outcomes given that y is fixed to be equal to y1.
• Written as Pr(x | y=y1)
Conditional Probability
• Conditional probability can be extracted from joint probability
• Extract appropriate slice and normalize
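In symbols, slicing the joint at y = y1 and normalizing so the slice sums (or integrates) to one:
Pr(x | y = y1) = Pr(x, y = y1) / Pr(y = y1)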
Conditional Probability
• More usually written in compact form: Pr(x | y) = Pr(x,y) / Pr(y)
• Can be re-arranged to give: Pr(x,y) = Pr(x | y) Pr(y), and by symmetry Pr(x,y) = Pr(y | x) Pr(x)
Conditional Probability
• This idea can be extended to more than two variables, e.g.
Pr(w,x,y,z) = Pr(w | x,y,z) Pr(x | y,z) Pr(y | z) Pr(z)
Bayes’ Rule
From before: Pr(x,y) = Pr(x | y) Pr(y) and Pr(x,y) = Pr(y | x) Pr(x)
Combining: Pr(y | x) Pr(x) = Pr(x | y) Pr(y)
Re-arranging: Pr(y | x) = Pr(x | y) Pr(y) / Pr(x)   (Bayes' rule)
Bayes’ Rule Terminology
Pr(y | x) = Pr(x | y) Pr(y) / Pr(x)
• Likelihood Pr(x | y) – propensity for observing a certain value of x given a certain value of y
• Posterior Pr(y | x) – what we know about y after seeing x
• Prior Pr(y) – what we know about y before seeing x
• Evidence Pr(x) – a constant that ensures the left-hand side is a valid distribution
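A minimal NumPy sketch of these operations on a small made-up joint table (the numbers are purely illustrative):

import numpy as np

# Made-up joint table Pr(x,y): rows index x, columns index y; entries sum to 1
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])

# Marginalization: sum out the other variable
P_x = P_xy.sum(axis=1)                      # Pr(x)
P_y = P_xy.sum(axis=0)                      # Pr(y)

# Conditioning: slice the joint and normalize
P_x_given_y0 = P_xy[:, 0] / P_y[0]          # Pr(x | y = y0)

# Bayes' rule: Pr(y | x) = Pr(x | y) Pr(y) / Pr(x)
P_x_given_y = P_xy / P_y                            # column j holds Pr(x | y = j)
P_y_given_x = (P_x_given_y * P_y) / P_x[:, None]    # row i holds Pr(y | x = i)
assert np.allclose(P_y_given_x.sum(axis=1), 1.0)    # each conditional is a valid distribution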
Independence
• If two variables x and y are independent then variable x tells
us nothing about variable y (and vice-versa)
Independence
• When variables are independent, the joint factorizes into a product of the marginals: Pr(x,y) = Pr(x) Pr(y)
Probabilistic graphical models
What is a graphical model?
A graphical model is a way of representing probabilistic
relationships between random variables.
Variables are represented by nodes.
Conditional (in)dependencies are represented by (missing) edges.
Undirected edges simply give correlations between variables (Markov Random Field or Undirected Graphical Model).
Directed edges give causal relationships (Bayesian Network or Directed Graphical Model).
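For example, in a hypothetical three-node chain a → b → c, the directed graph encodes the factorization Pr(a,b,c) = Pr(a) Pr(b | a) Pr(c | b); the missing edge between a and c says that a and c are conditionally independent given b.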
Graphical Models
• e.g. Bayesian networks, Bayes nets, Belief
nets, Markov networks, HMMs, Dynamic
Bayes nets, etc.
• Themes:
– representation
– reasoning
– learning
• Standard PGM reference book: Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman.
Motivation for PGMs
• Let X1,…,Xp be discrete random variables
• Let P be a joint distribution over X1,…,Xp
If the variables are binary, then we need O(2^p) parameters to describe P
Can we do better?
• Key idea: use properties of independence
Motivation for PGMs (2)
• Two variables X and Y are independent if
– P(X = x|Y = y) = P(X = x) for all values x, y
– That is, learning the values of Y does not change prediction
of X
• If X and Y are independent then
– P(X,Y) = P(X|Y)P(Y) = P(X)P(Y)
• In general, if X1,…,Xp are independent, then
– P(X1,…,Xp)= P(X1)...P(Xp)
– Requires only O(p) parameters
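For example, with p = 10 binary variables the full joint table needs 2^10 - 1 = 1023 free parameters, while under full independence only 10 are needed.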
Motivation for PGMs (3)
• Unfortunately, most random variables of interest are not independent of each other
• A more suitable notion is that of conditional
independence
• Two variables X and Y are conditionally
independent given Z if
– P(X = x|Y = y,Z=z) = P(X = x|Z=z) for all values x,y,z
– That is, learning the values of Y does not change prediction of X once we know
the value of Z
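• For example, in a hypothetical chain X → Z → Y, X and Y are conditionally independent given Z, and the joint factorizes as P(X,Y,Z) = P(X) P(Z|X) P(Y|Z)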
In summary….
Probability theory provides
the glue whereby the parts are combined,
ensuring that the system as a whole is consistent, and providing
ways to interface models to data.
The graph theoretic side of graphical models provides both an
intuitively appealing interface by which humans can model highly interacting
sets of variables and a data structure that lends itself naturally
to the design of efficient general-purpose algorithms.
Many of the classical multivariate probabilistic systems studied
in fields such as
statistics, systems engineering, information theory, pattern
recognition and statistical mechanics are special cases of the general
graphical model formalism -- examples include mixture models, factor
analysis, hidden Markov models, and Kalman filters.
Slide Credits
• Simon Prince – Computer Vision: Models, Learning, and Inference
Next class
• More on Graphical models
• Readings for next lecture:
– Lecture notes to be posted
Questions