
1. Elements of Probability
1.1. Sample Space and Events
Consider an experiment whose outcome is not known in advance.
• Sample space S: the set of all possible outcomes
Flipping a coin: S = {H, T}
Rolling a die: S = {1, 2, 3, 4, 5, 6}
Running a race among 7 horses numbered 1 through 7:
S = { all orderings of (1, 2, 3, 4, 5, 6, 7) }
• Event: any subset A of the sample space is known as an event
Event of getting a head: A = {H}
Event of getting an even number when rolling a die: A = {2, 4, 6}
Event that the number 5 horse comes in first: A = { all outcomes in S starting with 5 }
• For any two events A and B, we define the new event A∪B, the union of A and B, to consist of all outcomes that are in A or in B or in both A and B.
• We can also define the intersection of A and B, written AB, to consist of all outcomes that are in both A and B.
• For any event A we define the event A^c, referred to as the complement of A, to consist of all outcomes in the sample space S that are not in A.
• Note that S^c does not contain any outcomes and thus cannot occur. We call S^c the null set and designate it by φ.
• If AB = φ, we say that A and B are mutually exclusive.
1.2. Axioms of Probability
Axiom 1: 0 ≤ P(A) ≤ 1
Axiom 2: P(S) = 1
Axiom 3: For any sequence of mutually exclusive events A1, A2, . . .,

P(∪_{i=1}^n Ai) = Σ_{i=1}^n P(Ai),  n = 1, 2, . . . , ∞
1.3. Usual definition
Suppose that an experiment, whose sample space is S, is repeatedly performed under exactly the same conditions.
For each event A of the sample space S, we define n(A) to be the number of times in the first n repetitions of the experiment that the event A occurs. Then the probability of the event A is defined by

P(A) = lim_{n→∞} n(A)/n
For some experiments it is natural to assume that all outcomes in the sample space are equally likely to occur; i.e., consider an experiment whose sample space S is a finite set, say S = {1, 2, . . . , N}.
Then it is often natural to assume that
P({1}) = P({2}) = · · · = P({N})
which implies from Axioms 2 and 3 that
P({i}) = 1/N,  i = 1, 2, . . . , N.
From this, it follows from Axiom 3 that for any event E
P(E) = (number of points in E) / (number of points in S)
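To connect the two viewpoints, here is a minimal Python sketch (the event and the number of repetitions are arbitrary illustrative choices) that estimates P(A) by the relative frequency n(A)/n for the event A = {2, 4, 6} when rolling a fair die, and compares it with the counting formula above.

```python
import random

# Event A = "even number" when rolling a fair die; sample space S = {1,...,6}
S = [1, 2, 3, 4, 5, 6]
A = {2, 4, 6}

# Counting formula for equally likely outcomes: P(E) = |E| / |S|
p_counting = len(A) / len(S)

# Relative frequency definition: n(A)/n for a large number n of repetitions
n = 100_000
n_A = sum(1 for _ in range(n) if random.choice(S) in A)
p_frequency = n_A / n

print(f"counting formula  : {p_counting:.4f}")   # 0.5000
print(f"relative frequency: {p_frequency:.4f}")  # close to 0.5 for large n
```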
1.4. Some simple propositions
• P (Ac) = 1 − P (A)
• If E ⊂ F , then P (E) ≤ P (F )
• P (E ∪ F ) = P (E) + P (F ) − P (EF )
• Inclusion-exclusion:
P(E1 ∪ E2 ∪ · · · ∪ En) = Σ_{i=1}^n P(Ei) − Σ_{i1<i2} P(Ei1 Ei2) + · · ·
  + (−1)^{r+1} Σ_{i1<i2<···<ir} P(Ei1 Ei2 · · · Eir) + · · · + (−1)^{n+1} P(E1 E2 · · · En)
where Σ_{i1<i2<···<ir} P(Ei1 Ei2 · · · Eir) is taken over all of the C(n, r) possible subsets of size r of the set {1, 2, . . . , n}.
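A short Python check of the inclusion-exclusion proposition for three events (the particular events on a die roll are chosen only for illustration):

```python
from fractions import Fraction
from itertools import combinations

# Equally likely die outcomes
S = {1, 2, 3, 4, 5, 6}
def P(E):
    return Fraction(len(E), len(S))

# Three arbitrary events
E1, E2, E3 = {1, 2, 3}, {2, 4, 6}, {3, 4, 5, 6}

# Direct probability of the union
lhs = P(E1 | E2 | E3)

# Inclusion-exclusion: singles - pairs + triple
events = [E1, E2, E3]
rhs = (sum(P(E) for E in events)
       - sum(P(A & B) for A, B in combinations(events, 2))
       + P(E1 & E2 & E3))

print(lhs, rhs, lhs == rhs)  # both equal 1 (the union happens to be all of S here)
```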
1.5. Conditional Probability and Independence
Consider flipping a fair coin twice. Suppose we are interested in the probability that two heads are obtained given that a head lands on the first flip.
We call this the conditional probability that A occurs given that B has occurred and denote it by P(A|B). Here
P(A|B) = (number of outcomes in both A and B) / (number of outcomes in B) = 1/2
Now the conditional probability can be computed as
Pr{A|B} = Pr{AB} / Pr{B}
Example: An urn contains 10 white, 5 yellow, and 10 black marbles. A marble is chosen at random from the urn, and it is noted
that it is not one of the black marbles. What is the probability
that it is yellow?
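A minimal Python sketch of the urn example, using the counting form of the conditional probability formula (urn composition as stated above):

```python
from fractions import Fraction

white, yellow, black = 10, 5, 10
total = white + yellow + black

# P(yellow | not black) = P(yellow and not black) / P(not black)
p_yellow_and_not_black = Fraction(yellow, total)      # a yellow marble is never black
p_not_black = Fraction(white + yellow, total)
p_yellow_given_not_black = p_yellow_and_not_black / p_not_black

print(p_yellow_given_not_black)  # 1/3
```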
Note that P(·|F) is itself a probability; it satisfies the three axioms.
Total Probability Theorem
If S = ∪_{i=1}^n Bi and Bi ∩ Bj = φ, i ≠ j, then
Pr{A} = Σ_{i=1}^n Pr{A|Bi} Pr{Bi}
Example: The credit department of a bank studied the profile
of their tax loan clients. There are three types of clients: 90% of
them are excellent, 9% of them are good and the remaining are
so-so. The default probability of the client is 0.001% if it is an
excellent customer; 0.01% if it is good and 1% if it is so-so. What
is the overall default probability of the clients of this bank?
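A quick Python computation of the overall default probability by the total probability theorem (numbers taken directly from the example):

```python
# Prior proportions of client types and their default probabilities
priors = {"excellent": 0.90, "good": 0.09, "so-so": 0.01}
p_default_given = {"excellent": 0.00001,  # 0.001%
                   "good":      0.0001,   # 0.01%
                   "so-so":     0.01}     # 1%

# Total probability: Pr{default} = sum_i Pr{default | type i} * Pr{type i}
p_default = sum(p_default_given[t] * priors[t] for t in priors)
print(p_default)  # 0.000118, i.e. about 0.0118%
```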
Bayes' Theorem:
If S = ∪_{i=1}^n Bi and Bi ∩ Bj = φ, i ≠ j, then
Pr{Bi|A} = Pr{A|Bi} Pr{Bi} / Σ_{j=1}^n Pr{A|Bj} Pr{Bj},  i = 1, 2, . . . , n
Given that a client has defaulted on the loan, what is the probability that it is an excellent customer? A good customer? A so-so customer?
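Continuing the bank example, a short Python sketch of the posterior probabilities via Bayes' theorem (the priors and default probabilities are repeated from the example above):

```python
priors = {"excellent": 0.90, "good": 0.09, "so-so": 0.01}
p_default_given = {"excellent": 0.00001, "good": 0.0001, "so-so": 0.01}

# Bayes' theorem: Pr{type i | default} = Pr{default | type i} Pr{type i} / Pr{default}
p_default = sum(p_default_given[t] * priors[t] for t in priors)
posterior = {t: p_default_given[t] * priors[t] / p_default for t in priors}

for t, p in posterior.items():
    print(f"Pr({t} | default) = {p:.4f}")
# excellent ~0.0763, good ~0.0763, so-so ~0.8475
```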
Independence
Events A and B are independent if
Pr{A|B} = Pr{A}
or equivalently,
Pr{AB} = Pr{A}Pr{B}
Example: A card is selected at random from an ordinary deck of 52 playing cards. Let E be the event that the selected card is an ace and F the event that it is a spade. Are E and F independent?
If E and F are independent, then so are E and F^c.
The three events E, F and G are said to be independent if
P(EFG) = P(E)P(F)P(G)
P(EF) = P(E)P(F)
P(EG) = P(E)P(G)
P(FG) = P(F)P(G)
The events E1 and E2 are conditionally independent given F
if
P (E1E2|F ) = P (E1|F )P (E2|F )
Example: An individual tried by a 3-judge panel is declared guilty if at least 2 judges cast votes of guilty. Suppose that when the defendant is, in fact, guilty, each judge will independently vote guilty with probability .7, whereas when the defendant is, in fact, innocent, this probability drops to .2. If 70 percent of defendants are guilty, compute the conditional probability that judge number 3 votes guilty given that:
(a) judges 1 and 2 vote guilty;
(b) judges 1 and 2 cast 1 guilty and 1 innocent vote;
(c) judges 1 and 2 both cast innocent votes.
Let Ei, i = 1, 2, 3 denote the event that judge i casts a guilty
vote. Are these events independent? Are they conditionally
independent?
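One straightforward way to organize the calculation for (a)-(c) is to condition on whether the defendant is guilty or innocent; a Python sketch of that computation (the events Ei are as defined above):

```python
p_guilty = 0.7                 # prior probability that the defendant is guilty
p_vote = {"G": 0.7, "I": 0.2}  # P(a judge votes guilty | true state of the defendant)
p_state = {"G": p_guilty, "I": 1 - p_guilty}

def p_12_votes(k, state):
    """P(exactly k of judges 1 and 2 vote guilty | state), k = 0, 1, 2."""
    p = p_vote[state]
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[k]

for k, label in [(2, "(a)"), (1, "(b)"), (0, "(c)")]:
    joint = {s: p_state[s] * p_12_votes(k, s) for s in "GI"}   # P(state and observed votes)
    numer = sum(joint[s] * p_vote[s] for s in "GI")            # ... and judge 3 votes guilty
    denom = sum(joint.values())                                # P(observed votes)
    print(label, round(numer / denom, 4))
# (a) 0.6831  (b) 0.5769  (c) 0.3235
```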
1.6. Random Variable
A random variable is a numerical description of the outcome of an experiment.
Flipping a coin: let X = 1 if H is faced up, 0 otherwise
Number of heads faced up in 10 flips of a coin
Rolling a die and let X be the number faced up
Pick a student in the class and let X be the height of the student
In a queue, the number of customers arriving in one hour
The waiting time of a customer in the queue
A random variable can be discrete, continuous, or a mixture of the two:
A discrete random variable can take only a countable number of values
A continuous random variable can take an uncountable number of values
To describe the behavior of a random variable we can use
(a) Probability Distribution
F (x) = Pr(X ≤ x)
Properties:
i. F is a nondecreasing function; that is, if a < b, then
F (a) ≤ F (b)
ii. limx→−∞ F (x) = 0
iii. limx→∞ F (x) = 1
iv. F is right continuous. That is, for any b and any decreasing sequence bn, n ≥ 1, that converges to b, lim_{n→∞} F(bn) = F(b).
(b) Probability mass function or Probability density function
If X is discrete, then the probability mass function f (x) is
f (x) = Pr(X = x)
Suppose X takes values x1, x2, . . . , xn, . . .; then
Σ_{i=1}^∞ f(xi) = 1
and
F(a) = Σ_{i: xi ≤ a} f(xi)
If X is continuous, then the probability density function is the function f such that
Pr(X ∈ C) = ∫_C f(x) dx
Because of Axiom 2,
∫_{−∞}^{∞} f(x) dx = 1
The relationship between the cumulative distribution F (·)
and the probability density function f (·) is expressed by
F(a) = Pr{X ∈ (−∞, a]} = ∫_{−∞}^{a} f(x) dx
Differentiating both sides yields
(d/da) F(a) = f(a)
An intuitive interpretation: for small ε,
Pr{a − ε/2 ≤ X ≤ a + ε/2} ≈ ε f(a)
Examples: Let X be the number of heads faced up in 5 flips of a coin; then
Pr(X = i) = C(5, i) p^i (1 − p)^{5−i},  i = 0, 1, . . . , 5,
where p is the probability of a head on each flip.
Let f(x) = e^{−x}, x > 0; 0 otherwise. Then f(x) is a density function.
The distribution of a function of a random variable
Suppose a random variable X has density function fX(x) and cdf FX(x). Now let Y = w(X), where w(·) is continuous and either increasing or decreasing for a < x < b. Suppose also that a < x < b if and only if α < y < β, and let X = w^{−1}(Y) be the inverse function for α < y < β. Then the cdf of Y is
FY(y) = FX(w^{−1}(y)),  α < y < β   (when w is increasing; for decreasing w, FY(y) = 1 − FX(w^{−1}(y)))
and the density function of Y is
fY(y) = fX(w^{−1}(y)) |dx/dy|,  α < y < β
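As a small illustration (the specific transformation is my choice, used only as an example), take X ~ U(0, 1) and Y = w(X) = −ln X, which is decreasing on (0, 1); the formula gives fY(y) = fX(e^{−y}) e^{−y} = e^{−y} for y > 0, the exponential density mentioned above. The Python sketch below checks this by simulation:

```python
import math
import random

random.seed(0)
n = 200_000
# Y = -ln(X) with X ~ U(0,1); use 1 - random() to avoid taking log of exactly 0
y_samples = [-math.log(1.0 - random.random()) for _ in range(n)]

# Compare the empirical frequency of Y in a few small intervals with
# the density predicted by the change-of-variables formula, f_Y(y) = e^{-y}.
width = 0.1
for y0 in [0.0, 0.5, 1.0, 2.0]:
    freq = sum(y0 <= y < y0 + width for y in y_samples) / (n * width)
    pred = math.exp(-(y0 + width / 2))   # density at the interval midpoint
    print(f"y~{y0 + width/2:.2f}: empirical {freq:.3f}  vs  e^(-y) {pred:.3f}")
```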
Sometimes we are interested not only in an individual random variable but in two or more variables jointly.
To specify the relationship between two random variables, we
define the joint cumulative distribution function of X and Y
by
F (x, y) = Pr{X ≤ x, Y ≤ y}
The distribution of X can be obtained from the joint distribution
FX (x) = Pr{X ≤ x}
= Pr{X ≤ x, Y < ∞}
= Pr{lim_{y→∞} {X ≤ x, Y ≤ y}}
= lim_{y→∞} Pr{X ≤ x, Y ≤ y}
= F (x, ∞)
Similarly we can obtain the distribution of Y
All joint probability statements about X and Y can be answered in terms of F(x, y).
Example: Pr{X > a, Y > b}
If X and Y are both discrete, then we can define the joint probability mass function by
f(x, y) = Pr{X = x, Y = y}
Now suppose X takes values x1, x2, . . . , xn and Y takes values y1, y2, . . . , ym. The joint probability mass function can then be easily expressed in tabular form.
Example: Consider an experiment: flip a fair coin and toss a die independently; let X = 1 if a head comes up in the coin flip and 0 otherwise, and let Y be the number faced up in the toss of the die. Then the joint probability mass function of X and Y is given by
             Y = 1     2      3      4      5      6   | Pr{X = i}
  X = 0       1/12   1/12   1/12   1/12   1/12   1/12  |   1/2
  X = 1       1/12   1/12   1/12   1/12   1/12   1/12  |   1/2
  -----------------------------------------------------------------
  Pr{Y = j}    1/6    1/6    1/6    1/6    1/6    1/6   |    1

(each entry is 1/2 × 1/6 = 1/12)
The probability mass function of X can be obtained by
Pr(X = xi) = f(xi) = Σ_{j=1}^m f(xi, yj)   (why?)
Similarly, if X and Y are both continuous, then the joint probability density function f(x, y) is the one such that
Pr{X ∈ C, Y ∈ D} = ∫∫_{x∈C, y∈D} f(x, y) dx dy
Since
F(a, b) = Pr{X ∈ (−∞, a], Y ∈ (−∞, b]} = ∫_{−∞}^{a} ∫_{−∞}^{b} f(x, y) dy dx
it follows that
f(a, b) = ∂²F(a, b)/∂a ∂b
whenever the derivative exists.
If X and Y are jointly continuous, they are individually continuous and the probability density function of X is
fX(x) = ∫_{−∞}^{∞} f(x, y) dy
and the density function of Y is
fY(y) = ∫_{−∞}^{∞} f(x, y) dx
Conditional Distribution: Discrete Case:
Recall the definition of the conditional probability of E given F:
Pr(E|F) = Pr(EF)/Pr(F)
If X and Y are discrete random variables, define the conditional probability mass function of X given Y = y by
f(x|y) = Pr(X = x|Y = y) = Pr(X = x, Y = y)/Pr(Y = y) = f(x, y)/f(y)
for all values of y such that f(y) > 0.
Similarly, define the conditional distribution function of X given Y = y by
FX|Y(x|y) = Pr(X ≤ x|Y = y) = Σ_{a ≤ x} f(a|y)
where f(y) > 0.
Example:
Suppose that f (x, y), the joint probability mass function of
X and Y , is given by
f (0, 0) = 0.45 f (0, 1) = 0.05 f (1, 0) = 0.05 f (1, 1) = 0.45
Find the marginal distribution of X and the conditional distribution of X given Y = 0, 1.
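A small Python sketch working through this example with the four joint probabilities given above (one way to organize the computation):

```python
from fractions import Fraction

# Joint pmf f(x, y) from the example
f = {(0, 0): Fraction(45, 100), (0, 1): Fraction(5, 100),
     (1, 0): Fraction(5, 100),  (1, 1): Fraction(45, 100)}

# Marginal pmf of X: f_X(x) = sum_y f(x, y)
f_X = {x: sum(p for (xi, _), p in f.items() if xi == x) for x in (0, 1)}
# Marginal pmf of Y (needed as the denominator of the conditional pmf)
f_Y = {y: sum(p for (_, yi), p in f.items() if yi == y) for y in (0, 1)}

# Conditional pmf of X given Y = y: f(x|y) = f(x, y) / f_Y(y)
f_X_given_Y = {(x, y): f[(x, y)] / f_Y[y] for (x, y) in f}

print("marginal of X:", f_X)                                    # {0: 1/2, 1: 1/2}
print("f(x | Y=0):", {x: f_X_given_Y[(x, 0)] for x in (0, 1)})  # {0: 9/10, 1: 1/10}
print("f(x | Y=1):", {x: f_X_given_Y[(x, 1)] for x in (0, 1)})  # {0: 1/10, 1: 9/10}
```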
Continuous Case:
If X and Y have the joint probability density function f (x, y),
define the conditional density function of X given Y = y by
f(x|y) = f(x, y)/f(y)
It is consistent with the discrete case.
f(x|y) dx = f(x, y) dx dy / (f(y) dy)
          = Pr(x ≤ X < x + dx, y ≤ Y < y + dy) / Pr(y ≤ Y < y + dy)
          = Pr(x ≤ X < x + dx | y ≤ Y < y + dy)
The conditional cumulative distribution function of X given Y = y is
FX|Y(a|y) = Pr(X ≤ a|Y = y) = ∫_{−∞}^{a} f(x|y) dx
Two random variables are independent if
f(x, y) = f(x) f(y)
Examples: Suppose X, Y have a joint density function defined as
f (x, y) = 1, 0 < x < 1; 0 < y < 1.
We can compute the marginal density function of X by integrating Y out:
fX(x) = ∫_0^1 f(x, y) dy = ∫_0^1 1 dy = y |_{y=0}^{1} = 1,  0 < x < 1
Similarly, the marginal density of Y is also
fY (y) = 1, 0 < y < 1
Since f(x, y) = fX(x) fY(y), X and Y are independent.
Now suppose we want to find Pr(X > Y ).
Note that X > Y represents {(x, y) : 0 < y < x < 1}.
Therefore,
Pr(X > Y) = ∫∫_{0<y<x<1} f(x, y) dx dy
          = ∫_0^1 ∫_0^x 1 dy dx
          = ∫_0^1 y |_{y=0}^{x} dx
          = ∫_0^1 x dx
          = x²/2 |_0^1
          = 1/2
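A quick Monte Carlo check of this result (the sample size is arbitrary):

```python
import random

random.seed(1)
n = 100_000
# X and Y independent U(0, 1); estimate Pr(X > Y) by relative frequency
count = sum(random.random() > random.random() for _ in range(n))
print(count / n)  # close to 1/2
```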
Let X and Y have the joint pdf
fX,Y(x, y) = 2e^{−(x+y)},  0 < x < y < ∞
Then the marginal density of X is
fX(x) = ∫_x^∞ fX,Y(x, y) dy
      = ∫_x^∞ 2e^{−(x+y)} dy
      = 2e^{−x} ∫_x^∞ e^{−y} dy
      = 2e^{−x} (−e^{−y}) |_x^∞
      = 2e^{−x} e^{−x}
      = 2e^{−2x},  0 < x < ∞
The marginal density of Y is
fY(y) = ∫_0^y fX,Y(x, y) dx
      = ∫_0^y 2e^{−(x+y)} dx
      = 2e^{−y} ∫_0^y e^{−x} dx
      = 2e^{−y} (−e^{−x}) |_0^y
      = 2e^{−y} (1 − e^{−y}),  0 < y < ∞
The conditional density of X given Y = y is
fX|Y(x|y) = fX,Y(x, y) / fY(y)
          = 2e^{−(x+y)} / (2e^{−y}(1 − e^{−y}))
          = e^{−x} / (1 − e^{−y}),  0 < x < y
Similarly, the conditional density of Y given X = x is
fY|X(y|x) = e^{−y} / e^{−x} = e^{−(y−x)},  x < y < ∞
We can also compute the conditional mean of X given Y = y:
E(X|Y = y) = ∫_0^y x fX|Y(x|y) dx
           = ∫_0^y x e^{−x} / (1 − e^{−y}) dx
           = (1/(1 − e^{−y})) ∫_0^y x e^{−x} dx
Integration by parts, letting u = x and dv = e^{−x} dx, so du = dx and v = −e^{−x}, gives
∫_0^y x e^{−x} dx = [−x e^{−x}]_0^y + ∫_0^y e^{−x} dx
                  = −y e^{−y} + [−e^{−x}]_0^y
                  = 1 − e^{−y} − y e^{−y}
Therefore,
E(X|Y = y) = (1 − e^{−y} − y e^{−y}) / (1 − e^{−y})
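A short numerical check of these formulas (the value y = 2 and the grid size are arbitrary choices): it integrates the conditional density over (0, y) and compares a numerically computed conditional mean with the closed form above.

```python
import math

y = 2.0                      # an arbitrary conditioning value
n = 100_000                  # number of grid points for a simple Riemann sum
dx = y / n

def f_cond(x, y):
    """Conditional density f_{X|Y}(x|y) = e^{-x} / (1 - e^{-y}), 0 < x < y."""
    return math.exp(-x) / (1.0 - math.exp(-y))

xs = [(i + 0.5) * dx for i in range(n)]          # midpoints of the grid
total_mass = sum(f_cond(x, y) for x in xs) * dx  # should be ~1
mean_num = sum(x * f_cond(x, y) for x in xs) * dx

mean_closed = (1 - math.exp(-y) - y * math.exp(-y)) / (1 - math.exp(-y))
print(total_mass)             # ~1.0
print(mean_num, mean_closed)  # both ~0.6869
```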
(c) Expected Values and Variance
• If X is discrete, taking values x1, x2, . . ., then the expectation (or expected value, or mean) of X is defined by
E(X) = Σ_i xi Pr{X = xi}
Examples:
If the probability mass function of X is given by
f(0) = 1/2 = f(1)
then
E(X) = 0 · (1/2) + 1 · (1/2) = 1/2
If I is the indicator variable for the event A, that is, if
I = 1 if A occurs, and I = 0 otherwise,
then
E(I) = 1 · Pr(A) + 0 · Pr(A^c) = Pr(A)
Therefore, the expectation of the indicator variable for the event A is just the probability that A occurs.
• If X is continuous with the pdf f (x), then the expected
value of X is
E(X) = ∫_{−∞}^{∞} x f(x) dx
Example: If X has the pdf
f(x) = 3x², 0 < x < 1; 0 otherwise,
then the expected value of X is
E(X) = ∫_0^1 x · 3x² dx = 3/4
• Sometimes we are interested not in the expectation of X itself but in the expectation of a function g(X); for this we need the following results.
If X is discrete with pmf f(xi), then
E(g(X)) = Σ_i g(xi) f(xi)
and if X is continuous with pdf f(x), then
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx
If a and b are constants, then
E(aX + b) = aE(X) + b
If X1 and X2 are two random variables, then
E(X1 + X2) = E(X1) + E(X2)
• Variance
To measure the variation of values in the distribution, we use the variance.
If X is a random variable with mean µ, then the variance of X is defined by
Var(X) = E[(X − µ)²]
Alternative formula:
Var(X) = E(X²) − µ²
For any constants a and b,
Var(aX + b) = a² Var(X)
• If we have two random variables X1 and X2, then to measure their dependence structure we can use the covariance
Cov(X1, X2) = E[(X1 − µ1)(X2 − µ2)]
where µi = E(Xi), i = 1, 2.
Alternative formula:
Cov(X1, X2) = E(X1X2) − µ1µ2
Var(X1 + X2) = Var(X1) + Var(X2) + 2Cov(X1, X2)
If X1 and X2 are independent, then
Cov(X1, X2) = 0
Correlation Coefficient: a measure of the linear relationship between two random variables,
ρ = Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
with −1 ≤ ρ ≤ 1.
(d) Some Inequalities and Laws of Large Numbers
Markov’s Inequality
If X takes on only nonnegative values, then for any value
a>0
Pr(X ≥ a) ≤ E[X]/a
Corollary: Chebyshev's Inequality: If X is a random variable having mean µ and variance σ², then for any value k > 0,
Pr(|X − µ| ≥ kσ) ≤ 1/k²
Corollary: One-sided Chebyshev's Inequality:
If X is a random variable having mean 0 and variance σ², then for any value a > 0,
Pr(X > a) ≤ σ²/(σ² + a²)
The Weak Law of Large Numbers
Let X1, X2, . . . be a sequence of independent and identically distributed random variables having mean µ. Then for any ε > 0,
Pr{ |(X1 + · · · + Xn)/n − µ| > ε } → 0 as n → ∞
A generalization: Strong Law of Large Numbers:
With probability 1,
lim_{n→∞} (X1 + · · · + Xn)/n = µ
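A minimal simulation sketch of the law of large numbers for die rolls (µ = 3.5); the particular sample sizes are arbitrary:

```python
import random

random.seed(2)
mu = 3.5  # mean of a fair die roll

for n in [10, 100, 10_000, 1_000_000]:
    sample_mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(f"n = {n:>8}: sample mean = {sample_mean:.4f} (mu = {mu})")
# the sample mean settles down to 3.5 as n grows
```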
(e) Some Discrete Random Variables
• Bernoulli Random Variable
Only two possible outcomes : Success or Failure
p = Pr{success}
The probability mass function of X is
Pr{X = x} = p^x (1 − p)^{1−x},  x = 0, 1
• Binomial Random Variable
If we perform the Bernoulli trial n times independently and count the number of successes, then we have a binomial random variable.
The probability mass function of a binomial variable is
Pr{X = x} = C(n, x) p^x (1 − p)^{n−x},  x = 0, 1, . . . , n
• Poisson Random Variable
If n is very large and p is small with np → λ, a constant, in the binomial setup, we obtain the Poisson random variable (a numerical comparison appears in the sketch at the end of this list).
Example: the number of customers visiting a shop
The probability mass function is
Pr{X = x} = e^{−λ} λ^x / x!,  x = 0, 1, 2, . . .
E(X) = λ and Var(X) = λ
• Geometric Random Variable
If X is the number of the trial on which the first success occurs in a sequence of independent Bernoulli trials, then X is a geometric random variable.
The p.m.f. is
Pr{X = n} = p(1 − p)^{n−1},  n ≥ 1
E(X) = 1/p and Var(X) = (1 − p)/p²
• Negative Binomial Random Variable
If X is the number of the trial on which the rth success occurs in a sequence of independent Bernoulli trials, then X is a negative binomial random variable.
The p.m.f. is
Pr{X = n} = C(n − 1, r − 1) p^r (1 − p)^{n−r},  n ≥ r
Note that the negative binomial random variable can be thought of as the sum of r independent geometric random variables; therefore,
E(X) = r/p and Var(X) = r(1 − p)/p²
• Hypergeometric Random Variable
Consider an urn containing N + M balls
N of them are light coloured and M are dark coloured
a sample of size n is randomly chosen
X is the number of light colored balls selected
Then X is a hypergeometric random variable with p.m.f.
Pr{X = i} = C(N, i) C(M, n − i) / C(N + M, n)
E(X) = nN/(N + M) and Var(X) = [nNM/(N + M)²] [1 − (n − 1)/(N + M − 1)]
• Discrete Uniform Random Variable
Let X be a number picked at random from {1, 2, . . . , m}; then X is called a discrete uniform random variable over (1, m) with p.m.f.
Pr{X = i} = 1/m,  i = 1, 2, . . . , m.
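As referenced in the Poisson bullet above, a minimal Python sketch comparing the binomial pmf with the Poisson pmf when n is large and np = λ (the specific n and p are arbitrary illustrative choices):

```python
import math

n, p = 1000, 0.003          # large n, small p
lam = n * p                 # lambda = np = 3

def binom_pmf(x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x):
    return math.exp(-lam) * lam ** x / math.factorial(x)

for x in range(6):
    print(x, round(binom_pmf(x), 4), round(poisson_pmf(x), 4))
# the two columns agree closely, illustrating the Poisson limit of the binomial
```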


(f) Some Continuous Random Variables
• Uniform Random Variable
X ∼ U(a, b) if its density function is given by
f(x) = 1/(b − a), a < x < b; 0 otherwise
The distribution function of X is given, for a < x < b, by
F(x) = Pr{X ≤ x} = ∫_a^x (b − a)^{−1} dt = (x − a)/(b − a)
E(X) = (a + b)/2 and Var(X) = (b − a)²/12
• Exponential Random Variable
A random variable X is exponentially distributed if its density function is
f(x) = λe^{−λx},  x > 0
Its cumulative distribution function is
F(x) = ∫_0^x λe^{−λy} dy = 1 − e^{−λx},  x > 0
Memoryless Property
Pr{X > s + t|X > t} = Pr{X > s}
It is often used for modeling the inter-arrival times of customers in queueing theory.
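A quick simulation sketch of the memoryless property (λ, s and t are arbitrary choices): among exponential samples that exceed t, the fraction exceeding s + t should match Pr{X > s}.

```python
import math
import random

random.seed(3)
lam, s, t = 1.5, 0.7, 1.2
n = 500_000

# Exponential(lam) samples via the inverse transform X = -ln(U)/lam
xs = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

survivors = [x for x in xs if x > t]
lhs = sum(x > s + t for x in survivors) / len(survivors)   # Pr{X > s+t | X > t}
rhs = math.exp(-lam * s)                                   # Pr{X > s} = e^{-lam*s}
print(lhs, rhs)  # both close to e^{-1.05} ~ 0.35
```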
• Normal Random Variable
Density function:
f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)},  −∞ < x < ∞
The cumulative distribution function is F(x) = ∫_{−∞}^{x} f(t) dt, which has no closed form and is evaluated numerically (from tables or software).
• Log-Normal Random Variable
A random variable X is said to be log-normally distributed if Y = log(X) is a normally distributed random variable.
The density function of X is given by
f(x) = (1/(x√(2πσ²))) exp(−(ln x − µ)²/(2σ²)),  x > 0;  0 otherwise
It is used for modeling stock prices and is very useful in finance.
• Weibull Random Variable
A random variable X is Weibull distributed if its density function is of the form
f(x) = (β/α) ((x − L)/α)^{β−1} exp[−((x − L)/α)^β],  x > L;  0 otherwise
Useful in life and fatigue tests and for equipment lifetimes.
L is the so-called guarantee parameter.
If L = 0 and β = 1, it reduces to the exponential distribution.
• Gamma Random Variable
The density function is
f(x) = (1/λ) (x/λ)^{k−1} e^{−x/λ} / (k − 1)!,  x > 0
For integer k, it can be shown that this is the distribution of the sum of k independent exponential random variables, each with mean λ.
Very useful in queueing, insurance risk theory and inventory control.
• Beta Random Variable
The density function is
f(x) = [(α + β − 1)! / ((α − 1)!(β − 1)! s)] (x/s)^{α−1} (1 − x/s)^{β−1},  0 < x < s
Very useful to model the variation over a fixed interval from 0 to a positive constant s.
• Mixture distribution
A random variable Y is a k-point mixture of the random variables X1, X2, . . . , Xk if its cdf is given by
FY (y) = a1FX1 (y) + a2FX2 (y) + · · · + ak FXk (y)
where all ai > 0 and a1 + a2 + · · · + ak = 1.
Note: ai is the mixing proportion.
• Random Walk
– Defn: Let Zt, t = 0, 1, 2, . . . be a sequence of random variables such that
i. Z0 = 0
ii. Zt = X1 + X2 + · · · + Xt, t = 1, 2, . . .
iii. Pr(Xi = 1) = 1/2, Pr(Xi = −1) = 1/2
iv. X1, X2, . . . , Xt, . . . are independent
Then {Zt} is a symmetric Random Walk.
– The path: [figure: a sample path of the random walk, moving up or down by one unit at each step]
– Theorem
i. E(Zt) = 0
ii. Var(Zt) = t
iii. If t > s, then Zs and Zt − Zs are independent
iv.
Pr(Zt = k) = C(t, (t + k)/2) (1/2)^t   if t + k is even and −t ≤ k ≤ t
           = 0                          if t + k is odd or |k| > t
– Proof:
i.
E(Zt) = E(X1) + · · · + E(Xt) = 0 + 0 + · · · + 0 = 0
ii.
Var(Zt) = Var(X1) + Var(X2) + · · · + Var(Xt) = 1 + 1 + · · · + 1 (t terms) = t
iii. Zs = X1 + X2 + · · · + Xs and Zt − Zs = Xs+1 + · · · + Xt involve disjoint sets of the independent Xi, so they are independent.
iv. Suppose there are l up steps and m down steps within the first t steps; then l + m = t, l − m = k, and
l = (t + k)/2
Therefore the number of up steps is binomial, B(t, 1/2), and
Pr(Zt = k) = Pr(B(t, 1/2) = (t + k)/2)
           = C(t, (t + k)/2) (1/2)^{(t+k)/2} (1/2)^{(t−k)/2}   if t + k is even and −t ≤ k ≤ t
           = 0   if t + k is odd or |k| > t
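A small simulation sketch checking parts ii and iv of the theorem for one choice of t and k (both arbitrary):

```python
import math
import random

random.seed(4)
t, k = 10, 2
n = 200_000

def z_t(t):
    """Simulate Z_t for a symmetric random walk (steps +1/-1 with probability 1/2 each)."""
    return sum(random.choice((-1, 1)) for _ in range(t))

samples = [z_t(t) for _ in range(n)]

var_hat = sum(z * z for z in samples) / n           # E(Z_t) = 0, so Var(Z_t) ~ E(Z_t^2)
p_hat = sum(z == k for z in samples) / n
p_formula = math.comb(t, (t + k) // 2) * 0.5 ** t   # C(t, (t+k)/2) (1/2)^t

print(var_hat, t)          # both ~ 10
print(p_hat, p_formula)    # both ~ 0.2051
```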
– Examples: Find
i. Pr(Z3 = 1 ∩ Z8 = 4)
ii. Pr(Z8 = 4 | Z3 = 1)
iii. E(Zt²)
iv. E(Z3 Z8)
v. Cov(Z3, Z8)
vi. E(ZT² | Zt = k), given T > t
vii. Pr(τ3 = 5), where τk = inf{t | Zt = k}
– Solution:
i.
Pr(Z3 = 1 ∩ Z8 = 4) = Pr(Z3 = 1 ∩ Z8 − Z3 = 4 − 1)
                    = Pr(Z3 = 1) Pr(Z8 − Z3 = 3)
                    = Pr(Z3 = 1) Pr(Z̃5 = 3)
                    = C(3, 2)(1/2)^3 · C(5, 4)(1/2)^5
where Z8 − Z3 forms a new random walk Z̃_{8−3}.
ii.
Pr(Z8 = 4 | Z3 = 1) = Pr(Z8 = 4 ∩ Z3 = 1) / Pr(Z3 = 1)
                    = Pr(Z8 − Z3 = 3 ∩ Z3 = 1) / Pr(Z3 = 1)
                    = Pr(Z8 − Z3 = 3) Pr(Z3 = 1) / Pr(Z3 = 1)
                    = Pr(Z̃5 = 3) = C(5, 4)(1/2)^5
iii.
E(Zt²) = Var(Zt) + (E(Zt))² = t + 0 = t
iv.
E(Z3 Z8) = E((Z8 − Z3 + Z3) Z3)
         = E((Z8 − Z3) Z3 + Z3²)
         = E(Z8 − Z3) E(Z3) + E(Z3²)
         = 0 × 0 + 3 = 3
v.
Cov(Z3, Z8) = E(Z3 Z8) − E(Z3) E(Z8) = 3 − 0 × 0 = 3
or
Cov(Z3, Z8) = Cov(Z3, Z3 + Z8 − Z3)
            = Cov(Z3, Z3) + Cov(Z3, Z8 − Z3)
            = Var(Z3) + 0
            = 3
vi.
E(ZT² | Zt = k) = E((k + ZT − Zt)² | Zt = k)
                = E((k + ZT − Zt)²)
                = E(k² + (ZT − Zt)² + 2k(ZT − Zt))
                = k² + E(Z̃²_{T−t}) + 2k E(Z̃_{T−t})
                = k² + T − t
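A simulation sketch cross-checking a couple of these answers, and giving a Monte Carlo estimate for part vii, which is left as an exercise above (the number of replications is arbitrary):

```python
import random

random.seed(5)
n = 300_000

count_i = 0      # Z3 = 1 and Z8 = 4
sum_z3z8 = 0.0   # running sum for E(Z3 Z8)
count_tau = 0    # first time the walk hits 3 is exactly t = 5

for _ in range(n):
    z, z3, z8, tau3 = 0, None, None, None
    for t in range(1, 9):
        z += random.choice((-1, 1))
        if z == 3 and tau3 is None:
            tau3 = t
        if t == 3:
            z3 = z
        if t == 8:
            z8 = z
    count_i += (z3 == 1 and z8 == 4)
    sum_z3z8 += z3 * z8
    count_tau += (tau3 == 5)

print(count_i / n)      # ~ C(3,2)(1/2)^3 C(5,4)(1/2)^5 = 15/256 ~ 0.0586  (part i)
print(sum_z3z8 / n)     # ~ 3  (parts iv and v)
print(count_tau / n)    # Monte Carlo estimate for part vii
```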