Solution 7 - ETH Zürich

ETH Zürich
Dept. Computer Science
Spring Semester 2015
Information theory
Exercise 7 (Solution)
06 May 2015
7.1 Jointly typical sequences
In this exercise you will compute the jointly typical set for a pair of random variables connected
by a binary symmetric channel, and the probability of error for jointly typical decoding for
such a channel. We consider a binary symmetric channel with crossover probability p = 0.1.
[Channel diagram: input 0 maps to output 0 and input 1 to output 1 with probability 0.9; the crossovers 0 → 1 and 1 → 0 occur with probability 0.1.]
The input distribution that achieves capacity is the uniform distribution (i.e., p(x) = (1/2, 1/2)), which yields the following joint distribution p(x, y) for this channel:
X \ Y |   0      1
------+-------------
  0   |  0.45   0.05
  1   |  0.05   0.45
a) Calculate H(X), H(Y), H(X, Y), and I(X; Y) for the joint distribution above.
H(X) = H(Y) = 1,
H(X, Y) = H(X) + H(Y|X) = 1 + H(p) = 1 − 0.9 log 0.9 − 0.1 log 0.1 = 1.469,
I(X; Y) = H(Y) − H(Y|X) = 1 − H(p) = 0.531.
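As a quick numerical check (an addition to the solution sheet, not part of the original), the following Python sketch recomputes these quantities from the joint distribution:

```python
import numpy as np

p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])           # rows indexed by x, columns by y

def H(p):
    """Entropy in bits; zero-probability entries are skipped."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = H(p_xy.sum(axis=1))                 # marginal of X -> 1.0
H_Y = H(p_xy.sum(axis=0))                 # marginal of Y -> 1.0
H_XY = H(p_xy.flatten())                  # joint entropy -> 1.469
print(H_X, H_Y, H_XY, H_X + H_Y - H_XY)   # I(X;Y) -> 0.531
```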
b) For the 2^n possible input sequences of length n, which of them are typical (i.e., members of A_{n,ε}(X) for ε = 0.2)? Which are the typical sequences in A_{n,ε}(Y)?
Solution:
For the uniform distribution, every sequence has probability (1/2)^n, so for every sequence (x_1, ..., x_n) we have −(1/n) log p(x_1, ..., x_n) = 1 = H(X); hence every sequence is typical, i.e., lies in A_{n,ε}(X). Similarly, every sequence (y_1, ..., y_n) is typical, i.e., lies in A_{n,ε}(Y).
c) Recall that in class, the jointly typical set A_{n,ε}(X, Y) is defined as the set of sequences that satisfy the following conditions:
p(x_1, ..., x_n) ∈ [2^{−n(H(X)+ε)}, 2^{−n(H(X)−ε)}]
p(y_1, ..., y_n) ∈ [2^{−n(H(Y)+ε)}, 2^{−n(H(Y)−ε)}]
p(x_1, ..., x_n, y_1, ..., y_n) ∈ [2^{−n(H(X,Y)+ε)}, 2^{−n(H(X,Y)−ε)}]
The first two conditions state that (x_1, ..., x_n) and (y_1, ..., y_n) are in A_{n,ε}(X) and A_{n,ε}(Y), respectively. Consider the last condition, which can be rewritten to state that −(1/n) log p(x_1, ..., x_n, y_1, ..., y_n) ∈ [H(X, Y) − ε, H(X, Y) + ε].
Let k be the number of places in which the sequence (x_1, ..., x_n) differs from (y_1, ..., y_n) (k is a function of the two sequences). Then we can write

p(x_1, ..., x_n, y_1, ..., y_n) = ∏_{i=1}^{n} p(x_i, y_i)
                                = (0.45)^{n−k} (0.05)^k
                                = (1/2)^n (1−p)^{n−k} p^k.
An alternative way of looking at this probability is to view the binary symmetric channel as an additive channel Y = X ⊕ Z, where Z is a binary random variable that is equal to 1 with probability p and is independent of X. In this case,
p(x_1, ..., x_n, y_1, ..., y_n) = p(x_1, ..., x_n) p(y_1, ..., y_n | x_1, ..., x_n)
                                = p(x_1, ..., x_n) p(z_1, ..., z_n | x_1, ..., x_n)
                                = p(x_1, ..., x_n) p(z_1, ..., z_n)
                                = (1/2)^n (1−p)^{n−k} p^k.
Show that the condition that (x_1, ..., x_n, y_1, ..., y_n) is jointly typical is equivalent to the condition that (x_1, ..., x_n) is typical and (z_1, ..., z_n) = (y_1 − x_1, ..., y_n − x_n) is typical.
Solution:
Out of the three conditions for joint typicality, the only condition that matters is the
last one. As argued above,
−(1/n) log p(x_1, ..., x_n, y_1, ..., y_n) = −(1/n) log [ (1/2)^n (1−p)^{n−k} p^k ]
                                           = 1 − (k/n) log p − ((n−k)/n) log(1−p).
Thus the pair (x_1, ..., x_n, y_1, ..., y_n) is jointly typical iff |1 − (k/n) log p − ((n−k)/n) log(1−p) − H(X, Y)| < ε, i.e., iff |−(k/n) log p − ((n−k)/n) log(1−p) − H(p)| < ε, which is exactly the condition for (z_1, ..., z_n) = (y_1 − x_1, ..., y_n − x_n) to be typical. Thus the set of jointly typical pairs (x_1, ..., x_n, y_1, ..., y_n) is the set such that the number of places in which (x_1, ..., x_n) differs from (y_1, ..., y_n) is close to np.
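As an illustration (this numerical check is an addition, not part of the original sheet), the following Python sketch samples pairs through the BSC and confirms that the joint-typicality condition on (x^n, y^n) coincides with the typicality condition on z^n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, p = 25, 0.2, 0.1
H_p = -(1 - p) * np.log2(1 - p) - p * np.log2(p)   # H(0.1) ~ 0.469
H_XY = 1.0 + H_p                                    # from part (a)

for _ in range(10_000):
    x = rng.integers(0, 2, n)
    z = (rng.random(n) < p).astype(int)             # Bernoulli(p) noise
    y = x ^ z
    k = int(z.sum())                                # places where x and y differ
    # -(1/n) log p(x^n, y^n) = 1 - (k/n) log p - ((n-k)/n) log(1-p)
    rate_xy = 1.0 - (k / n) * np.log2(p) - ((n - k) / n) * np.log2(1 - p)
    jointly_typical = abs(rate_xy - H_XY) < eps     # x^n and y^n are always typical
    z_typical = abs((rate_xy - 1.0) - H_p) < eps    # -(1/n) log p(z^n) = rate_xy - 1
    assert jointly_typical == z_typical
```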
d) Calculate the size of A_{n,ε}(Z) for n = 25 and ε = 0.2. (Hint: you may refer to the following table of the probabilities and numbers of sequences with at most k ones.)
 k    Σ_{j≤k} C(n,j)   Cumul. Pr.   −(1/n) log p(z_1, ..., z_n) (exactly k ones)
 0               1      0.071790     0.152003
 1              26      0.271206     0.278800
 2             326      0.537094     0.405597
 3            2626      0.763591     0.532394
 4           15276      0.902006     0.659191
 5           68406      0.966600     0.785988
 6          245506      0.990523     0.912785
 7          726206      0.997738     1.039582
 8         1807781      0.999542     1.166379
 9         3850756      0.999920     1.293176
 ...           ...           ...          ...
Solution:
The noise sequence is drawn i.i.d. according to the distribution (1 − p, p); thus we have H(Z) = H(0.1) = 0.469. Setting ε = 0.2, the typical set for Z is the set of sequences for which −(1/n) log p(z_1, ..., z_n) ∈ [H(Z) − ε, H(Z) + ε] = [0.269, 0.669]. Looking at the table above for n = 25, it follows that the typical Z sequences are those with 1, 2, 3, or 4 ones. The total probability of the set A_{n,ε}(Z) is P(A_{n,ε}(Z)) = 0.902006 − 0.071790 = 0.830216, and the size of this set is 15276 − 1 = 15275.
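These values can be reproduced directly; a minimal sketch (again an addition to the sheet):

```python
from math import comb, log2

n, eps, p = 25, 0.2, 0.1
H_Z = -(1 - p) * log2(1 - p) - p * log2(p)          # ~ 0.469

size, prob = 0, 0.0
for k in range(n + 1):
    # -(1/n) log p(z^n) for a sequence with exactly k ones
    rate = -((n - k) * log2(1 - p) + k * log2(p)) / n
    if H_Z - eps <= rate <= H_Z + eps:              # holds for k = 1, 2, 3, 4
        size += comb(n, k)
        prob += comb(n, k) * (1 - p) ** (n - k) * p ** k

print(size, prob)                                    # 15275  ~0.830216
```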
e) Now consider random coding for the channel, as in the proof of the channel coding
theorem. Assume that 2^{nR} codewords

codeword 1:        x_{1,1}   ...   x_{1,n}
codeword 2:        x_{2,1}   ...   x_{2,n}
    ...
codeword 2^{nR}:   x_{2^{nR},1}   ...   x_{2^{nR},n}
are chosen uniformly over the 2^n possible binary sequences of length n. One of these codewords is chosen and sent over the channel. The receiver looks at the received sequence and tries to find a codeword in the code that is jointly typical with the received sequence. As argued above, this corresponds to finding a codeword (x_{i,1}, ..., x_{i,n}) such that (y_1 − x_{i,1}, ..., y_n − x_{i,n}) ∈ A_{n,ε}(Z). For a fixed codeword (x_{i,1}, ..., x_{i,n}), what is the probability that the received sequence (y_1, ..., y_n) is such that (x_{i,1}, ..., x_{i,n}, y_1, ..., y_n) is jointly typical? Compute this probability for n = 25 and ε = 0.2.
Solution:
The easiest way to calculate this probability is to view the BSC as an additive channel Y = X ⊕ Z, where Z is Bernoulli(p). Then the probability that the output (y_1, ..., y_n) is jointly typical with a given codeword (x_{i,1}, ..., x_{i,n}) is equal to the probability that the noise sequence (z_1, ..., z_n) is typical, i.e., lies in A_{n,ε}(Z). The noise sequence is drawn i.i.d. according to the distribution (1 − p, p), and as calculated above, the probability that it is typical is P(A_{n,ε}(Z)) = 0.830216. Therefore the probability that the received sequence is not jointly typical with the transmitted codeword is 1 − 0.830216 = 0.169784.
f) Suppose now that a fixed codeword (x_{1,1}, ..., x_{1,n}) was sent and (y_1, ..., y_n) was received. Define the events

E_j = {(x_{j,1}, ..., x_{j,n}, y_1, ..., y_n) ∈ A_{n,ε}(X, Y)},   j ∈ {2, 3, ..., 2^{nR}},

so that E_2 ∪ E_3 ∪ ... ∪ E_{2^{nR}} is the event that some codeword other than (x_{1,1}, ..., x_{1,n}) is jointly typical with (y_1, ..., y_n). Let n = 25, R = 0.36, ε = 0.2. Use the union bound to find an upper bound on

Pr(E_2 ∪ E_3 ∪ ... ∪ E_{2^{nR}} | (x_{1,1}, ..., x_{1,n}) was sent).
Solution:
The probability that a randomly chosen (x_{i,1}, ..., x_{i,n}) is jointly typical with the given (y_1, ..., y_n) is |A_{n,ε}(Z)| · (1/2)^n = 15275 · 2^{−25} = 4.552 × 10^{−4}. The simple union of events bound then bounds the probability that some other codeword is jointly typical with the received sequence by |A_{n,ε}(Z)| · (1/2)^n · (2^{nR} − 1) = 4.552 × 10^{−4} × 511 = 0.23262.
Note that instead of the union bound, one can also compute the probability
more accurately as follows:
We know that each of the other codewords is jointly typical with the received sequence with probability 4.552 × 10^{−4}, and these codewords are independent. The probability that none of the 511 codewords is jointly typical with the received sequence is therefore (1 − 4.552 × 10^{−4})^{511} = 0.79241, and the probability that at least one of them is jointly typical with the received sequence is therefore 1 − 0.79241 = 0.20759.
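A short sketch (added here for verification) reproduces both the union bound and the more accurate value:

```python
n, R = 25, 0.36
size_A = 15275                        # |A_{n,eps}(Z)| from part (d)
q = size_A * 2.0 ** (-n)              # per-codeword probability, ~4.552e-4
others = round(2 ** (n * R)) - 1      # 2^{nR} - 1 = 511 competing codewords

print(q)                               # ~4.552e-4
print(q * others)                      # union bound, ~0.23262
print(1 - (1 - q) ** others)           # exact value, ~0.20759
```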
g) Given that a particular codeword was sent, the probability of error (averaged over the
probability distribution of the channel and over the random choice of other codewords)
can be written as
Pr(Error | codeword 1 sent) = Σ_{(y_1, ..., y_n) causes error} p(y_1, ..., y_n | x_{1,1}, ..., x_{1,n}).
There are two kinds of error: the first occurs if the received sequence (y_1, ..., y_n) is
not jointly typical with the transmitted codeword, and the second occurs if there is
another codeword jointly typical with the received sequence. Using the result of the
previous parts, calculate this probability of error. By the symmetry of the random
coding argument, this does not depend on which codeword was sent.
Solution:
There are two error events, which are conditionally independent, given the received
sequence. In the previous part, we showed that the conditional probability of an error of the second kind is 0.20759, irrespective of the received sequence (y_1, ..., y_n).
The probability of an error of the first kind is 0.1698. In part (e) we calculated the probability that (y_1 − x_{i,1}, ..., y_n − x_{i,n}) ∉ A_{n,ε}(Z), but this was conditioned on a particular input sequence. By the symmetry and uniformity of the random code construction, this probability does not depend on (x_{i,1}, ..., x_{i,n}), and therefore the unconditional probability that (y_1 − x_{i,1}, ..., y_n − x_{i,n}) ∉ A_{n,ε}(Z) is also equal to 0.1698.
We can therefore use a simple union of events bound to bound the total probability of error by 0.1698 + 0.2076 = 0.3774. Thus we can send 512 codewords of length 25 over a BSC with crossover probability 0.1 with probability of error less than 0.3774.
A slightly more accurate calculation uses the fact that, conditioned on the received sequence, the two kinds of error are independent. By the symmetry of the code construction process, the probability of an error of the first kind conditioned on the received sequence does not depend on the received sequence, and is therefore 0.1698. The probability that neither type of error occurs is then (using their independence) (1 − 0.1698)(1 − 0.2076) = 0.6579, and therefore the probability of error is 1 − 0.6579 = 0.3421.
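To see these numbers emerge empirically, here is an illustrative Monte Carlo simulation of random coding with jointly typical decoding (an addition, not part of the original sheet; the decoder declares an error unless exactly the transmitted codeword is jointly typical with the output):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, eps, M = 25, 0.1, 0.2, 512      # blocklength, crossover, epsilon, 2^{nR}
H_p = -(1 - p) * np.log2(1 - p) - p * np.log2(p)

# Numbers of ones k for which a noise sequence is typical (here: 1, 2, 3, 4).
typical_k = [k for k in range(n + 1)
             if abs(-((n - k) * np.log2(1 - p) + k * np.log2(p)) / n - H_p) < eps]

errors, trials = 0, 2000
for _ in range(trials):
    code = rng.integers(0, 2, (M, n))             # fresh random codebook
    y = code[0] ^ (rng.random(n) < p)             # codeword 1 sent through the BSC
    ks = (code ^ y).sum(axis=1)                   # Hamming distances to y
    typical = np.flatnonzero(np.isin(ks, typical_k))
    if typical.size != 1 or typical[0] != 0:      # no, several, or wrong candidates
        errors += 1

print(errors / trials)                             # roughly 0.34
```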
7.2 Symmetric Channels
Consider a discrete memoryless channel with input X ∈ {1, 2, ..., m} and output Y ∈ {1, 2, ..., n}. Let the channel transition probabilities be given by a matrix P, where the entry in the x-th row and y-th column denotes the conditional probability Pr(Y = y | X = x). A channel is said to be symmetric if the rows of the matrix P are permutations of each other and the columns are permutations of each other. A channel is said to be weakly symmetric if the rows of the matrix P are permutations of each other and all the column sums Σ_x p(y | x) are equal. Show that, for a weakly symmetric channel, the capacity C is
C = log n − H(row of transition matrix)
and this is achieved by a uniform distribution on the input alphabet. [Hint: Let r be a row
of the transition matrix. Show that I(X; Y ) ≤ log n − H(r). Use the condition for equality.]
Solution:
Let r be a row of the transition matrix. Then
I(X; Y) = H(Y) − H(Y|X)
        = H(Y) − H(r)
        ≤ log n − H(r),
where the last step follows from the fact that the entropy of a discrete random variable is at most the logarithm of its alphabet size, with equality iff the distribution of Y is uniform on its alphabet. If there exists an input distribution that makes the distribution of Y uniform, it is the capacity-achieving distribution, and the RHS of the above inequality is the capacity. Let us check that the uniform distribution on the input alphabet does this for weakly symmetric channels:
Pr(Y = y) = Σ_{x=1}^{m} Pr(Y = y | X = x) Pr(X = x)
          = (1/m) Σ_{x=1}^{m} Pr(Y = y | X = x)
          = c/m,

where the second equality holds because the input is uniformly distributed (Pr(X = x) = 1/m), and in the third equality c := Σ_x Pr(Y = y | X = x) is a column sum of the transition matrix, which by the weak symmetry property is the same for every y. Hence Pr(Y = y) = c/m does not depend on y, i.e., Y is uniform on its alphabet, and the capacity of a weakly symmetric channel is indeed C = log n − H(r), achieved by the uniform input distribution.
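As a sanity check (an addition to the solution; the transition matrix below is an illustrative example of a weakly symmetric channel, not taken from the exercise), the formula can be verified numerically:

```python
import numpy as np

# Rows are permutations of each other and all column sums equal 2/3,
# so the channel is weakly symmetric (but not symmetric).
P = np.array([[1/3, 1/6, 1/2],
              [1/3, 1/2, 1/6]])        # m = 2 inputs, n = 3 outputs

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

m, n = P.shape
p_y = (np.ones(m) / m) @ P             # output distribution under uniform input
print(p_y)                             # uniform: [1/3, 1/3, 1/3]
print(np.log2(n) - H(P[0]))            # C = log n - H(row), ~0.126 bits
```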