ETH Zürich, Dept. of Computer Science
Spring Semester 2015

Information Theory
Exercise 7 (Solution)
06 May 2015

7.1 Jointly typical sequences

In this exercise you will compute the jointly typical set for a pair of random variables connected by a binary symmetric channel, and the probability of error of jointly typical decoding for such a channel.

We consider a binary symmetric channel with crossover probability $p = 0.1$.

[Channel diagram: input 0 goes to output 0 with probability 0.9 and to output 1 with probability 0.1; input 1 goes to output 1 with probability 0.9 and to output 0 with probability 0.1.]

The input distribution that achieves capacity is the uniform distribution, i.e., $p(x) = (\frac{1}{2}, \frac{1}{2})$, which yields the following joint distribution $p(x, y)$ for this channel:

    X \ Y |   0      1
    ------+-------------
      0   |  0.45   0.05
      1   |  0.05   0.45

a) Calculate $H(X)$, $H(Y)$, $H(X,Y)$, and $I(X;Y)$ for the joint distribution above.

Solution: $H(X) = H(Y) = 1$, and
\[
H(X,Y) = H(X) + H(Y \mid X) = 1 + H(p) = 1 - 0.9 \log 0.9 - 0.1 \log 0.1 = 1.469,
\]
\[
I(X;Y) = H(Y) - H(Y \mid X) = 1 - H(p) = 0.531.
\]

b) Among the $2^n$ possible input sequences of length $n$, which are typical (i.e., members of $A_{n,\varepsilon}(X)$ for $\varepsilon = 0.2$)? Which are the typical sequences in $A_{n,\varepsilon}(Y)$?

Solution: Under the uniform distribution every sequence has probability $(1/2)^n$, so for every sequence $(x_1, \ldots, x_n)$,
\[
-\tfrac{1}{n} \log p(x_1, \ldots, x_n) = 1 = H(X),
\]
and therefore every sequence is typical, i.e., lies in $A_{n,\varepsilon}(X)$. Similarly, every sequence $(y_1, \ldots, y_n)$ is typical, i.e., lies in $A_{n,\varepsilon}(Y)$.

c) Recall that in class, the jointly typical set $A_{n,\varepsilon}(X,Y)$ was defined as the set of pairs of sequences that satisfy the following conditions:
\[
p(x_1, \ldots, x_n) \in \left[ 2^{-n(H(X)+\varepsilon)},\, 2^{-n(H(X)-\varepsilon)} \right],
\]
\[
p(y_1, \ldots, y_n) \in \left[ 2^{-n(H(Y)+\varepsilon)},\, 2^{-n(H(Y)-\varepsilon)} \right],
\]
\[
p(x_1, \ldots, x_n, y_1, \ldots, y_n) \in \left[ 2^{-n(H(X,Y)+\varepsilon)},\, 2^{-n(H(X,Y)-\varepsilon)} \right].
\]
The first two conditions state that $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ are in $A_{n,\varepsilon}(X)$ and $A_{n,\varepsilon}(Y)$, respectively. Consider the last condition, which can be rewritten as
\[
-\tfrac{1}{n} \log p(x_1, \ldots, x_n, y_1, \ldots, y_n) \in [H(X,Y) - \varepsilon,\, H(X,Y) + \varepsilon].
\]
Let $k$ be the number of places in which the sequence $(x_1, \ldots, x_n)$ differs from $(y_1, \ldots, y_n)$ ($k$ is a function of the two sequences). Then we can write
\[
p(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^{n} p(x_i, y_i) = (0.45)^{n-k} (0.05)^{k} = \left(\tfrac{1}{2}\right)^{n} (1-p)^{n-k} p^{k}.
\]
An alternative way of looking at this probability is to view the binary symmetric channel as an additive channel $Y = X \oplus Z$, where $Z$ is a binary random variable that equals 1 with probability $p$ and is independent of $X$. In this case,
\[
p(x_1, \ldots, x_n, y_1, \ldots, y_n)
= p(x_1, \ldots, x_n)\, p(y_1, \ldots, y_n \mid x_1, \ldots, x_n)
= p(x_1, \ldots, x_n)\, p(z_1, \ldots, z_n \mid x_1, \ldots, x_n)
= p(x_1, \ldots, x_n)\, p(z_1, \ldots, z_n)
= \left(\tfrac{1}{2}\right)^{n} (1-p)^{n-k} p^{k}.
\]
Show that the condition that $(x_1, \ldots, x_n, y_1, \ldots, y_n)$ is jointly typical is equivalent to the condition that $(x_1, \ldots, x_n)$ is typical and $(z_1, \ldots, z_n) = (y_1 - x_1, \ldots, y_n - x_n)$ is typical.

Solution: Of the three conditions for joint typicality, the only one that matters is the last, since by part b) the first two always hold. As argued above,
\[
-\tfrac{1}{n} \log p(x_1, \ldots, x_n, y_1, \ldots, y_n)
= -\tfrac{1}{n} \log \left[ \left(\tfrac{1}{2}\right)^{n} (1-p)^{n-k} p^{k} \right]
= 1 - \tfrac{k}{n} \log p - \tfrac{n-k}{n} \log (1-p).
\]
Thus the pair $(x_1, \ldots, x_n, y_1, \ldots, y_n)$ is jointly typical iff
\[
\left| 1 - \tfrac{k}{n} \log p - \tfrac{n-k}{n} \log(1-p) - H(X,Y) \right| < \varepsilon,
\]
i.e., using $H(X,Y) = 1 + H(p)$, iff
\[
\left| -\tfrac{k}{n} \log p - \tfrac{n-k}{n} \log(1-p) - H(p) \right| < \varepsilon,
\]
which is exactly the condition for $(z_1, \ldots, z_n) = (y_1 - x_1, \ldots, y_n - x_n)$ to be typical. Hence the set of jointly typical pairs $(x_1, \ldots, x_n, y_1, \ldots, y_n)$ is the set of pairs for which the number of places in which $(x_1, \ldots, x_n)$ differs from $(y_1, \ldots, y_n)$ is close to $np$.
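The numbers in part a) and the equivalence in part c) are easy to verify numerically. Here is a minimal sketch in Python (not part of the original exercise; the helper H is ours):

```python
import math

# Joint distribution p(x, y) of the BSC with p = 0.1 under the uniform input (part a).
p_xy = [0.45, 0.05, 0.05, 0.45]

def H(dist):
    """Entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

H_X, H_XY = H([0.5, 0.5]), H(p_xy)
print(H_X, H_XY, 2 * H_X - H_XY)   # 1.0  ~1.469  ~0.531 = I(X;Y)

# Part c: a pair (x, y) differing in k of n places has joint probability
# (1/2)^n (1-p)^(n-k) p^k, so its deviation from H(X,Y) = 1 + H(p) equals
# the deviation of -(1/n) log p(z) from H(p), where z = y XOR x has k ones.
n, p = 25, 0.1
for k in range(n + 1):
    joint_rate = -math.log2(0.5**n * (1 - p)**(n - k) * p**k) / n
    z_rate = -(k / n) * math.log2(p) - ((n - k) / n) * math.log2(1 - p)
    assert math.isclose(joint_rate - H_XY, z_rate - (H_XY - 1))
```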
d) Calculate the size of $A_{n,\varepsilon}(Z)$ for $n = 25$ and $\varepsilon = 0.2$. (Hint: you may refer to the following table of the numbers and cumulative probabilities of sequences with at most $k$ ones.)

     k   sum_{j<=k} C(n,j)   Cumul. Pr.   -(1/n) log p(z_1, ..., z_n)
     0                   1     0.071790     0.152003
     1                  26     0.271206     0.278800
     2                 326     0.537094     0.405597
     3                2626     0.763591     0.532394
     4               15276     0.902006     0.659191
     5               68406     0.966600     0.785988
     6              245506     0.990523     0.912785
     7              726206     0.997738     1.039582
     8             1807781     0.999542     1.166379
     9             3850756     0.999920     1.293176

Solution: The noise sequence is drawn i.i.d. according to the distribution $(1-p, p)$, so $H(Z) = H(0.1) = 0.469$. Setting $\varepsilon = 0.2$, the typical set for $Z$ is the set of sequences for which
\[
-\tfrac{1}{n} \log p(z_1, \ldots, z_n) \in [H(Z) - \varepsilon,\, H(Z) + \varepsilon] = [0.269,\, 0.669].
\]
Looking at the table above for $n = 25$, it follows that the typical $Z$ sequences are those with 1, 2, 3, or 4 ones. The size of this set is $15276 - 1 = 15275$ (the all-zero sequence, with $k = 0$, is not typical and is excluded), and its total probability is $P(A_{n,\varepsilon}(Z)) = 0.902006 - 0.071790 = 0.830216$.

e) Now consider random coding for the channel, as in the proof of the channel coding theorem. Assume that $2^{nR}$ codewords

    codeword 1:         x_{1,1}        ...  x_{1,n}
    codeword 2:         x_{2,1}        ...  x_{2,n}
      ...
    codeword 2^{nR}:    x_{2^{nR},1}   ...  x_{2^{nR},n}

are chosen uniformly over the $2^n$ possible binary sequences of length $n$. One of these codewords is chosen and sent over the channel. The receiver looks at the received sequence and tries to find a codeword in the code that is jointly typical with it. As argued above, this corresponds to finding a codeword $(x_{i,1}, \ldots, x_{i,n})$ such that $(y_1 - x_{i,1}, \ldots, y_n - x_{i,n}) \in A_{n,\varepsilon}(Z)$. For a fixed codeword $(x_{i,1}, \ldots, x_{i,n})$, what is the probability that the received sequence $(y_1, \ldots, y_n)$ is such that $(x_{i,1}, \ldots, x_{i,n}, y_1, \ldots, y_n)$ is jointly typical? Compute this probability for $n = 25$ and $\varepsilon = 0.2$.

Solution: The easiest way to calculate this probability is to view the BSC as an additive channel $Y = X \oplus Z$, where $Z$ is Bernoulli($p$). The probability that the output $(y_1, \ldots, y_n)$ is jointly typical with a given codeword $(x_{i,1}, \ldots, x_{i,n})$ equals the probability that the noise sequence $(z_1, \ldots, z_n)$ is typical, i.e., lies in $A_{n,\varepsilon}(Z)$. The noise sequence is drawn i.i.d. according to the distribution $(1-p, p)$, and as calculated above, the probability that it is typical is $P(A_{n,\varepsilon}(Z)) = 0.830216$. Therefore the probability that the received sequence is not jointly typical with the transmitted codeword is $1 - 0.830216 = 0.169784$.

f) Suppose now that a fixed codeword $(x_{1,1}, \ldots, x_{1,n})$ was sent and $(y_1, \ldots, y_n)$ was received. For $j \in \{2, 3, \ldots, 2^{nR}\}$, let
\[
E_j = \{ (x_{j,1}, \ldots, x_{j,n}, y_1, \ldots, y_n) \in A_{n,\varepsilon}(X,Y) \}
\]
denote the event that codeword $j$ (a codeword other than the transmitted one) is jointly typical with $(y_1, \ldots, y_n)$. Let $n = 25$, $R = 0.36$, $\varepsilon = 0.2$. Use the union bound to find an upper bound on $\Pr(E_2 \cup E_3 \cup \cdots \cup E_{2^{nR}} \mid (x_{1,1}, \ldots, x_{1,n}) \text{ was sent})$.

Solution: The probability that a randomly chosen codeword $(x_{i,1}, \ldots, x_{i,n})$ is jointly typical with the given $(y_1, \ldots, y_n)$ is $|A_{n,\varepsilon}(Z)| \cdot 2^{-n} = 15275 \cdot 2^{-25} = 4.552 \times 10^{-4}$. Since $2^{nR} = 2^{9} = 512$, the union bound gives that the probability of some other codeword being jointly typical with the received sequence is at most
\[
|A_{n,\varepsilon}(Z)| \cdot 2^{-n} \cdot (2^{nR} - 1) = 4.552 \times 10^{-4} \times 511 = 0.23262.
\]
Note that instead of the union bound, one can also compute this probability more accurately: each of the other codewords is jointly typical with the received sequence with probability $4.552 \times 10^{-4}$, and these events are independent because the codewords are chosen independently. The probability that none of the 511 other codewords is jointly typical with the received sequence is therefore $(1 - 4.552 \times 10^{-4})^{511} = 0.79241$, and the probability that at least one of them is jointly typical is $1 - 0.79241 = 0.20759$.
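The table in part d) and the numbers in parts e) and f) can be reproduced directly. The following sketch (assuming Python 3.8+ for math.comb) recomputes $|A_{n,\varepsilon}(Z)|$, its probability, the union bound, and the exact value:

```python
import math
from math import comb

n, p, eps, R = 25, 0.1, 0.2, 0.36
H_Z = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # H(0.1) ~ 0.469

# Part d: a noise sequence with k ones has
# -(1/n) log p(z) = -(k/n) log p - ((n-k)/n) log(1-p);
# keep the values of k for which this lies in [H(Z) - eps, H(Z) + eps].
def rate(k):
    return -(k / n) * math.log2(p) - ((n - k) / n) * math.log2(1 - p)

typical_k = [k for k in range(n + 1) if H_Z - eps <= rate(k) <= H_Z + eps]
size = sum(comb(n, k) for k in typical_k)
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in typical_k)
print(typical_k, size, prob)       # [1, 2, 3, 4]  15275  ~0.830216

# Parts e/f: probability that a single random codeword is jointly typical
# with the received sequence, the union bound over the 511 other codewords,
# and the exact probability that at least one of them is jointly typical.
q = size * 2.0**(-n)               # 15275 / 2^25 ~ 4.552e-4
others = round(2**(n * R)) - 1     # 2^9 - 1 = 511
print(1 - prob)                    # ~0.169784: error of the first kind (part e)
print(q * others)                  # ~0.23262: union bound (part f)
print(1 - (1 - q)**others)         # ~0.20759: exact value
```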
g) Given that a particular codeword was sent, the probability of error (averaged over the probability distribution of the channel and over the random choice of the other codewords) can be written as
\[
\Pr(\text{Error} \mid \text{codeword 1 sent}) = \sum_{(y_1, \ldots, y_n) \text{ causes error}} p(y_1, \ldots, y_n \mid x_{1,1}, \ldots, x_{1,n}).
\]
There are two kinds of error: the first occurs if the received sequence $(y_1, \ldots, y_n)$ is not jointly typical with the transmitted codeword, and the second occurs if some other codeword is jointly typical with the received sequence. Using the results of the previous parts, calculate this probability of error. By the symmetry of the random coding argument, it does not depend on which codeword was sent.

Solution: There are two error events, and they are conditionally independent given the received sequence. In the previous part we showed that the conditional probability of an error of the second kind is 0.20759, irrespective of the received sequence $(y_1, \ldots, y_n)$. The probability of an error of the first kind is 0.1698: in part e) we calculated the probability that $(y_1 - x_{i,1}, \ldots, y_n - x_{i,n}) \notin A_{n,\varepsilon}(Z)$ conditioned on a particular input sequence, and by the symmetry and uniformity of the random code construction this probability does not depend on $(x_{i,1}, \ldots, x_{i,n})$, so it equals 0.1698 unconditionally as well. A simple union of events bound then bounds the total probability of error by $0.1698 + 0.2076 = 0.3774$. Thus we can send 512 codewords of length 25 over a BSC with crossover probability 0.1 with probability of error less than 0.3774.

A slightly more accurate calculation uses the fact that, conditioned on the received sequence, the two kinds of error are independent. By the symmetry of the code construction, the probability of an error of the first kind conditioned on the received sequence does not depend on the received sequence and is therefore 0.1698. The probability that neither type of error occurs is thus, using independence,
\[
(1 - 0.1698)(1 - 0.2076) = 0.6579,
\]
and the probability of error is $1 - 0.6579 = 0.3421$.

7.2 Symmetric channels

Consider a discrete memoryless channel with input $X \in \{1, 2, \ldots, m\}$ and output $Y \in \{1, 2, \ldots, n\}$. Let the channel transition probabilities be given by a matrix $P$ whose entry in the $x$-th row and $y$-th column is the conditional probability $\Pr(Y = y \mid X = x)$. A channel is said to be symmetric if the rows of the matrix $P$ are permutations of each other and the columns are permutations of each other. A channel is said to be weakly symmetric if the rows of the matrix $P$ are permutations of each other and all the column sums $\sum_x p(y \mid x)$ are equal.

Show that, for a weakly symmetric channel, the capacity $C$ is
\[
C = \log n - H(\text{row of transition matrix}),
\]
and that it is achieved by the uniform distribution on the input alphabet. [Hint: Let $r$ be a row of the transition matrix. Show that $I(X;Y) \le \log n - H(r)$. Use the condition for equality.]
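Before turning to the proof, note that the two definitions translate directly into a numerical check. A small sketch (assuming NumPy; the helper names and the second example matrix are ours, the latter chosen as a standard weakly but not strongly symmetric channel):

```python
import numpy as np

def is_symmetric_channel(P, tol=1e-12):
    """Rows are permutations of one another, and so are the columns."""
    rows_ok = all(np.allclose(np.sort(row), np.sort(P[0]), atol=tol) for row in P)
    cols_ok = all(np.allclose(np.sort(col), np.sort(P[:, 0]), atol=tol) for col in P.T)
    return rows_ok and cols_ok

def is_weakly_symmetric(P, tol=1e-12):
    """Rows are permutations of one another, and all column sums are equal."""
    rows_ok = all(np.allclose(np.sort(row), np.sort(P[0]), atol=tol) for row in P)
    cols_ok = np.allclose(P.sum(axis=0), P.sum(axis=0)[0], atol=tol)
    return rows_ok and cols_ok

# The BSC from exercise 7.1 is symmetric (hence also weakly symmetric):
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(is_symmetric_channel(bsc), is_weakly_symmetric(bsc))   # True True

# Weakly symmetric but not symmetric (equal column sums 2/3, columns not permutations):
P = np.array([[1/3, 1/6, 1/2],
              [1/3, 1/2, 1/6]])
print(is_symmetric_channel(P), is_weakly_symmetric(P))       # False True
```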
Solution: Let $r$ be a row of the transition matrix. Then
\[
I(X;Y) = H(Y) - H(Y \mid X) = H(Y) - H(r) \le \log n - H(r),
\]
where the last step follows from the fact that the entropy of a discrete random variable is at most the logarithm of the size of its alphabet. Equality holds iff the distribution of $Y$ is uniform on its alphabet. Hence, if there exists an input distribution that makes the distribution of $Y$ uniform, it is the capacity-achieving distribution, and the right-hand side of the inequality above is the capacity. Let us check that the uniform input distribution achieves this for weakly symmetric channels:
\[
\Pr(Y = y) = \sum_{x=1}^{m} \Pr(Y = y \mid X = x) \underbrace{\Pr(X = x)}_{=1/m}
= \frac{1}{m} \underbrace{\sum_{x=1}^{m} \Pr(Y = y \mid X = x)}_{=:c} = \frac{c}{m},
\]
where the second equality holds because the input is uniformly distributed, and the third by the defining property of weakly symmetric channels (all column sums of the transition matrix are equal, say to $c$). Since $c/m$ does not depend on $y$, the distribution of $Y$ is uniform. (Indeed, each row of $P$ sums to 1, so all entries of $P$ sum to $m$; with $n$ equal column sums this forces $c = m/n$ and $\Pr(Y = y) = 1/n$.) Therefore the uniform input distribution achieves $C = \log n - H(r)$.
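To illustrate the result, here is a short sketch (function names ours; same example channel as in the check above) confirming that with uniform input, $I(X;Y)$ equals $\log n - H(r)$:

```python
import numpy as np

def entropy(q):
    """Entropy in bits of a probability vector; ignores zero entries."""
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

def capacity_weakly_symmetric(P):
    """C = log2(n) - H(r), where r is any row of the transition matrix P."""
    return np.log2(P.shape[1]) - entropy(P[0])

def mutual_information(P, px):
    """I(X;Y) = H(Y) - H(Y|X) for channel P and input distribution px."""
    return entropy(px @ P) - sum(px[i] * entropy(P[i]) for i in range(len(px)))

P = np.array([[1/3, 1/6, 1/2],
              [1/3, 1/2, 1/6]])
px = np.array([0.5, 0.5])               # uniform input
print(capacity_weakly_symmetric(P))     # log2(3) - H(1/3, 1/6, 1/2) ~ 0.1258 bits
print(mutual_information(P, px))        # equal, as the proof predicts
```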