HW3 (Due 10/27)

(4 points) Consider two distributions \(p=[0.199, 0.599, 0.001, 0.199, 0.001, 0.001]\) and \(q=[0.1, 0.3, 0.1, 0.1, 0.3, 0.1]\). Compute \(KL(p||q)\) and \(KL(q||p)\). Note that they are significantly different from one another. You may also want to check out this tweet to get some more insight.
(6 points) For a group of students with weight \(W\) and height \(H\) are jointly normally distributed with mean of 50 kg and 160 cm and a covariance matrices of \(E\left[\begin{matrix} (W-\bar{W})^2 & (W-\bar{W})(H-\bar{H}) \\ (H-\bar{H})(W-\bar{W}) & (H-\bar{H})^2 \end{matrix} \right]=\begin{pmatrix} 80 & 30 \\ 30 & 140 \end{pmatrix}\).
1. Compute \(h(W)\)
2. Compute \(h(W|H)\)
3. how many bits on average will be needed per sample to store a student's weight with precision of 0.1 if his height is known?
(10 points) Consider Problem 1 in last HW again, compute
1. \(H(D|X_1)\)
2. \(H(D|X_1,X_2)\)
3. \(I(X_1;D)\)
4. \(I(X_1;D|X_2)\)
5. \(I(X_1;X_2)\)
Extra credit. We will study a bit more the Shannon-Fano-Elias code briefly described in class. Recall that any SFE codeword can be viewed as an interval between 0 and 1. For example, \(1011\) can be viewed as \([0.1011b,0.101\dot1b]=[0.1011b,0.11b]=[0.6875,0.75]\).
1. (5 points) Show that two codewords cannot be prefix of one another if their corresponding interval does not overlapped.
2. (5 points) Consider a usual setup that we split the \([0,1]-\)interval according to the probabilities of the alphabet. And then we generate the codeword of a symbol out of the mid point of the corresponding interval of symbol. For example, if we only have two symbols \(a\) and \(b\) with probability 0.4 and 0.6. \(a\) should take \(0.2=0.00110011\cdots b\), where, say, we can pick a codeword \(00\), which correspond to interval \([0.00b, 0.00\dot 1b]=[0.00b, 0.01b]\). Or we can also pick \(001\), which corresponds to interval \([0.001b, 0.001\dot1 b]=[0.001b,0.01b]\). Note that the interview shrink by half whenever the length of the codeword increase by one. Question: what is the minimum length of the codeword needed for a symbol \(a\) with probability \(p(a)\)? Try to represent this minimum length in terms of \(p(a)\).
3. (5 points) From the previous part, find an upper bound for the expected length of a codeword in terms of the entropy of the source. And show rigorously with a proof that your upper bound holds for any situation.