HW3 (Due 10/3)

  1. (Gaussian process regression) In the previous homework, we saw how we might fit a curve from just a few observed points on the curve. However, we assumed that all the observations were noiseless. We will continue that example, but now assume the observations are subject to Gaussian noise. Please first download the data from here. As in previous homework problems, there are two variables in the mat file, pos and observed, i.e., the observed locations and the corresponding observations. We assume that there is no error/noise in pos, but observed = true + noise, where noise \(\sim \mathcal{N}(0,\sigma^2)\) with \(\sigma^2=0.1\). As in the previous homework problem, we will assume that the entire curve is a jointly Gaussian random vector of length 200. We will treat the covariance as fixed, but the mean can vary. Recall from the lecture that under this assumption, the normal distribution is its own conjugate prior, so to update the estimated mean we just need to multiply the Gaussian likelihood by the Gaussian prior. For the Gaussian prior, we will assume zero mean and a squared-exponential kernel for the covariance matrix; that is, the covariance between the \(i\)th and \(j\)th variables is \(\exp(-\lambda (i-j)^2)\). Let's assume \(\lambda=0.001\). (A Matlab sketch covering parts 3 and 4 appears after part 5.)

    1. Let \(\Lambda\) be the prior precision matrix (so the prior covariance is \(\Sigma=\Lambda^{-1}\)). Let \(\Lambda_o\) be the precision matrix of the observations and \(\mu\) be the mean (the variable), so that the likelihood of observing \({\bf x}\) is \(\mathcal{N}({\bf x};\mu, \Lambda_o^{-1})\). Please show that \(p(\mu|{\bf x}; \Lambda, \Lambda_o)=\mathcal{N}(\mu; (\Lambda+ \Lambda_o)^{-1}\Lambda_o {\bf x}, (\Lambda+\Lambda_o)^{-1})\).

    2. Please describe what \(\Lambda_o\) looks like if there are only two observations, at the first two positions.

    3. Since \(\Lambda_o\) has very low rank, computing \((\Sigma^{-1} + \Lambda_o)^{-1}\) directly can be very inaccurate in Matlab. You will need to take advantage of the Woodbury matrix identity: write \(\Lambda_o = UCV\) with \(C\) full-rank, and then compute the new covariance as \((\Sigma^{-1}+\Lambda_o)^{-1}=(\Sigma^{-1}+UCV)^{-1}=\Sigma - \Sigma U(C^{-1} + V\Sigma U)^{-1}V \Sigma\) instead. For the \(\Lambda_o\) described in the previous part, please decompose \(\Lambda_o=UCV\).

    4. Predict the means and variances given the observations. Plot the estimate (the means) with error bars (mean +/- one standard deviation).

    5. Describe how you might tackle the problem if pos (the observed locations) is noisy as well.
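
     To get started on parts 3 and 4, here is a minimal Matlab sketch of the overall pipeline: it builds the squared-exponential prior covariance, applies the Woodbury identity with a low-rank decomposition \(\Lambda_o = UCV\), and plots the posterior mean with error bars. The file name `hw3.mat` is a placeholder for whatever you downloaded, and the sketch assumes pos holds integer indices into the length-200 curve; treat it as a sketch under those assumptions rather than a required implementation.

     ```matlab
     % Minimal sketch for Problem 1, parts 3-4.
     % Assumptions: the downloaded file is saved as 'hw3.mat' (placeholder name)
     % and contains pos (integer indices of the observed locations) and
     % observed (the corresponding noisy values).
     load('hw3.mat');

     n      = 200;      % length of the full curve
     lambda = 0.001;    % squared-exponential kernel parameter
     sigma2 = 0.1;      % observation noise variance

     % Prior: zero mean, covariance Sigma(i,j) = exp(-lambda*(i-j)^2)
     [I, J] = meshgrid(1:n, 1:n);
     Sigma  = exp(-lambda * (I - J).^2);

     % Low-rank decomposition of the observation precision, Lambda_o = U*C*V:
     % U selects the observed positions, C is the full-rank m-by-m noise precision.
     m = numel(pos);
     U = zeros(n, m);
     U(sub2ind([n, m], pos(:)', 1:m)) = 1;
     C = (1/sigma2) * eye(m);
     V = U';

     % Woodbury: (Sigma^-1 + U*C*V)^-1 = Sigma - Sigma*U*(C^-1 + V*Sigma*U)^-1*V*Sigma
     SigmaU    = Sigma * U;
     SigmaPost = Sigma - SigmaU * ((inv(C) + V * SigmaU) \ SigmaU');

     % Posterior mean = SigmaPost * Lambda_o * x, and Lambda_o*x = U*(observed/sigma2)
     muPost  = SigmaPost * (U * (observed(:) / sigma2));
     stdPost = sqrt(diag(SigmaPost));

     % Posterior mean with +/- one standard deviation error bars,
     % with the noisy observations overlaid.
     errorbar((1:n)', muPost, stdPost); hold on;
     plot(pos, observed, 'ro'); hold off;
     ```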

  2. (Department store) Consider a department store where the number of customers arriving in an hour is modeled with a Poisson likelihood and a Gamma prior (with a=2 and b=1). Say we observed that 20 customers arrived in the first hour, and 22 customers arrived in the next hour. (A sketch of the conjugate update is given after the parts below.)

    1. What is the Bayesian estimate of the number of customers arriving in the third hour?

    2. What is the MAP estimate of the number of customers arriving in the third hour?

    3. What is the Bayesian estimate of the number of customers arriving in the third and fourth hours (a two-hour period)?

    4. What is the MAP estimate of the number of customers arriving in the third and fourth hours (a two-hour period)?
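
     For reference throughout this problem: with a Gamma(a, b) prior on the Poisson rate \(\lambda\) (shape-rate parameterization assumed here) and observed counts \(x_1,\dots,x_n\), the posterior is Gamma\((a+\sum_i x_i,\, b+n)\), and the count over a two-hour window is Poisson with rate \(2\lambda\). A minimal Matlab sketch of the conjugate update under those assumptions:

     ```matlab
     % Minimal sketch of the Gamma-Poisson conjugate update
     % (shape-rate parameterization of the Gamma assumed).
     a = 2;  b = 1;        % Gamma prior on the hourly Poisson rate lambda
     x = [20 22];          % observed counts in the first two hours

     aPost = a + sum(x);   % posterior shape: a + total count
     bPost = b + numel(x); % posterior rate:  b + number of observed hours

     lambdaMean = aPost / bPost;        % posterior mean of the hourly rate
     lambdaMAP  = (aPost - 1) / bPost;  % posterior mode of the hourly rate (aPost > 1)
     ```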