I want to prove that the ridge regression estimator is the mean of the posterior distribution under a Gaussian prior.
$$y \sim N(X\beta, \sigma^2 I), \quad \text{prior } \beta \sim N(0, \gamma^2 I).$$
$$\hat{\beta} = \left(X^TX + \frac{\sigma^2}{\gamma^2}I\right)^{-1}X^Ty.$$
What I want to show is that $\mu = \hat{\beta}$, where $\mu$ is the vector in the exponent $$-\frac{1}{2}(\beta - \mu)^T\Sigma^{-1}(\beta - \mu)$$ of the posterior density $p(\beta \mid X, y)$, and $\Sigma$ is the covariance matrix of that posterior distribution.
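As a numerical sanity check of the claim (not a proof), here is a minimal Python sketch. It computes the posterior mean from the standard formula for conditioning jointly Gaussian variables, $E[\beta \mid y] = \operatorname{Cov}(\beta, y)\operatorname{Cov}(y)^{-1}y$, which is an assumption brought in from outside this post, and compares it with the ridge formula above. The dimensions, the seed, and the values of `sigma` and `gamma` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4                 # illustrative sample size and number of coefficients
sigma, gamma = 0.7, 1.5      # illustrative noise and prior standard deviations

# Simulate data from the model y ~ N(X beta, sigma^2 I) with beta ~ N(0, gamma^2 I).
X = rng.normal(size=(n, p))
beta_true = rng.normal(scale=gamma, size=p)
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Ridge estimator with penalty lambda = sigma^2 / gamma^2.
lam = sigma**2 / gamma**2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Posterior mean via Gaussian conditioning (assumed standard result):
# Cov(beta, y) = gamma^2 X^T and Cov(y) = gamma^2 X X^T + sigma^2 I.
cov_y = gamma**2 * X @ X.T + sigma**2 * np.eye(n)
beta_post_mean = gamma**2 * X.T @ np.linalg.solve(cov_y, y)

# Report whether the two vectors agree up to floating-point tolerance.
print(np.allclose(beta_ridge, beta_post_mean))
```

Agreement here only confirms that the identity holds for one simulated data set; the algebraic step that turns the posterior exponent into the quadratic form with $\mu = \hat{\beta}$ is still what I am asking about.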
There is a solution to this question in the last couple of lines on page 3 of http://ssli.ee.washington.edu/courses/ee511/HW/hw3_solns.pdf, but I'm baffled as to how it gets there. (The problem is exercise 3.6.)