Linear regression: a probabilistic perspective

TL;DR

For a linear model, minimizing a sum-of-squares error function is equivalent to maximizing the likelihood function under a conditional Gaussian noise distribution.
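A minimal sketch of why this holds, assuming the targets are generated as $y_n = \mathbf w^\top \mathbf x_n + \epsilon_n$ with Gaussian noise of precision $\beta$ (the symbols $\beta$, $N$ and $E_D$ are introduced only for this sketch):

$$
p(\mathbf y|\mathbf x, \mathbf w, \beta) = \prod_{n=1}^{N} \mathcal N\!\left(y_n \mid \mathbf w^\top \mathbf x_n,\ \beta^{-1}\right)
$$

$$
\ln p(\mathbf y|\mathbf x, \mathbf w, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf w),
\qquad
E_D(\mathbf w) = \frac{1}{2}\sum_{n=1}^{N}\left(y_n - \mathbf w^\top \mathbf x_n\right)^2
$$

The first two terms do not depend on $\mathbf w$, so maximizing the likelihood with respect to $\mathbf w$ is the same as minimizing the sum-of-squares error $E_D(\mathbf w)$.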

Minimizing a sum-of-squares error function with the addition of a quadratic regularization term is equivalent to maximizing the posterior distribution (the Bayesian version of the linear model).
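A corresponding sketch for the second claim, assuming a zero-mean isotropic Gaussian prior $p(\mathbf w) = \mathcal N(\mathbf w \mid \mathbf 0, \alpha^{-1}\mathbf I)$ with an assumed precision hyperparameter $\alpha$:

$$
-\ln p(\mathbf w|\mathbf y, \mathbf x) = \beta E_D(\mathbf w) + \frac{\alpha}{2}\,\mathbf w^\top \mathbf w + \mathrm{const}
$$

so maximizing the posterior (the MAP solution) is equivalent to minimizing the sum-of-squares error plus a quadratic regularization term $\frac{\lambda}{2}\mathbf w^\top \mathbf w$ with $\lambda = \alpha/\beta$.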

Likelihood, prior and posterior

Recall that a linear regression model is really a transformation from input $\mathbf x$ to output $\mathbf y$, governed by the parameter $\mathbf w$. In the Bayesian view, the parameter $\mathbf w$ is not a fixed constant but a random variable with its own distribution $p(\mathbf w)$. During the training phase, the output $\mathbf y$ is observed.

  • $p(\mathbf w)$ -> prior distribution
  • $p(\mathbf y|\mathbf w, \mathbf x)$ -> likelihood function
  • the conditional distribution $p(\mathbf w|\mathbf y, \mathbf x)$ -> the corresponding posterior distribution over $\mathbf w$ (related by Bayes' theorem, shown below)
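These three quantities are tied together by Bayes' theorem, which is where the "maximizing the posterior" claim in the TL;DR comes from:

$$
p(\mathbf w|\mathbf y, \mathbf x) = \frac{p(\mathbf y|\mathbf w, \mathbf x)\,p(\mathbf w)}{p(\mathbf y|\mathbf x)} \;\propto\; p(\mathbf y|\mathbf w, \mathbf x)\,p(\mathbf w)
$$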

We are not seeking to model the distribution of $\mathbf x$. It will therefore always appear in the set of conditioning variables, and we could even drop it to keep the notation compact.
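To make the TL;DR concrete, here is a small NumPy sketch (an illustration added here, not part of the original notes): the maximum-likelihood solution coincides with ordinary least squares, while the MAP solution under a zero-mean Gaussian prior coincides with ridge regression with $\lambda = \alpha/\beta$. The values of `alpha`, `beta`, and `w_true` are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X w_true + Gaussian noise with precision beta
N, D = 50, 3
beta = 25.0                      # assumed noise precision (variance = 1/beta)
alpha = 2.0                      # assumed prior precision on w
w_true = np.array([0.5, -1.0, 2.0])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=1.0 / np.sqrt(beta), size=N)

# Maximum likelihood = ordinary least squares:
#   w_ml = argmin_w sum_n (y_n - w^T x_n)^2
w_ml = np.linalg.solve(X.T @ X, X.T @ y)

# MAP with zero-mean Gaussian prior p(w) = N(0, alpha^{-1} I)
# = ridge regression with lambda = alpha / beta:
#   w_map = argmin_w  beta/2 * ||y - X w||^2 + alpha/2 * ||w||^2
lam = alpha / beta
w_map = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

print("w_true:", w_true)
print("w_ml  :", w_ml)    # matches the least-squares fit
print("w_map :", w_map)   # shrunk slightly toward zero by the prior
```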

Readings
