Linear regression: a probabilistic perspective
TL;DR
For a linear model, minimizing a sum-of-squares error function is equivalent to maximizing the likelihood function under a conditional Gaussian noise distribution.
Minimizing a sum-of-squares error function with the addition of a quadratic regularization term is equivalent to maximizing the posterior distribution (the Bayesian version of the linear model).
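A minimal numerical sketch of both claims, assuming NumPy and a made-up 1-D toy dataset (the names `Phi`, `alpha`, `beta` follow PRML's notation; the data and constants are illustrative, not from the text): the maximum-likelihood weights under Gaussian noise are the ordinary least-squares solution, and the MAP weights under a zero-mean Gaussian prior are the ridge/regularized least-squares solution with $\lambda = \alpha / \beta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 1-D inputs with a bias feature, Gaussian noise.
N = 50
x = rng.uniform(-1.0, 1.0, size=N)
Phi = np.column_stack([np.ones(N), x])      # design matrix [1, x]
w_true = np.array([0.3, -0.7])
beta = 25.0                                  # noise precision (1 / sigma^2)
t = Phi @ w_true + rng.normal(0.0, 1.0 / np.sqrt(beta), size=N)

# Maximum likelihood under Gaussian noise == least squares:
# w_ml solves the normal equations (Phi^T Phi) w = Phi^T t.
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# MAP with a zero-mean Gaussian prior p(w) = N(w | 0, alpha^{-1} I)
# == quadratically regularized least squares with lambda = alpha / beta.
alpha = 1.0
lam = alpha / beta
w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

print("ML / least-squares weights:", w_ml)
print("MAP / ridge weights:       ", w_map)
```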
Likelihood, prior and posterior
Recall that a linear regression model is a transformation from input $\mathbf{x}$ to output $t$, governed by the parameter vector $\mathbf{w}$. The Bayesian view treats the parameter not as a fixed constant but as a random variable with a distribution $p(\mathbf{w})$. During the training phase, the outputs $\mathbf{t}$ are observed.
- $p(\mathbf{w})$ -> prior distribution
- $p(\mathbf{t} \mid \mathbf{w})$ -> likelihood function
- the conditional distribution $p(\mathbf{w} \mid \mathbf{t})$ represents the corresponding posterior distribution over $\mathbf{w}$.
We are not seeking to model the distribution of the inputs $\mathbf{x}$. Thus $\mathbf{x}$ will always appear in the set of conditioning variables, and we can drop it to keep the notation compact.
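With the standard conjugate setup from PRML 3.3.1 (zero-mean isotropic Gaussian prior with precision $\alpha$, Gaussian noise with precision $\beta$), the posterior over $\mathbf{w}$ is itself Gaussian and has a closed form: $\mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^\top \boldsymbol{\Phi}$ and $\mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^\top \mathbf{t}$. A small sketch, assuming NumPy and made-up toy data:

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior p(w | t) = N(w | m_N, S_N) for Bayesian linear regression
    with prior p(w) = N(w | 0, alpha^{-1} I) and noise precision beta
    (PRML eqs. 3.53-3.54)."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Illustrative usage on a toy 1-D problem.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=20)
Phi = np.column_stack([np.ones_like(x), x])
t = 0.5 - 0.3 * x + rng.normal(0.0, 0.2, size=x.shape)

m_N, S_N = posterior(Phi, t, alpha=2.0, beta=25.0)
print("posterior mean (MAP weights):", m_N)
```

The posterior mean $\mathbf{m}_N$ is exactly the regularized least-squares solution from the TL;DR, which is the sense in which MAP estimation and quadratic regularization coincide.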
Readings
- PRML 2.3.3 Bayes' theorem for Gaussian variables
- PRML 3.1.1 Maximum likelihood and least squares
- PRML 3.1.4 Regularized least squares
- PRML 3.3.1 Parameter distribution for Bayesian linear regression
- PRML 1.5.4 Inference and decision
- Generative vs. Discriminative; Bayesian vs. Frequentist
- All Bayesian Models are Generative (in Theory)