Primal & dual: linear regression as an example

TL;DR

primal

problem: $y=\mathbf{x}^T\mathbf{w}$

loss: $l=\frac{1}{2}\|\mathbf{X}\mathbf{w}-\mathbf{y}\|^2+\frac{\lambda}{2}\|\mathbf{w}\|^2$

solution (learn $\mathbf{w}$): $\mathbf{w}=(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$

dual

fact: $\mathbf{w}$ lies in the space spanned by the training data, i.e. $\mathbf{w}=\mathbf{X}^T\boldsymbol\alpha$ for some $\boldsymbol\alpha$.

problem transformed to: $y=\mathbf{x}^T(\mathbf{X}^T\boldsymbol\alpha)=\sum\limits_{i=1}^M\alpha_i\,\mathbf{x}^T\mathbf{X}_i$ ($\mathbf{X}_i$ is the $i$th row of $\mathbf{X}$)

solution (learn $\boldsymbol\alpha$): $\boldsymbol\alpha=(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1}\mathbf{y}$
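
The two closed forms can be checked numerically. A minimal NumPy sketch (the data, the sizes $M=50$, $d=3$, and $\lambda=0.1$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, lam = 50, 3, 0.1                      # example sizes and lambda, chosen arbitrarily
X = rng.normal(size=(M, d))                 # rows X_i are the training points
y = rng.normal(size=M)                      # training targets

# Primal: w = (X^T X + lambda I)^{-1} X^T y   -- a d x d solve
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual: alpha = (X X^T + lambda I)^{-1} y     -- an M x M solve
alpha = np.linalg.solve(X @ X.T + lam * np.eye(M), y)

# w lies in the span of the training rows: w = X^T alpha
print(np.allclose(X.T @ alpha, w))          # True

# A new prediction only needs the inner products x^T X_i
x = rng.normal(size=d)
print(np.isclose(x @ w, alpha @ (X @ x)))   # True
```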

Matrix calculus

To get the solutions of these loss functions, we need matrix derivatives. This is not a new mathematical operation, just partial derivatives taken with respect to each element of a vector or matrix; the matrix notation gives a compact representation instead of writing out the derivative for each element explicitly.

People have already worked out quick lookup tables of the fundamental identities. To compute $\partial l/\partial\mathbf{w}$, the one we need here, from the scalar-by-vector identities section, is $\frac{\partial (\mathbf{A}\mathbf{x}+\mathbf{b})^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e})}{\partial \mathbf{x}}=\mathbf{A}^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e})+\mathbf{D}^T\mathbf{C}^T(\mathbf{A}\mathbf{x}+\mathbf{b})$.
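
Concretely, writing the data term as $(\mathbf{X}\mathbf{w}-\mathbf{y})^T\mathbf{I}(\mathbf{X}\mathbf{w}-\mathbf{y})$ and applying that identity with $\mathbf{A}=\mathbf{D}=\mathbf{X}$, $\mathbf{b}=\mathbf{e}=-\mathbf{y}$, $\mathbf{C}=\mathbf{I}$ gives

$$
\frac{\partial l}{\partial \mathbf{w}}
= \mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{y}) + \lambda\mathbf{w}
= (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\mathbf{w} - \mathbf{X}^T\mathbf{y},
$$

and setting this to $\mathbf{0}$ recovers the primal solution $\mathbf{w}=(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$ from the TL;DR.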

Calculation of $\boldsymbol\alpha$

$$
\boldsymbol\alpha = (\mathbf{X}^T)^{-1}\mathbf{w}
= (\mathbf{X}^T)^{-1}(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}
= \left[(\mathbf{X}^T)^{-1}(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\mathbf{X}^T\right]^{-1}\mathbf{y}
= (\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1}\mathbf{y}
$$
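
The chain above treats $\mathbf{X}^T$ as if it were invertible, which only holds for a square, full-rank $\mathbf{X}$. The same $\boldsymbol\alpha$ follows for any $\mathbf{X}$ from

$$
\mathbf{X}^T(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I}) = (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\mathbf{X}^T
\;\Longrightarrow\;
(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T = \mathbf{X}^T(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1},
$$

so $\mathbf{w}=(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}=\mathbf{X}^T(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1}\mathbf{y}$, which is $\mathbf{X}^T\boldsymbol\alpha$ with exactly the $\boldsymbol\alpha$ above.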
