Primal & dual: linear regression as an example

TL;DR

primal

problem: $y=\mathbf{x}^T\mathbf{w}$

loss: $l=\frac{1}{2}\|\mathbf{X}\mathbf{w}-\mathbf{y}\|^2+\frac{\lambda}{2}\|\mathbf{w}\|^2$

solution (learn $\mathbf{w}$): $\mathbf{w}=(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$

dual

fact: $\mathbf{w}$ lies in the space spanned by the training data, i.e. $\mathbf{w}=\mathbf{X}^T\boldsymbol\alpha$ for some $\boldsymbol\alpha$.

problem transformed to: $y=\mathbf{x}^T(\mathbf{X}^T\boldsymbol\alpha)=\sum\limits_{i=1}^M\alpha_i\,\mathbf{x}^T\mathbf{X}_i$ ($\mathbf{X}_i$ is the $i$th row of $\mathbf{X}$)

solution (learn $\boldsymbol\alpha$): $\boldsymbol\alpha=(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1}\mathbf{y}$
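
The two closed forms can be checked numerically. A minimal NumPy sketch (the data, the sizes $M=50$, $d=3$, and $\lambda=0.1$ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, lam = 50, 3, 0.1                      # example sizes and lambda, chosen arbitrarily
X = rng.normal(size=(M, d))                 # rows X_i are the training points
y = rng.normal(size=M)                      # training targets

# Primal: w = (X^T X + lambda I)^{-1} X^T y   -- a d x d solve
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual: alpha = (X X^T + lambda I)^{-1} y     -- an M x M solve
alpha = np.linalg.solve(X @ X.T + lam * np.eye(M), y)

# w lies in the span of the training rows: w = X^T alpha
print(np.allclose(X.T @ alpha, w))          # True

# A new prediction only needs the inner products x^T X_i
x = rng.normal(size=d)
print(np.isclose(x @ w, alpha @ (X @ x)))   # True
```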

Matrix calculus

To get the solutions of these loss functions, we need matrix derivatives. This is not a new mathematical operation, just partial derivatives taken with respect to each element of a vector or matrix; the matrix notation gives a compact representation instead of writing out the derivative for each element explicitly.

People have already worked out quick lookup tables of the fundamental identities. To compute $\partial l/\partial\mathbf{w}$, the one we need here, from the scalar-by-vector identities section, is $\frac{\partial (\mathbf{A}\mathbf{x}+\mathbf{b})^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e})}{\partial \mathbf{x}}=\mathbf{A}^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e})+\mathbf{D}^T\mathbf{C}^T(\mathbf{A}\mathbf{x}+\mathbf{b})$.
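
Concretely, writing the data term as $(\mathbf{X}\mathbf{w}-\mathbf{y})^T\mathbf{I}(\mathbf{X}\mathbf{w}-\mathbf{y})$ and applying that identity with $\mathbf{A}=\mathbf{D}=\mathbf{X}$, $\mathbf{b}=\mathbf{e}=-\mathbf{y}$, $\mathbf{C}=\mathbf{I}$ gives

$$
\frac{\partial l}{\partial \mathbf{w}}
= \mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{y}) + \lambda\mathbf{w}
= (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\mathbf{w} - \mathbf{X}^T\mathbf{y},
$$

and setting this to $\mathbf{0}$ recovers the primal solution $\mathbf{w}=(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$ from the TL;DR.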

Calculation of $\boldsymbol\alpha$

$$
\boldsymbol\alpha = (\mathbf{X}^T)^{-1}\mathbf{w}
= (\mathbf{X}^T)^{-1}(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}
= \left[(\mathbf{X}^T)^{-1}(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\mathbf{X}^T\right]^{-1}\mathbf{y}
= (\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1}\mathbf{y}
$$
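
The chain above treats $\mathbf{X}^T$ as if it were invertible, which only holds for a square, full-rank $\mathbf{X}$. The same $\boldsymbol\alpha$ follows for any $\mathbf{X}$ from

$$
\mathbf{X}^T(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I}) = (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\mathbf{X}^T
\;\Longrightarrow\;
(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T = \mathbf{X}^T(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1},
$$

so $\mathbf{w}=(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}=\mathbf{X}^T(\mathbf{X}\mathbf{X}^T+\lambda\mathbf{I})^{-1}\mathbf{y}$, which is $\mathbf{X}^T\boldsymbol\alpha$ with exactly the $\boldsymbol\alpha$ above.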
