Primal & dual: linear regression as an example
TL;DR
- primal
  - problem: fit $y = x^T w$
  - loss: $l = \frac{1}{2}\lVert Xw - y \rVert^2 + \frac{\lambda}{2}\lVert w \rVert^2$
  - solution (learn $w$): $w = (X^T X + \lambda I)^{-1} X^T y$
- dual
  - fact: $w$ lies in the space spanned by the training data, so we can write $w = X^T \alpha$.
  - problem transformed to: $y = x^T (X^T \alpha) = \sum_{i=1}^{M} \alpha_i \, x^T X_i$ ($X_i$ is the $i$-th row of $X$)
  - solution (learn $\alpha$): $\alpha = (X X^T + \lambda I)^{-1} y$
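The two solutions are equivalent. Below is a minimal NumPy sketch (synthetic data; the variable names, shapes, and seed are my assumptions, with $X$ stored as $M$ samples by $d$ features) confirming that the primal and dual routes recover the same $w$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, lam = 50, 5, 0.1
X = rng.normal(size=(M, d))  # M training samples, d features (rows are samples)
y = rng.normal(size=M)

# Primal: w = (X^T X + lam*I)^{-1} X^T y  -- a d x d linear system
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual: alpha = (X X^T + lam*I)^{-1} y, then w = X^T alpha  -- an M x M system
alpha = np.linalg.solve(X @ X.T + lam * np.eye(M), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))  # True: both routes give the same weights
```

The primal route solves a $d \times d$ system and the dual an $M \times M$ one, which is why the dual form pays off when features outnumber samples, or when the inner products $x^T X_i$ are replaced by a kernel.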
Matrix calculus
To derive the solutions of these loss functions, we need matrix derivatives. Matrix differentiation is not a new mathematical operation: it is just taking partial derivatives element-wise with respect to the entries of a vector or matrix. The matrix notation simply gives us a compact representation, instead of writing out the derivative for every element explicitly.
People have already worked out quick lookup tables for the fundamental identities (e.g. the Matrix calculus article on Wikipedia). To calculate $\partial l / \partial w$, the one we need, from the scalar-by-vector identities section, is

$$
\frac{\partial \, (Ax+b)^T C (Dx+e)}{\partial x} = A^T C (Dx+e) + D^T C^T (Ax+b).
$$
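As a worked step, apply it to the loss above with $A = D = X$, $b = e = -y$, and $C = I$, together with $\partial (w^T w)/\partial w = 2w$:

$$
\frac{\partial l}{\partial w}
= \frac{1}{2}\left[ X^T (Xw - y) + X^T (Xw - y) \right] + \lambda w
= X^T X w - X^T y + \lambda w = 0
\;\Longrightarrow\;
w = (X^T X + \lambda I)^{-1} X^T y,
$$

which is exactly the primal solution in the TL;DR.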
Calculation of α
Substituting the primal solution into $w = X^T \alpha$ and (informally) treating $X^T$ as invertible:

$$
\alpha = (X^T)^{-1} w
= (X^T)^{-1} (X^T X + \lambda I)^{-1} X^T y
= \left[ (X^T)^{-1} (X^T X + \lambda I) X^T \right]^{-1} y
= (X X^T + \lambda I)^{-1} y,
$$

where the last step uses $(X^T)^{-1} (X^T X + \lambda I) X^T = X X^T + \lambda I$. Strictly speaking $X^T$ is usually not square, so $(X^T)^{-1}$ does not exist; the same result follows rigorously from the push-through identity $(X^T X + \lambda I)^{-1} X^T = X^T (X X^T + \lambda I)^{-1}$.
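A quick numerical sanity check of that identity (a sketch with a random rectangular $X$; the shapes and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
M, d, lam = 8, 3, 0.5
X = rng.normal(size=(M, d))  # deliberately non-square, so (X^T)^{-1} does not exist

# Left side:  (X^T X + lam*I)^{-1} X^T  -- solve instead of forming the inverse
lhs = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
# Right side: X^T (X X^T + lam*I)^{-1}
rhs = X.T @ np.linalg.inv(X @ X.T + lam * np.eye(M))

print(np.allclose(lhs, rhs))  # True: the push-through identity holds
```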