5
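Exercise 1
When fitting a model, the error between predicted and true values is commonly measured with the vector 2-norm, and solving for the model requires gradients. Compute the following gradients:
\(f(A)=\frac{1}{2}\lVert Ax+b-y\rVert_2^2\); find \(\frac{\partial f}{\partial A}\)
\(f(x)=\frac{1}{2}\lVert Ax+b-y\rVert_2^2\); find \(\frac{\partial f}{\partial x}\)
where \(A\in R^{m\times n}\), \(x\in R^n\), \(b,y\in R^m\)
Solution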
\(df=Tr(df)=Tr\{d[\frac{1}{2}(Ax+b-y)^T(Ax+b-y)]\}\)
\(=\frac{1}{2}Tr\{d[(Ax)^T(Ax)+(b-y)^T(Ax)+(Ax)^T(b-y)+(b-y)^T(b-y)]\}\)
\(=\frac{1}{2}Tr\{d[(Ax)^T(Ax)]+d[(b-y)^T(Ax)]+d[(Ax)^T(b-y)]+d[(b-y)^T(b-y)]\}\)
\(=\frac{1}{2}Tr\{2(Ax)^Td(Ax)+(b-y)^Td(Ax)+d(b-y)^T(Ax)+(Ax)^Td(b-y)+d(Ax)^T(b-y)+d[(b-y)^T(b-y)]\}\)
\(=\frac{1}{2}Tr\{2(Ax)^Td(Ax)+(b-y)^Td(Ax)+d(Ax)^T\cdot(b-y)\}\) (since \(b\) and \(y\) are constant, \(d(b-y)=0\) and every term containing it vanishes)
\(=\frac{1}{2}Tr\{2(Ax)^Td(Ax)+(b-y)^Td(Ax)+d(Ax)^T\cdot[(b-y)^T]^T\}\)
\(=\frac{1}{2}Tr\{2(Ax)^Td(Ax)+(b-y)^Td(Ax)+[(b-y)^Td(Ax)]^T\}\)
\(=Tr\{(Ax)^Td(Ax)+(b-y)^Td(Ax)\}\)
\(=Tr\{(Ax+b-y)^Td(Ax)\}\)
\(=Tr\{(Ax+b-y)^T(Adx+dA\cdot x)\}\)
\(=Tr\{(Ax+b-y)^TAdx+x(Ax+b-y)^TdA\}\)
Comparing with \(df=Tr\left[\left(\frac{\partial f}{\partial A}\right)^T dA\right]+Tr\left[\left(\frac{\partial f}{\partial x}\right)^T dx\right]\),
we obtain
\(\frac{\partial}{\partial A}f=(Ax+b-y)x^T\)
\(\frac{\partial}{\partial x}f=A^T(Ax+b-y)\)
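The two closed forms can be sanity-checked numerically. The sketch below is only an illustration: the helper `numerical_grad`, the sizes `m, n` and the random test data are assumptions introduced here, not part of the exercise. It compares both formulas against central finite differences.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# Hypothetical test data (sizes chosen arbitrarily for the check).
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = rng.standard_normal(m)
y = rng.standard_normal(m)

def loss(A, x):
    r = A @ x + b - y
    return 0.5 * r @ r          # 1/2 * ||Ax + b - y||_2^2

r = A @ x + b - y
grad_A = np.outer(r, x)         # (Ax + b - y) x^T
grad_x = A.T @ r                # A^T (Ax + b - y)

err_A = np.max(np.abs(grad_A - numerical_grad(lambda M: loss(M, x), A)))
err_x = np.max(np.abs(grad_x - numerical_grad(lambda v: loss(A, v), x)))
print(err_A, err_x)             # both should be close to zero
```

Note that the gradient with respect to \(A\) has the same shape as \(A\), and the gradient with respect to \(x\) has the same shape as \(x\).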
Exercise 2
Quadratic forms are widely used in data analysis. Find \(\frac{\partial x^TAx}{\partial x}\) and \(\frac{\partial x^TAx}{\partial A}\), where \(A\in R^{m\times m}\), \(x\in R^m\)
Solution
\(\frac{\partial x^TAx}{\partial x}=(A+A^T)x\) (lec15, Example 9)
Since \(x^TAx=\sum_{i,j}A_{ij}x_ix_j\), we have \(\frac{\partial x^TAx}{\partial A_{ij}}=x_ix_j\), hence \(\frac{\partial x^TAx}{\partial A}=xx^T\)
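As a quick numerical illustration, the following sketch checks both results for a deliberately non-symmetric \(A\), so that \((A+A^T)x\) is distinguishable from \(2Ax\). The helper `numerical_grad` and the random data are assumptions made for this check only.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))   # hypothetical, generally not symmetric
x = rng.standard_normal(m)

grad_x = (A + A.T) @ x            # d(x^T A x)/dx
grad_A = np.outer(x, x)           # d(x^T A x)/dA = x x^T

num_x = numerical_grad(lambda v: v @ A @ v, x)
num_A = numerical_grad(lambda M: x @ M @ x, A)

err_x = np.max(np.abs(grad_x - num_x))
err_A = np.max(np.abs(grad_A - num_A))
print(err_x, err_A)               # both should be close to zero
```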
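Exercise 3
Find \(\frac{\partial |X^k|}{\partial X}\), where \(X\in R^{m\times m}\) is invertible.
Solution
Since \(|X^k|=|X|^k\), the chain rule gives \(\frac{\partial |X^k|}{\partial X}=\frac{\partial |X|^k}{\partial |X|}\frac{\partial |X|}{\partial X}=k|X|^{k-1}(X^*)^T=k|X|^{k-1}|X|X^{-T}=k|X|^kX^{-T}\), where \(X^*\) is the adjugate of \(X\) (the determinant derivative uses lec15, Theorem 4)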
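One way to gain confidence in this result is a directional-derivative check: for a random direction \(D\), the finite-difference slope of \(|X^k|\) along \(D\) should match \(Tr(G^TD)\) with \(G=k|X|^kX^{-T}\). The sketch below assumes arbitrary values `m = 3`, `k = 4` and a test matrix close to the identity (chosen so that it is invertible); none of these come from the exercise.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 3, 4
X = np.eye(m) + 0.1 * rng.standard_normal((m, m))  # near I, hence invertible
D = rng.standard_normal((m, m))                     # random direction

def f(M):
    return np.linalg.det(np.linalg.matrix_power(M, k))   # |M^k|

G = k * np.linalg.det(X) ** k * np.linalg.inv(X).T       # k |X|^k X^{-T}

h = 1e-6
fd = (f(X + h * D) - f(X - h * D)) / (2 * h)   # slope of f along D
analytic = np.sum(G * D)                        # Tr(G^T D)

rel_err = abs(fd - analytic) / max(1.0, abs(analytic))
print(rel_err)                                  # should be close to zero
```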
Exercise 4
Find \(\frac{\partial Tr(AXBX^TC)}{\partial X}\), where \(A\in R^{m\times n}\), \(X\in R^{n\times k}\), \(B\in R^{k\times k}\), \(C\in R^{n\times m}\)
Solution
\(d(Tr(AXBX^TC))=Tr[d(AXBX^TC)]\)
\(=Tr[Ad(XBX^TC)+dA\cdot(XBX^TC)]\)
\(=Tr[AXd(BX^TC)+AdX\cdot(BX^TC)]\) (since \(A\), \(B\), \(C\) are constant, \(dA=dB=dC=0\); terms containing them are dropped from here on)
\(=Tr[AXBd(X^TC)+BX^TCAdX]\)
\(=Tr[AXBdX^T\cdot C+BX^TCAdX]\)
\(=Tr[(dX)^T(CAXB)+BX^TCAdX]\)
\(=Tr\{[(CAXB)^TdX]^T+BX^TCAdX\}\)
\(=Tr\{[(CAXB)^T+BX^TCA]dX\}\)
\(\frac{\partial Tr(AXBX^TC)}{\partial X}=(BX^TCA)^T+CAXB\)
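The result can be compared with a finite-difference gradient. This is a minimal sketch under stated assumptions: the helper `numerical_grad`, the sizes `m, n, k` and the random matrices are made up for the check and are not part of the exercise.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
m, n, k = 2, 3, 4
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, k))
B = rng.standard_normal((k, k))
C = rng.standard_normal((n, m))

def g(M):
    return np.trace(A @ M @ B @ M.T @ C)      # Tr(A X B X^T C)

G = C @ A @ X @ B + (B @ X.T @ C @ A).T       # CAXB + (B X^T C A)^T

err = np.max(np.abs(G - numerical_grad(g, X)))
print(err)                                    # should be close to zero
```

The gradient has shape \(n\times k\), the same as \(X\), which is a useful dimensional check on each of the two terms.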
Exercise 5
Use the trace-differential method to find \(\frac{\partial Tr(W^{-1})}{\partial W}\), where \(W\in R^{m\times m}\) is invertible
Solution
Because
\(0=dI=d(WW^{-1})=dWW^{-1}+WdW^{-1}\)
\(WdW^{-1}=-dWW^{-1}\)
\(dW^{-1}=-W^{-1}dWW^{-1}\)
we have
\(dTr(W^{-1})=Tr(dW^{-1})\)
\(=Tr(-W^{-1}dWW^{-1})\)
\(=Tr(-(W^{-1})^2dW)\) (by the cyclic property of the trace)
That is,
\(\frac{\partial Tr(W^{-1})}{\partial W}=-(W^{-T})^2\)
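A directional check along a random direction \(D\) illustrates the formula: the finite-difference slope of \(Tr(W^{-1})\) should agree with \(Tr(G^TD)\) for \(G=-(W^{-T})^2\). The test matrix below is an assumption (a shifted random matrix, chosen to be comfortably invertible), as is the size `m = 4`.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 4
W = 3.0 * np.eye(m) + 0.3 * rng.standard_normal((m, m))  # hypothetical, well conditioned
D = rng.standard_normal((m, m))                           # random direction

def f(M):
    return np.trace(np.linalg.inv(M))        # Tr(M^{-1})

W_inv_T = np.linalg.inv(W).T
G = -(W_inv_T @ W_inv_T)                     # -(W^{-T})^2

h = 1e-6
fd = (f(W + h * D) - f(W - h * D)) / (2 * h) # slope of f along D
analytic = np.sum(G * D)                     # Tr(G^T D)

err = abs(fd - analytic)
print(err)                                   # should be close to zero
```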
Exercise 6
Let \(exp(z)=(exp(z_1),exp(z_2),...,exp(z_n))^T\), i.e. \((exp(z))_i=exp(z_i)\), and \(log(z)=(log(z_1),log(z_2),...,log(z_n))^T\), i.e. \((log(z))_i=log(z_i)\). The function \(f(z)=\frac{exp(z)}{1^Texp(z)}\) is called the \(softmax\) function. Let \(q=f(z)\) and \(J=-p^Tlog(q)\), where \(p,q,z\in R^n\) and \(1^Tp=1\).
(1) Prove that \(\frac{\partial J}{\partial z}=q-p\).
(2) If \(z=Wx\) with \(W\in R^{n\times m}\) and \(x\in R^m\), does \(\frac{\partial J}{\partial W}=(q-p)x^T\) hold?
Solution
(1) Proof:
\(J=-p^Tlog\left(\frac{exp(z)}{1^Texp(z)}\right)\)
\(=-p^Tz+p^Tlog(1^Texp(z))\mathbf{1}\)
\(=-p^Tz+p^T\mathbf{1}log(1^Texp(z))\)
\(=-p^Tz+log(1^Texp(z))\)
\(\frac{\partial J}{\partial z}=-p+\frac{\partial log(1^Texp(z))}{\partial z}\)
\(=-p+\frac{\partial 1^Texp(z)}{\partial z}\frac{1}{1^Texp(z)}\)
\(=-p+\frac{exp(z)}{1^Texp(z)}\)
\(=-p+q\)
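(2) It holds. Proof:
\(dJ=Tr(dJ)=Tr\left[\left(\frac{\partial J}{\partial z}\right)^Tdz\right]=Tr[(-p+q)^TdWx]=Tr[x(-p+q)^TdW]\), using \(dz=dW\cdot x\) because \(x\) is constant.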
\(\frac{\partial J}{\partial W}=(-p+q)x^T\)
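Both parts can be illustrated numerically. The sketch below is an assumption-laden check, not part of the proof: `softmax`, `loss_z`, `numerical_grad`, the sizes and the random data are all introduced here. The softmax subtracts `z.max()` before exponentiating, which leaves the result unchanged and only improves numerical stability.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

def softmax(z):
    e = np.exp(z - z.max())      # shift for stability; the ratio is unchanged
    return e / e.sum()

rng = np.random.default_rng(6)
n, m = 4, 3
W = rng.standard_normal((n, m))
x = rng.standard_normal(m)
p = rng.random(n)
p /= p.sum()                     # enforce 1^T p = 1

def loss_z(z):
    return -p @ np.log(softmax(z))   # J = -p^T log(q)

z = W @ x
q = softmax(z)
grad_z = q - p                   # part (1)
grad_W = np.outer(q - p, x)      # part (2): (q - p) x^T

err_z = np.max(np.abs(grad_z - numerical_grad(loss_z, z)))
err_W = np.max(np.abs(grad_W - numerical_grad(lambda M: loss_z(M @ x), W)))
print(err_z, err_W)              # both should be close to zero
```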
Exercise 7
(1) Given
\(f(t)=sin(log(t^Tt)),t\in R^m\)
find the gradient of \(f\) with respect to \(t\).
(2) Given
\(g(X)=Tr(AXB),A\in R^{m\times n},X\in R^{n\times p},B\in R^{p\times m}\)
find the gradient of \(g\) with respect to \(X\).
Solution
(1) Define:
\(g(t)=t^Tt=\sum_{i=1}^m t_i^2\)
\(h(t)=log(g(t))=log(t^Tt)\)
\(f(t)=sin(h(t))\)
Apply the chain rule:
- Compute \(\nabla_tg(t)\):
\(\nabla_tg=2t\)
- Compute \(\nabla_th(t)\):
\(\nabla_th=\frac{1}{g}\nabla_tg=\frac{2t}{t^Tt}\)
- Compute \(\nabla_tf(t)\):
\(\nabla_tf=cos(h)\cdot\nabla_th=cos(log(t^Tt))\cdot\frac{2t}{t^Tt}\)
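The chain-rule result above can be compared with a finite-difference gradient. The following is a small sketch; the helper `numerical_grad`, the dimension 5 and the random nonzero test vector `t` are assumptions made for illustration.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(7)
t = rng.standard_normal(5)           # hypothetical test point, nonzero

def f(v):
    return np.sin(np.log(v @ v))     # sin(log(t^T t))

s = t @ t
grad = np.cos(np.log(s)) * 2 * t / s # cos(log(t^T t)) * 2t / (t^T t)

err = np.max(np.abs(grad - numerical_grad(f, t)))
print(err)                           # should be close to zero
```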
(2) Given:
\(g(X)=Tr(AXB)\)
where \(A\in R^{m\times n}\), \(X\in R^{n\times p}\), \(B\in R^{p\times m}\).
By the cyclic property of the trace:
\(Tr(AXB)=Tr(BAX)\)
For the matrix function \(g(X)=Tr(CX)\) with \(C=BA\), we have:
\(\nabla_XTr(CX)=C^T\)
Therefore:
\(\nabla_Xg(X)=(BA)^T=A^TB^T\)
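Because \(g\) is linear in \(X\), its increment along any direction \(D\) equals \(Tr(G^TD)\) exactly, with \(G=A^TB^T\), so no small step size is needed. The sketch below uses this as a check; the sizes and random matrices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, p = 2, 3, 4
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, m))
D = rng.standard_normal((n, p))      # arbitrary direction, same shape as X

def g(M):
    return np.trace(A @ M @ B)       # Tr(A X B)

G = A.T @ B.T                        # (BA)^T = A^T B^T, shape n x p

increment = g(X + D) - g(X)          # exact change of a linear function
predicted = np.sum(G * D)            # Tr(G^T D)

print(np.isclose(increment, predicted))
```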