跳转至

第 5 次作业

理论部分

习题 1

利用等式

\[ \left\| \boldsymbol {A} (\boldsymbol {x} + \alpha \boldsymbol {w}) - \boldsymbol {b} \right\| _ {2} ^ {2} = \left\| \boldsymbol {A x} - \boldsymbol {b} \right\| _ {2} ^ {2} + 2 \alpha \boldsymbol {w} ^ {T} \boldsymbol {A} ^ {T} (\boldsymbol {A x} - \boldsymbol {b}) + \alpha^ {2} \| \boldsymbol {A w} \| _ {2} ^ {2} \]

证明:如果 \(\pmb { x } \in { \pmb X } _ { L S }\) ,那么 \(\pmb { A } ^ { T } \pmb { A } \pmb { x } = \pmb { A } ^ { T } \pmb { b }\)

解. 设 \(f ( \alpha ) = \| A ( \pmb { x } + \alpha \pmb { w } ) - \pmb { b } \| _ { 2 } ^ { 2 }\) ,由于 \(\pmb { x } \in { \pmb { X } } _ { L S }\) ,说明当 \(\alpha = 0\) 时,函数取极小点。由于 \(f\) 是关于 \(\alpha\) 的二次函数,故在 \(\begin{array} { r } { \alpha = - \frac { 2 w ^ { T } A ^ { T } ( A x - b ) } { 2 \alpha ^ { 2 }\| \left. A w \|\right. _ { 2 } ^ { 2 } } } \end{array}\) 取得极值点。代入 \(\alpha = 0\) ,有

\[ \boldsymbol {w} ^ {T} \boldsymbol {A} ^ {T} (\boldsymbol {A} \boldsymbol {x} - \boldsymbol {b}) = 0 \]

又由于 \(\pmb { w }\) 的任意性,有

\[ \boldsymbol {A} ^ {T} \boldsymbol {A} \boldsymbol {x} = \boldsymbol {A} ^ {T} \boldsymbol {b} \]

习题 2

\[ A := \left( \begin{array}{c c c} 5 & - 1 & 1 \\ - 1 & 2 & 0 \\ 1 & 0 & 3 \end{array} \right) \]

\(\Lambda ( A ) = \{ \lambda _ { 1 } , \lambda _ { 2 } , \lambda _ { 3 } \} \subseteq \mathbb { C }\) with \(| \lambda _ { 1 } | \geq | \lambda _ { 2 } | \geq | \lambda _ { 3 } | .\)

(i) 使用 Gerschgorin 圆盘定理, 证明 \({ \frac { \left| \lambda _ { 1 } \right| } { \left| \lambda _ { 3 } \right| } } \leq 7 .\)(注:由于 \(A\) 为对称矩阵, \(\frac { \left| \lambda _ { 1 } \right| } { \left| \lambda _ { 3 } \right| }\)\(A\) 的条件数)

(ii)(编程题,提交代码)使用幂法与反幂法计算 \(\frac { \left| \lambda _ { 1 } \right| } { \left| \lambda _ { 3 } \right| }\)

解. \(( i )\)\(a \in \mathbb { C }\) , \(r \in \mathbb { R }\) , \(r > 0\) , 记 \(D ( a , r ) : = \{ z \in \mathbb { C } : | z - a | \leq r \} \subseteq \mathbb { C } .\) . 对 \(A\)\(A ^ { \mathrm { T } }\) 使用 Gerschgorin 圆盘定理,有 \(\Lambda ( A ) = \Lambda \left( A ^ { \mathrm { T } } \right) \subseteq { \tilde { G } } _ { 1 } \cup { \tilde { G } } _ { 2 } \cup { \tilde { G } } _ { 3 }\) ,其中

\[ \tilde {G} _ {1} := D (5, 2), \tilde {G} _ {2} := D (2, 1), \tilde {G} _ {3} := D (3, 1) \]

可得 \(| \lambda _ { 1 } | \le 7\) , \(| \lambda _ { 3 } | \geq 1 .\) . 故 \(\frac { \left| \lambda _ { 1 } \right| } { \left| \lambda _ { 3 } \right| } \le 7\) .

\((i i) \frac {| \lambda_ {1} |}{| \lambda_ {3} |} \approx 3. 4 8 2 3\)

习题 3

构建模型使得预测值与真实值的误差最小常用向量2-范数度量,求解模型过程中需要计算梯度,求梯度:

\(\begin{array} { r } { f ( A ) = \frac { 1 } { 2 } \| A x + b - y \| _ { 2 } ^ { 2 } } \end{array}\) ,求 \(\frac { \partial f } { \partial A }\)

\(\begin{array} { r } { f ( x ) = \frac { 1 } { 2 } \| A x + b - y \| _ { 2 } ^ { 2 } } \end{array}\) ,求 \(\textstyle { \frac { \partial f } { \partial x } }\)

其中 \(A \in R ^ { m \times n }\)\(x \in R ^ { n }\)\(b,y \in R ^ { m }\)

解.

\[ \begin{array}{l} \frac {\partial}{\partial A} f = \frac {\partial}{\partial A} \frac {1}{2} \left(x ^ {T} A ^ {T} A x + 2 (b - y) ^ {T} A x + (b - y) ^ {T} (b - y)\right) \\ = \frac {\partial}{\partial A} \frac {1}{2} \left(x ^ {T} A ^ {T} A x + 2 (b - y) ^ {T} A x\right) \\ = A x x ^ {T} + (b - y) x ^ {T} \\ \frac {\partial}{\partial x} f = A ^ {T} A x + A ^ {T} (b - y) \\ \end{array} \]

习题 4

二次型是数据分析中常用函数,求 \(\frac { \partial x ^ { T } A x } { \partial x }\)\(\frac { \partial x ^ { T } A x } { \partial A }\) ,其中 \(A \in R ^ { m \times m }\)\(x \in R ^ { m }\)

解. $$ \frac { \partial x ^ { T } A x } { \partial x } = ( A + A ^ { T } ) x $$

\[ \frac {\partial x ^ {T} A x}{\partial A} _ {i j} = x _ {i} x _ {j}, \frac {\partial x ^ {T} A x}{\partial A} = x x ^ {T} \]

习题 5

利用迹微分法求解 \(\frac { \partial T r ( W ^ { - 1 } ) } { \partial W }\) ,其中 \(W \in R ^ { m \times m }\)

解. 因为

\[ 0 = d I = d \left(W W ^ {- 1}\right) = d W W ^ {- 1} + W d W ^ {- 1} \]
\[ W d W ^ {- 1} = - d W W ^ {- 1} \]
\[ d W ^ {- 1} = - W ^ {- 1} d W W ^ {- 1} \]

所以

\[ \begin{array}{l} d T r (W ^ {- 1}) = T r (d W ^ {- 1}) \\ = T r \left(- W ^ {- 1} d W W ^ {- 1}\right) \\ = T r \left(- \left(W ^ {- 1}\right) ^ {2} d W\right) \\ \end{array} \]

\[ \frac {\partial T r (W ^ {- 1})}{\partial W} = - (W ^ {- T}) ^ {2} \]

习题 6

\(( \exp ( \pmb { z }) ) _ { i } = \exp ( {\pmb { z }} _ { i } )\)\(\begin{array} { r } { ( \log ( \pmb { z } ) ) _ { i } = \log ( {\pmb { z }} _ { i } ) \ f ( \pmb { z } ) = { \frac { \exp ( \pmb { z } ) } { \mathbf { 1 } ^ { T } \exp ( \pmb { z } ) } } } \end{array}\) 称为 softmax 函数,如果\(\pmb { q } = f ( \pmb { z } ) , J = - \pmb { p } ^ { T } l o g ( \pmb { q } )\) ,其中 \(\pmb { p } , \pmb { q } , \pmb { z } \in \mathbb { R } ^ { n }\) ,并且 \(\mathbf { 1 } ^ { \mathrm { T } } p = 1\)

• 证: \(\begin{array} { r } { \frac { \partial J } { \partial {\pmb { z }} } = \pmb q - \pmb { p } } \end{array}\)

• 若 \(\pmb { z } = \pmb {W } \pmb { x }\) ,其中 \(\begin{array} { r } { \pmb { W } \in \mathbb { R } ^ { n \times m } , \pmb { x } \in \mathbb { R } ^ { m } , \frac { \partial \pmb { J } } { \partial \pmb { W } } = ( \pmb { q } - \pmb { p } ) \pmb { x } ^ { T } } \end{array}\) 是否成立。

解.

\[ \begin{array}{l} J = - \boldsymbol {p} ^ {T} \log (\frac {\exp (z)}{\mathbf {1} ^ {T} \exp (z)}) \\ = - \boldsymbol {p} ^ {T} \boldsymbol {z} + \boldsymbol {p} ^ {T} \log \left(\boldsymbol {1} ^ {T} \exp (\boldsymbol {z})\right) \boldsymbol {1} \\ = - \boldsymbol {p} ^ {T} \boldsymbol {z} + \boldsymbol {p} ^ {T} \mathbf {1} \log \left(\mathbf {1} ^ {T} \exp (z)\right) \\ = - \boldsymbol {p} ^ {T} \boldsymbol {z} + \log \left(\boldsymbol {1} ^ {T} \exp (\boldsymbol {z})\right) \\ \end{array} \]
\[ \begin{array}{l} \frac {\partial J}{\partial \boldsymbol {z}} = - \boldsymbol {p} + \frac {\partial \log (\mathbf {1} ^ {T} \exp (z))}{\partial \boldsymbol {z}} \\ = - \boldsymbol {p} + \frac {\partial \mathbf {1} ^ {T} \exp (\boldsymbol {z})}{\partial \boldsymbol {z}} \frac {1}{\mathbf {1} ^ {T} \exp (\boldsymbol {z})} \\ = - \boldsymbol {p} + \frac {\exp (z)}{\mathbf {1} ^ {T} \exp (z)} \\ = - p + q \\ \end{array} \]
\[ d J = d \operatorname {T r} (J) = \operatorname {T r} (d J) = \operatorname {T r} [ (- \boldsymbol {p} + \boldsymbol {q}) ^ {T} d \boldsymbol {W x} ] = \operatorname {T r} [ \boldsymbol {x} (- \boldsymbol {p} + \boldsymbol {q}) ^ {T} d \boldsymbol {W} ] \]
\[ \frac {\partial J}{\partial \boldsymbol {W}} = (- \boldsymbol {p} + \boldsymbol {q}) \boldsymbol {x} ^ {T} \]