Homework 7
Submission deadline: Friday, 2022/11/25, 12:00 (noon)
Theory
Exercise 1. When fitting a model, the error between predicted and true values is commonly measured with the vector 2-norm, and solving for the model requires gradients. Compute the following gradients:
• \(f(A) = \frac{1}{2} \| A x + b - y \|_{2}^{2}\); find \(\frac{\partial f}{\partial A}\)
• \(f(x) = \frac{1}{2} \| A x + b - y \|_{2}^{2}\); find \(\frac{\partial f}{\partial x}\)
where \(A \in \mathbb{R}^{m \times n}\), \(x \in \mathbb{R}^{n}\), \(b, y \in \mathbb{R}^{m}\)
Solution.
\[
\begin{array}{l} \frac {\partial}{\partial A} f = \frac {\partial}{\partial A} \frac {1}{2} \left(x ^ {T} A ^ {T} A x + 2 (b - y) ^ {T} A x + (b - y) ^ {T} (b - y)\right) \\ = \frac {\partial}{\partial A} \frac {1}{2} \left(x ^ {T} A ^ {T} A x + 2 (b - y) ^ {T} A x\right) \\ = A x x ^ {T} + (b - y) x ^ {T} \\ \frac {\partial}{\partial x} f = A ^ {T} A x + A ^ {T} (b - y) \\ \end{array}
\]
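As a sanity check on the two closed forms above, the following sketch compares them with central-difference gradients on random data. The helper `numerical_gradient`, the sizes `m, n` and the random seed are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def numerical_gradient(f, M, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at array M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 4, 3  # hypothetical sizes chosen for the check
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = rng.standard_normal(m)
y = rng.standard_normal(m)

def f(A_, x_):
    r = A_ @ x_ + b - y
    return 0.5 * r @ r

# Closed forms from the solution above.
grad_A = np.outer(A @ x, x) + np.outer(b - y, x)  # A x x^T + (b - y) x^T
grad_x = A.T @ A @ x + A.T @ (b - y)              # A^T A x + A^T (b - y)

num_A = numerical_gradient(lambda M: f(M, x), A)
num_x = numerical_gradient(lambda v: f(A, v), x)

print(np.allclose(grad_A, num_A, atol=1e-6))
print(np.allclose(grad_x, num_x, atol=1e-6))
```

Both gradients are the residual \(Ax + b - y\) pushed back through the linear map: multiplied by \(x^T\) on the right for \(A\), and by \(A^T\) on the left for \(x\).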
Exercise 2. Quadratic forms are widely used in data analysis. Find \(\frac{\partial x^{T} A x}{\partial x}\) and \(\frac{\partial x^{T} A x}{\partial A}\), where \(A \in \mathbb{R}^{m \times m}\), \(x \in \mathbb{R}^{m}\)
Solution.
\[
\frac{\partial x^{T} A x}{\partial x} = (A + A^{T}) x
\]
\[
\left( \frac{\partial x^{T} A x}{\partial A} \right)_{ij} = x_{i} x_{j}, \qquad \frac{\partial x^{T} A x}{\partial A} = x x^{T}
\]
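The two results can be checked numerically with a short sketch. It assumes a random, deliberately non-symmetric \(A\); the size `m`, the entry `(i, j)` and the direction `h` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))  # deliberately not symmetric
x = rng.standard_normal(m)

def quad(A_, x_):
    return x_ @ A_ @ x_

grad_x = (A + A.T) @ x
grad_A = np.outer(x, x)

# x^T A x is linear in A, so adding 1 to entry (i, j) changes it by x_i x_j.
i, j = 1, 2
E = np.zeros((m, m))
E[i, j] = 1.0
entry_change = quad(A + E, x) - quad(A, x)

# Directional derivative in x along a random direction h (central difference).
h = rng.standard_normal(m)
eps = 1e-6
dir_deriv = (quad(A, x + eps * h) - quad(A, x - eps * h)) / (2 * eps)

print(np.isclose(entry_change, grad_A[i, j], atol=1e-8))
print(np.isclose(dir_deriv, h @ grad_x, atol=1e-6))
```

When \(A\) is symmetric the first result reduces to the familiar \(2Ax\).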
Exercise 3. Use the trace differential method to find \(\frac{\partial \operatorname{Tr}(W^{-1})}{\partial W}\), where \(W \in \mathbb{R}^{m \times m}\)
Solution. Since
\[
0 = d I = d \left(W W ^ {- 1}\right) = d W W ^ {- 1} + W d W ^ {- 1}
\]
\[
W d W ^ {- 1} = - d W W ^ {- 1}
\]
\[
d W ^ {- 1} = - W ^ {- 1} d W W ^ {- 1}
\]
it follows, using the cyclic property of the trace, that
\[
\begin{array}{l} d T r (W ^ {- 1}) = T r (d W ^ {- 1}) \\ = T r \left(- W ^ {- 1} d W W ^ {- 1}\right) \\ = T r \left(- \left(W ^ {- 1}\right) ^ {2} d W\right) \\ \end{array}
\]
that is,
\[
\frac {\partial T r (W ^ {- 1})}{\partial W} = - (W ^ {- T}) ^ {2}
\]
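The two steps of this derivation, the differential of the inverse and the resulting gradient, can be checked with a small numerical sketch. The matrix `W` is an assumed well-conditioned random matrix and `D` is an arbitrary perturbation direction; neither comes from the assignment.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4
W = 3.0 * np.eye(m) + 0.3 * rng.standard_normal((m, m))  # kept away from singular
D = rng.standard_normal((m, m))                          # arbitrary direction dW
eps = 1e-6

W_inv = np.linalg.inv(W)

# Step 1: d(W^{-1}) = -W^{-1} dW W^{-1}, checked with a central difference.
d_inv_numeric = (np.linalg.inv(W + eps * D) - np.linalg.inv(W - eps * D)) / (2 * eps)
d_inv_formula = -W_inv @ D @ W_inv

# Step 2: gradient G = -(W^{-T})^2, checked via d Tr(W^{-1}) = Tr(G^T dW).
G = -(W_inv.T @ W_inv.T)
d_trace_numeric = np.trace(d_inv_numeric)
d_trace_formula = np.trace(G.T @ D)

print(np.allclose(d_inv_numeric, d_inv_formula, atol=1e-6))
print(np.isclose(d_trace_numeric, d_trace_formula, atol=1e-6))
```

The check only probes one direction `D`; repeating it for several random directions gives more confidence in the full gradient.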
Exercise 4. Let \((\exp(\pmb{z}))_{i} = \exp(z_{i})\) and \((\log(\pmb{z}))_{i} = \log(z_{i})\) act elementwise. The function \(f(\pmb{z}) = \frac{\exp(\pmb{z})}{\mathbf{1}^{T} \exp(\pmb{z})}\) is called the softmax function. Let \(\pmb{q} = f(\pmb{z})\) and \(J = -\pmb{p}^{T} \log(\pmb{q})\), where \(\pmb{p}, \pmb{q}, \pmb{z} \in \mathbb{R}^{n}\) and \(\mathbf{1}^{T} \pmb{p} = 1\)
• Prove that \(\frac{\partial J}{\partial \pmb{z}} = \pmb{q} - \pmb{p}\)
• If \(\pmb{z} = \pmb{W} \pmb{x}\) with \(\pmb{W} \in \mathbb{R}^{n \times m}\), \(\pmb{x} \in \mathbb{R}^{m}\), does \(\frac{\partial J}{\partial \pmb{W}} = (\pmb{q} - \pmb{p}) \pmb{x}^{T}\) hold?
Solution.
\[
\begin{array}{l} J = - \boldsymbol {p} ^ {T} \log (\frac {\exp (z)}{\mathbf {1} ^ {T} \exp (z)}) \\ = - \boldsymbol {p} ^ {T} \boldsymbol {z} + \boldsymbol {p} ^ {T} \log \left(\boldsymbol {1} ^ {T} \exp (\boldsymbol {z})\right) \boldsymbol {1} \\ = - \boldsymbol {p} ^ {T} \boldsymbol {z} + \boldsymbol {p} ^ {T} \mathbf {1} \log \left(\mathbf {1} ^ {T} \exp (z)\right) \\ = - \boldsymbol {p} ^ {T} \boldsymbol {z} + \log \left(\boldsymbol {1} ^ {T} \exp (\boldsymbol {z})\right) \\ \end{array}
\]
\[
\begin{array}{l} \frac {\partial J}{\partial \boldsymbol {z}} = - \boldsymbol {p} + \frac {\partial \log (\mathbf {1} ^ {T} \exp (z))}{\partial \boldsymbol {z}} \\ = - \boldsymbol {p} + \frac {\partial \mathbf {1} ^ {T} \exp (\boldsymbol {z})}{\partial \boldsymbol {z}} \frac {1}{\mathbf {1} ^ {T} \exp (\boldsymbol {z})} \\ = - \boldsymbol {p} + \frac {\exp (z)}{\mathbf {1} ^ {T} \exp (z)} \\ = - p + q \\ \end{array}
\]
For the second part, \(d \boldsymbol{z} = (d \boldsymbol{W}) \boldsymbol{x}\), so
\[
d J = d \operatorname{Tr}(J) = \operatorname{Tr}(d J) = \operatorname{Tr}[(-\boldsymbol{p} + \boldsymbol{q})^{T} (d \boldsymbol{W}) \boldsymbol{x}] = \operatorname{Tr}[\boldsymbol{x} (-\boldsymbol{p} + \boldsymbol{q})^{T} d \boldsymbol{W}]
\]
Hence the identity holds:
\[
\frac{\partial J}{\partial \boldsymbol{W}} = (-\boldsymbol{p} + \boldsymbol{q}) \boldsymbol{x}^{T}
\]
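Both parts of Exercise 4 can be verified numerically. The sketch below assumes random `W`, `x` and a random probability vector `p`; the helper `numerical_gradient` and the max-shift inside `softmax` (a standard stability trick that does not change the result) are additions for illustration.

```python
import numpy as np

def numerical_gradient(f, M, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at array M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, p):
    return -p @ np.log(softmax(z))

rng = np.random.default_rng(3)
n, m = 5, 3
W = rng.standard_normal((n, m))
x = rng.standard_normal(m)
p = rng.random(n)
p /= p.sum()  # enforce 1^T p = 1

z = W @ x
q = softmax(z)

grad_z = q - p               # first part
grad_W = np.outer(q - p, x)  # second part

num_z = numerical_gradient(lambda v: cross_entropy(v, p), z)
num_W = numerical_gradient(lambda M: cross_entropy(M @ x, p), W)

print(np.allclose(grad_z, num_z, atol=1e-6))
print(np.allclose(grad_W, num_W, atol=1e-6))
```

The normalization \(\mathbf{1}^{T} \boldsymbol{p} = 1\) matters: without it the derivation gives \((\mathbf{1}^{T} \boldsymbol{p}) \boldsymbol{q} - \boldsymbol{p}\) instead of \(\boldsymbol{q} - \boldsymbol{p}\).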
Exercise 5. The following is a key step in fitting a multivariate normal model by maximum likelihood: \(L = -\frac{N d}{2} \ln(2\pi) - \frac{N}{2} \ln |\boldsymbol{\Sigma}| - \frac{1}{2} \sum_{t} (\pmb{x}_{t} - \pmb{\mu})^{T} \pmb{\Sigma}^{-1} (\pmb{x}_{t} - \pmb{\mu})\), where \(L\) is the log-likelihood, \(N\) is the number of samples, \(d\) is the sample dimension, \(\Sigma \in \mathbb{R}^{d \times d}\) is the covariance matrix, and \(\mu \in \mathbb{R}^{d}\) is the mean vector.
1) Find \(\frac{\partial L}{\partial \mu}\)
2) With \(\mu = \frac{1}{N} \sum_{t} x_{t}\), find \(\frac{\partial L}{\partial \pmb{\Sigma}}\) and the \(\pmb{\Sigma}\) for which \(\frac{\partial L}{\partial \pmb{\Sigma}} = \mathbf{0}\)
Solution. 1. \(\frac{\partial L}{\partial \pmb{\mu}} = \sum_{t} \pmb{\Sigma}^{-1} (\pmb{x}_{t} - \pmb{\mu})\)
2. Take the differential of \(L\) with respect to \(\pmb{\Sigma}\); the constant term vanishes:
\[
d L = - d \left[ \frac{N}{2} \ln |\boldsymbol{\Sigma}| \right] - d \left[ \frac{1}{2} \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) \right]
\]
The first term is
\[
- d \left[ \frac{N}{2} \ln |\boldsymbol{\Sigma}| \right] = - \frac{N}{2} d [ \ln |\boldsymbol{\Sigma}| ] = - \frac{N}{2} \operatorname{Tr} [ \boldsymbol{\Sigma}^{-1} d \boldsymbol{\Sigma} ]
\]
The second term is
\[
\begin{array}{l} - d \left[ \frac{1}{2} \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) \right] = - \frac{1}{2} d \operatorname{Tr} \left[ \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) \right] \\ = - \frac{1}{2} d \operatorname{Tr} \left[ \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} \right] \\ = - \frac{1}{2} \operatorname{Tr} \left[ \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} \left( - \boldsymbol{\Sigma}^{-1} (d \boldsymbol{\Sigma}) \boldsymbol{\Sigma}^{-1} \right) \right] \\ = \frac{1}{2} \operatorname{Tr} \left[ \boldsymbol{\Sigma}^{-1} \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} d \boldsymbol{\Sigma} \right] \\ \end{array}
\]
Since \(\boldsymbol{\Sigma}\) is symmetric, this gives
\[
\frac {\partial L}{\partial \boldsymbol {\Sigma}} = - \frac {N}{2} \boldsymbol {\Sigma} ^ {- 1} + \frac {1}{2} \boldsymbol {\Sigma} ^ {- 1} \sum_ {t} (\boldsymbol {x} _ {t} - \boldsymbol {\mu}) (\boldsymbol {x} _ {t} - \boldsymbol {\mu}) ^ {T} \boldsymbol {\Sigma} ^ {- 1}
\]
Setting \(\frac{\partial L}{\partial \pmb{\Sigma}} = \mathbf{0}\) and multiplying by \(\boldsymbol{\Sigma}\) on the left and on the right gives \(-\frac{N}{2} \boldsymbol{\Sigma} + \frac{1}{2} \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T} = \mathbf{0}\), so \(\boldsymbol{\Sigma} = \frac{1}{N} \sum_{t} (\boldsymbol{x}_{t} - \boldsymbol{\mu}) (\boldsymbol{x}_{t} - \boldsymbol{\mu})^{T}\)
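The gradient with respect to \(\boldsymbol{\Sigma}\) and its stationary point can be checked on synthetic data. In this sketch the samples `X`, the test point `Sigma0` and the helper `numerical_gradient` are assumptions made for illustration; the finite-difference check perturbs each entry of \(\boldsymbol{\Sigma}\) independently and is evaluated at a symmetric point.

```python
import numpy as np

def numerical_gradient(f, M, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at array M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
N, d = 50, 3
# Synthetic samples, one per row, with different scales and offsets per dimension.
X = rng.standard_normal((N, d)) * np.array([1.0, 2.0, 0.5]) + np.array([1.0, -2.0, 0.0])
mu = X.mean(axis=0)
R = X - mu
S = R.T @ R  # sum_t (x_t - mu)(x_t - mu)^T

def log_likelihood(Sigma):
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum("ti,ij,tj->", R, np.linalg.inv(Sigma), R)
    return -0.5 * N * d * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * quad

def grad_Sigma(Sigma):
    Sinv = np.linalg.inv(Sigma)
    return -0.5 * N * Sinv + 0.5 * Sinv @ S @ Sinv

# Check the gradient formula at an arbitrary symmetric positive definite point.
B = rng.standard_normal((d, d))
Sigma0 = B @ B.T + d * np.eye(d)
num = numerical_gradient(log_likelihood, Sigma0)
print(np.allclose(grad_Sigma(Sigma0), num, atol=1e-5))

# The stationary point is the sample covariance normalized by N (not N - 1).
Sigma_hat = S / N
print(np.allclose(grad_Sigma(Sigma_hat), 0.0, atol=1e-8))
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True)))
```

Note that the maximum likelihood estimate divides by \(N\), which is what `np.cov` computes with `bias=True`.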
Exercise 6. Find \(\frac{\partial | \pmb{X}^{k} |}{\partial \pmb{X}}\), where \(\pmb{X} \in \mathbb{R}^{m \times m}\) is an invertible matrix.
Solution. Since \(| \pmb{X}^{k} | = | \pmb{X} |^{k}\), the chain rule gives
\[
\frac {\partial | \pmb {X} ^ {k} |}{\partial \pmb {X}} = \frac {\partial | \pmb {X} ^ {k} |}{\partial | \pmb {X} |} \frac {\partial | \pmb {X} |}{\partial \pmb {X}} = k | \pmb {X} | ^ {k - 1} | \pmb {X} | \pmb {X} ^ {- T} = k | \pmb {X} | ^ {k} \pmb {X} ^ {- T}
\]
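A quick numerical check of this result, as a sketch: the matrix `X`, the exponent `k = 3` and the direction `D` are arbitrary choices, and the comparison uses a directional derivative rather than the full gradient.

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 3, 3
X = 2.0 * np.eye(m) + 0.3 * rng.standard_normal((m, m))  # kept away from singular
D = rng.standard_normal((m, m))                          # arbitrary direction dX
eps = 1e-6

def det_power(M):
    return np.linalg.det(np.linalg.matrix_power(M, k))

# Closed form from the solution above: k |X|^k X^{-T}.
G = k * np.linalg.det(X) ** k * np.linalg.inv(X).T

numeric = (det_power(X + eps * D) - det_power(X - eps * D)) / (2 * eps)
formula = np.sum(G * D)  # equals Tr(G^T D)

print(np.isclose(det_power(X), np.linalg.det(X) ** k))
print(np.isclose(numeric, formula, rtol=1e-5, atol=1e-5))
```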
Exercise 7. Find \(\frac{\partial \operatorname{Tr}(A X B X^{T} C)}{\partial X}\), where \(\pmb{A} \in \mathbb{R}^{m \times n}\), \(\pmb{X} \in \mathbb{R}^{n \times k}\), \(\pmb{B} \in \mathbb{R}^{k \times k}\), \(\pmb{C} \in \mathbb{R}^{n \times m}\)
Solution.
\[
\frac {\partial \operatorname {T r} (\boldsymbol {A X B X} ^ {T} \boldsymbol {C})}{\partial \boldsymbol {X}} = (\boldsymbol {B X} ^ {T} \boldsymbol {C A}) ^ {\mathrm {T}} + \boldsymbol {C A X B}
\]
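The result can be confirmed with a finite-difference check on random matrices of compatible shapes. The sizes `m, n, k` and the helper `numerical_gradient` below are illustrative assumptions.

```python
import numpy as np

def numerical_gradient(f, M, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at array M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(6)
m, n, k = 2, 3, 4
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, k))
B = rng.standard_normal((k, k))
C = rng.standard_normal((n, m))

def f(X_):
    return np.trace(A @ X_ @ B @ X_.T @ C)

# Closed form from the solution above.
grad = (B @ X.T @ C @ A).T + C @ A @ X @ B
num = numerical_gradient(f, X)

print(grad.shape == X.shape)
print(np.allclose(grad, num, atol=1e-6))
```

Expanding the transpose gives the equivalent form \(A^{T} C^{T} X B^{T} + C A X B\), which makes it easier to see that both terms have the shape of \(X\).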