Homework 6
Theory
Problem 1
(Mutual information) Suppose \(X_1 \to X_2 \to X_3 \to \cdots \to X_n\) is a Markov chain, i.e.
\[
p \left(x _ {1}, x _ {2}, \dots , x _ {n}\right) = p \left(x _ {1}\right) p \left(x _ {2} \mid x _ {1}\right) \dots p \left(x _ {n} \mid x _ {n - 1}\right)
\]
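Simplify \(I(X_1; X_2, \ldots, X_n)\).
Solution. By the chain rule for entropy and the Markov property \(H(X_i \mid X_{i-1}, \dots, X_1) = H(X_i \mid X_{i-1})\), which also holds for the sub-chain \(X_2 \to \cdots \to X_n\),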
\[
\begin{array}{l} I \left(X _ {1}; X _ {2}, \dots , X _ {n}\right) = H \left(X _ {1}\right) - H \left(X _ {1} \mid X _ {2}, \dots , X _ {n}\right) \\ = H \left(X _ {1}\right) - \left[ H \left(X _ {1}, X _ {2}, \dots , X _ {n}\right) - H \left(X _ {2}, \dots , X _ {n}\right) \right] \\ = H \left(X _ {1}\right) - \left[ \sum_ {i = 1} ^ {n} H \left(X _ {i} \mid X _ {i - 1}, \dots , X _ {1}\right) - \sum_ {i = 2} ^ {n} H \left(X _ {i} \mid X _ {i - 1}, \dots , X _ {2}\right) \right] \\ = H \left(X _ {1}\right) - \left[ \left(H \left(X _ {1}\right) + \sum_ {i = 2} ^ {n} H \left(X _ {i} \mid X _ {i - 1}\right)\right) - \left(H \left(X _ {2}\right) + \sum_ {i = 3} ^ {n} H \left(X _ {i} \mid X _ {i - 1}\right)\right) \right] \\ = H \left(X _ {2}\right) - H \left(X _ {2} \mid X _ {1}\right) \\ = I \left(X _ {2}; X _ {1}\right) \\ = I \left(X _ {1}; X _ {2}\right) \\ \end{array}
\]
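The identity can be sanity-checked numerically. The sketch below builds a small binary chain \(X_1 \to X_2 \to X_3\) from made-up transition probabilities (all numbers and helper names are illustrative assumptions, not part of the assignment) and compares \(I(X_1; X_2, X_3)\) with \(I(X_1; X_2)\).

```python
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a joint dict keyed by tuples onto the coordinates in `keep`."""
    out = {}
    for xs, p in joint.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint, a, b):
    """I(A; B) = H(A) + H(B) - H(A, B) for coordinate index lists a and b."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

# Hypothetical binary chain X1 -> X2 -> X3 (illustrative numbers only).
p1 = {0: 0.3, 1: 0.7}                                # p(x1)
p21 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}     # p(x2 | x1)
p32 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}   # p(x3 | x2)

# Joint distribution built from the Markov factorization.
joint = {
    (x1, x2, x3): p1[x1] * p21[x1][x2] * p32[x2][x3]
    for x1, x2, x3 in product((0, 1), repeat=3)
}

lhs = mutual_information(joint, [0], [1, 2])  # I(X1; X2, X3)
rhs = mutual_information(joint, [0], [1])     # I(X1; X2)
print(f"I(X1; X2, X3) = {lhs:.6f}, I(X1; X2) = {rhs:.6f}")
```

The two printed values should agree up to floating-point rounding, since \(X_3\) carries no information about \(X_1\) beyond what \(X_2\) already provides.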
Problem 2
(Understanding MLE through KL divergence) Suppose \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) are drawn from a distribution \(P\) with density \(p(\mathbf{x})\). Show that if the MLE is computed using a family of distributions \(Q_\theta\) with densities \(q_\theta(\mathbf{x})\), then the MLE seeks the distribution \(Q_\theta\) that is closest to the true distribution \(P\) in the sense of KL divergence.
That is, show that
\[
\arg \max _ {\theta} \prod_ {i = 1} ^ {n} q _ {\theta} \left(\mathbf {x} _ {i}\right) \Longleftrightarrow \arg \min _ {\theta} D _ {\mathrm {KL}} (P \| Q _ {\theta})
\]
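Solution. Taking logarithms and multiplying by \(-\frac{1}{n}\) turns the maximization into a minimization with the same optimizer; the step marked \(\xrightarrow{P}\) is the law of large numbers as \(n \to \infty\); and \(H(P)\) does not depend on \(\theta\), so subtracting it leaves the optimizer unchanged.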
\[
\begin{array}{l} \arg\max_{\theta} \prod_{i=1}^{n} q_{\theta}(\mathbf{x}_i) \Longleftrightarrow \arg\min_{\theta} -\frac{1}{n} \sum_{i=1}^{n} \log q_{\theta}(\mathbf{x}_i) \\ \xrightarrow{P} \arg\min_{\theta} -E_{P} \log q_{\theta}(\mathbf{x}) \Longleftrightarrow \arg\min_{\theta} -\int p(\mathbf{x}) \log q_{\theta}(\mathbf{x}) \, d\mathbf{x} \\ \Longleftrightarrow \arg\min_{\theta} H(P, Q_{\theta}) \Longleftrightarrow \arg\min_{\theta} \left(H(P, Q_{\theta}) - H(P)\right) \\ \Longleftrightarrow \arg\min_{\theta} \left\{ -\int p(\mathbf{x}) \log q_{\theta}(\mathbf{x}) \, d\mathbf{x} + \int p(\mathbf{x}) \log p(\mathbf{x}) \, d\mathbf{x} \right\} \\ \Longleftrightarrow \arg\min_{\theta} \left\{ -\int p(\mathbf{x}) \log \frac{q_{\theta}(\mathbf{x})}{p(\mathbf{x})} \, d\mathbf{x} \right\} \Longleftrightarrow \arg\min_{\theta} D_{\mathrm{KL}}(P \| Q_{\theta}) \\ \end{array}
\]
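In fact, from the viewpoint of optimizing model parameters, minimizing the negative log-likelihood, minimizing the cross-entropy (as in multi-class classification), and minimizing the KL divergence are equivalent.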
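As a small numerical illustration of this equivalence, the sketch below uses a made-up discrete example: a true distribution \(P\) on \(\{0, 1, 2\}\), the model family \(Q_\theta = \mathrm{Binomial}(2, \theta)\), and an idealized sample whose empirical frequencies match \(P\) exactly. The distribution, the family, the grid search, and the function names are all assumptions chosen for illustration.

```python
from math import log

def binomial2_pmf(theta):
    """pmf of Binomial(2, theta) on {0, 1, 2}; plays the role of Q_theta."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def avg_neg_log_likelihood(sample, theta):
    """-(1/n) * sum_i log q_theta(x_i)."""
    q = binomial2_pmf(theta)
    return -sum(log(q[x]) for x in sample) / len(sample)

def kl_divergence(p, q):
    """D_KL(p || q) for two pmfs given as lists over the same support."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Assumed true distribution P (it does not belong to the model family).
p = [0.5, 0.3, 0.2]
# Idealized sample whose empirical frequencies equal P exactly.
sample = [0] * 5 + [1] * 3 + [2] * 2

# Grid search over theta in (0, 1) for both objectives.
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = min(grid, key=lambda t: avg_neg_log_likelihood(sample, t))
theta_kl = min(grid, key=lambda t: kl_divergence(p, binomial2_pmf(t)))
print(theta_mle, theta_kl)
```

Both searches should return the same grid point, because the two objectives differ only by the constant \(H(P)\). For this family the minimizer can also be found analytically as \(E_P[X]/2 = 0.35\).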
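Problem 3
Suppose the lifetime \(T\) (in hours) of a certain electronic device follows a two-parameter exponential distribution with probability density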
\[
f(t) = \left\{ \begin{array}{l l} \frac{1}{\theta} e^{-(t - c)/\theta} & t \geq c \\ 0 & \text{otherwise} \end{array} \right.
\]
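where \(c, \theta\) (\(c, \theta > 0\)) are unknown parameters. A random sample of \(n\) such devices is put on a life test, and their ordered failure times are \(x_1 \leqslant x_2 \leqslant \cdots \leqslant x_n\).
(1) Find the maximum likelihood estimates of \(\theta\) and \(c\).
(2) Find the method-of-moments estimators of \(\theta\) and \(c\).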
Solution. (1) The likelihood function is
\[
L(\theta, c) = \frac{1}{\theta^{n}} \exp\left\{\frac{n c - \sum_{i=1}^{n} x_i}{\theta}\right\}, \quad c \leq x_1,
\]
and \(L(\theta, c) = 0\) if \(c > x_1\). Hence, for \(c \leq x_1\),
\[
\ln L (\theta , c) = - n \ln \theta + \frac {n c - \sum_ {i = 1} ^ {n} x _ {i}}{\theta}.
\]
Taking the partial derivative with respect to \(\theta\) and setting it to zero gives \(-\frac{n}{\theta} - \frac{n c - \sum_{i=1}^{n} x_i}{\theta^{2}} = 0\), so \(\theta = \frac{\sum_{i=1}^{n} x_i}{n} - c\). Taking the second partial derivative with respect to \(\theta\) and substituting this value of \(\theta\) gives
\[
\frac {n}{\theta^ {2}} + \frac {2 n c - 2 \sum_ {i = 1} ^ {n} x _ {i}}{\theta^ {3}} = - \frac {n}{\theta^ {2}} < 0,
\]
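which shows that this \(\theta\) is indeed a maximizer. Thus, for fixed \(c\), the log-likelihood is maximized at \(\theta = \frac{\sum_{i=1}^{n} x_i}{n} - c\).
Moreover, for any fixed \(\theta\), \(\ln L(\theta, c)\) increases with \(c\), so \(c\) should be taken as large as possible. Since the likelihood vanishes unless \(c \leq x_i\) for every \(i\), the maximum likelihood estimate of \(c\) is \(\hat{c} = x_{(1)}\), the smallest observation. Substituting it, the maximum likelihood estimate of \(\theta\) is \(\hat{\theta} = \bar{x} - x_{(1)}\), where \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\).
(2) The mean and second moment of this distribution are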
\[
E (X) = \int_ {c} ^ {+ \infty} \frac {x}{\theta} \exp \left\{- \frac {x - c}{\theta} \right\} d x = \theta + c.
\]
\[
E \left(X ^ {2}\right) = \int_ {c} ^ {+ \infty} \frac {x ^ {2}}{\theta} \exp \left\{- \frac {x - c}{\theta} \right\} d x = c ^ {2} + 2 c \theta + 2 \theta^ {2}.
\]
The variance of this distribution is \(\operatorname{Var}(X) = E(X^{2}) - E(X)^{2} = \theta^{2}\). Equating the population mean and variance to the sample mean \(\bar{x}\) and the sample variance \(\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^{2}\) and solving the resulting system, the moment estimator of \(\theta\) is \(\hat{\theta} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n}}\), and the moment estimator of \(c\) is \(\hat{c} = \bar{x} - \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n}}\).
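Both pairs of estimators can be tried on simulated data. The following is a minimal sketch, assuming illustrative true values \(\theta = 2\), \(c = 5\), a sample size of 20000, and a fixed random seed; the function names are hypothetical. It uses the maximum likelihood estimates \(x_{(1)}\) and \(\bar{x} - x_{(1)}\), and the moment estimates based on the sample mean and the (biased) sample standard deviation.

```python
import random
from math import sqrt

def mle_estimates(xs):
    """MLE for the shifted exponential: c_hat = min(x), theta_hat = mean - min."""
    c_hat = min(xs)
    theta_hat = sum(xs) / len(xs) - c_hat
    return theta_hat, c_hat

def moment_estimates(xs):
    """Method of moments: theta_hat = sample std (1/n version), c_hat = mean - std."""
    n = len(xs)
    mean = sum(xs) / n
    std = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return std, mean - std

# Assumed true parameters and sample size (illustrative only).
theta_true, c_true, n = 2.0, 5.0, 20000
rng = random.Random(0)
# expovariate takes the rate 1/theta; adding c shifts the support to [c, inf).
xs = [c_true + rng.expovariate(1.0 / theta_true) for _ in range(n)]

theta_mle, c_mle = mle_estimates(xs)
theta_mom, c_mom = moment_estimates(xs)
print(f"MLE:     theta = {theta_mle:.4f}, c = {c_mle:.4f}")
print(f"Moments: theta = {theta_mom:.4f}, c = {c_mom:.4f}")
```

With a sample this large, both methods should land close to the assumed true values. The maximum likelihood estimate of \(c\) can never fall below the true \(c\), since it is the smallest observation.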
Problem 4
Let the population \(X\) have probability density
\[
f(x; \theta) = \left\{ \begin{array}{l l} \frac{1}{\theta} x^{(1 - \theta)/\theta} & 0 < x < 1 \\ 0 & \text{otherwise} \end{array} \right. \quad 0 < \theta < +\infty
\]
\(X_1, X_2, \cdots, X_n\) is a sample from the population \(X\).
(1) Verify that the maximum likelihood estimator of \(\theta\) is \(\hat{\theta} = -\frac{1}{n} \sum_{i=1}^{n} \ln X_i\).
(2) Show that \(\hat{\theta}\) is an unbiased estimator of \(\theta\).
Solution. (1) The likelihood function is
\[
L (\theta) = \frac {1}{\theta^ {n}} \left(\prod_ {i = 1} ^ {n} x _ {i}\right) ^ {\frac {1 - \theta}{\theta}}.
\]
Hence \(\ln L(\theta) = -n \ln \theta + \frac{1 - \theta}{\theta} \ln\left(\prod_{i=1}^{n} x_i\right)\). Differentiating with respect to \(\theta\) and setting the derivative to zero gives
\[
- \frac {n}{\theta} - \frac {1}{\theta^ {2}} \ln \left(\prod_ {i = 1} ^ {n} x _ {i}\right) = 0.
\]
Solving gives \(\hat{\theta} = -\frac{1}{n} \sum_{i=1}^{n} \ln X_i\). Taking the second derivative of the log-likelihood and substituting \(\hat{\theta}\) gives \(-\frac{n^{3}}{\left(\sum_{i=1}^{n} \ln X_i\right)^{2}} < 0\), which shows that \(\hat{\theta}\) is indeed a maximizer. This proves the claim.
(2) First compute
\[
E(\ln X) = \int_{0}^{1} \ln x \, \frac{1}{\theta} x^{\frac{1 - \theta}{\theta}} \, dx = -\theta.
\]
Hence \(E(\hat{\theta}) = -\frac{1}{n} \sum_{i=1}^{n} E(\ln X_i) = -\frac{-n\theta}{n} = \theta\). This proves the claim.
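Unbiasedness can also be checked by simulation. The sketch below assumes \(\theta = 1.5\), a small sample size \(n = 5\), and 20000 repetitions (all illustrative choices, as are the function names). It samples by inverting the CDF \(F(x) = x^{1/\theta}\), i.e. \(X = U^{\theta}\) for uniform \(U\), and averages \(\hat{\theta}\) over the repetitions.

```python
import random
from math import log

def draw_sample(theta, n, rng):
    """Inverse-CDF sampling: F(x) = x**(1/theta) on (0, 1), so X = U**theta."""
    # 1 - rng.random() lies in (0, 1], which keeps log(x) finite.
    return [(1.0 - rng.random()) ** theta for _ in range(n)]

def theta_hat(xs):
    """MLE from part (1): minus the average of ln(x_i)."""
    return -sum(log(x) for x in xs) / len(xs)

# Assumed true parameter, sample size, and number of repetitions.
theta_true, n, reps = 1.5, 5, 20000
rng = random.Random(1)

estimates = [theta_hat(draw_sample(theta_true, n, rng)) for _ in range(reps)]
mean_estimate = sum(estimates) / reps
print(f"average of theta_hat over {reps} samples: {mean_estimate:.4f}")
```

Even though each individual estimate from five observations is noisy, the average over many repetitions should sit close to the assumed \(\theta\), which is what unbiasedness predicts.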
Problem 5
Suppose the population \(X \sim N(\mu, \sigma^{2})\) (\(\sigma^{2}\) known), and \(X_1, X_2, \ldots, X_n\) is a sample from \(X\). From past experience and knowledge, the values of \(\mu\) are known to concentrate near \(\mu_0\), and the farther a value is from \(\mu_0\), the less likely \(\mu\) is to take it. We therefore assume that the prior distribution of \(\mu\) is the normal distribution
\[
\pi(\mu) = \frac{1}{\sqrt{2\pi\sigma_{\mu}^{2}}} \exp\left[-\frac{1}{2\sigma_{\mu}^{2}} \left(\mu - \mu_0\right)^{2}\right] \quad \left(\mu_0, \sigma_{\mu} \text{ known}\right)
\]
Find the posterior distribution of \(\mu\).
Solution. The joint density of the sample is
\[
q (\mathbf {x} \mid \mu) = \frac {1}{\sigma^ {n} (2 \pi) ^ {n / 2}} \exp \left[ - \frac {1}{2 \sigma^ {2}} \sum_ {i = 1} ^ {n} \left(x _ {i} - \mu\right) ^ {2} \right]
\]
so the posterior density is
\[
\begin{array}{l} h(\mu \mid \mathbf{x}) = \frac{q(\mathbf{x} \mid \mu) \cdot \pi(\mu)}{f_{\mathbf{x}}(\mathbf{x})} = \frac{q(\mathbf{x} \mid \mu) \cdot \pi(\mu)}{\int_{-\infty}^{+\infty} q(\mathbf{x} \mid \mu) \cdot \pi(\mu) \, d\mu} \\ \propto \exp\left[-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left(x_i - \mu\right)^{2}\right] \cdot \exp\left[-\frac{1}{2\sigma_{\mu}^{2}} \left(\mu - \mu_0\right)^{2}\right] \\ \end{array}
\]
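Expanding the squares, collecting the terms in \(\mu\), and completing the square (factors that do not depend on \(\mu\) are absorbed into the proportionality constant) gives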
\[
h (\mu \mid \mathbf {x}) \propto \exp \left[ - \frac {(\mu - t) ^ {2}}{2 \eta^ {2}} \right]
\]
where \(t = \frac{\frac{n}{\sigma^{2}} \bar{x} + \frac{1}{\sigma_{\mu}^{2}} \mu_0}{\frac{n}{\sigma^{2}} + \frac{1}{\sigma_{\mu}^{2}}}\) and \(\eta^{2} = \frac{1}{\frac{n}{\sigma^{2}} + \frac{1}{\sigma_{\mu}^{2}}}\). Therefore
\[
\mu \mid \mathbf {x} \sim N \left(\frac {\frac {n}{\sigma^ {2}} \bar {x} + \frac {1}{\sigma_ {\mu} ^ {2}} \mu_ {0}}{\frac {n}{\sigma^ {2}} + \frac {1}{\sigma_ {\mu} ^ {2}}}, \frac {1}{\frac {n}{\sigma^ {2}} + \frac {1}{\sigma_ {\mu} ^ {2}}}\right).
\]
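The posterior parameters are a precision-weighted combination of the sample mean and the prior mean, which is easy to compute directly. The helper below is a minimal sketch; the function name and the example numbers are assumptions for illustration.

```python
def normal_posterior(xs, sigma2, mu0, sigma_mu2):
    """Posterior mean and variance of mu for N(mu, sigma2) data with known
    sigma2 and a N(mu0, sigma_mu2) prior on mu."""
    n = len(xs)
    xbar = sum(xs) / n
    # Posterior precision = data precision + prior precision.
    precision = n / sigma2 + 1.0 / sigma_mu2
    mean = (n / sigma2 * xbar + mu0 / sigma_mu2) / precision
    return mean, 1.0 / precision

# Illustrative data: xbar = 3, sigma^2 = 4, prior N(1, 1).
post_mean, post_var = normal_posterior([2.0, 4.0, 3.0, 3.0],
                                       sigma2=4.0, mu0=1.0, sigma_mu2=1.0)
print(post_mean, post_var)
```

In this example the data precision \(n/\sigma^{2} = 1\) equals the prior precision \(1/\sigma_{\mu}^{2} = 1\), so the posterior mean is the midpoint of \(\bar{x} = 3\) and \(\mu_0 = 1\), and the posterior variance is \(1/2\).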
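Problem 6
Suppose the population \(X \sim P(\lambda)\) and \(X_1, X_2, \ldots, X_n\) is a sample from \(X\). Assume the prior distribution of \(\lambda\) is the gamma distribution \(\Gamma(\alpha, \beta)\). Find the posterior mean estimate of \(\lambda\) (the Bayes estimate under squared-error loss).
Solution. The prior density \(\pi(\lambda)\) of \(\lambda\) is that of the gamma distribution \(\Gamma(\alpha, \beta)\), i.e.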
\[
\pi (\lambda) \propto \lambda^ {\alpha - 1} e ^ {- \beta \lambda}
\]
The joint probability mass function of the sample is
\[
q(\mathbf{x} \mid \lambda) = \frac{\lambda^{\sum_{i=1}^{n} x_i}}{x_1! \, x_2! \cdots x_n!} e^{-n\lambda} \propto \lambda^{\sum_{i=1}^{n} x_i} e^{-n\lambda}
\]
Therefore
\[
h (\lambda \mid \mathbf {x}) \propto \lambda^ {\alpha + \sum_ {i = 1} ^ {n} x _ {i} - 1} e ^ {- (\beta + n) \lambda}
\]
that is,
\[
\lambda \mid \mathbf {x} \sim \Gamma \left(\alpha + \sum_ {i = 1} ^ {n} x _ {i}, \beta + n\right)
\]
Hence the posterior mean estimate of \(\lambda\) is
\[
\hat {\lambda} = \frac {\alpha + \sum_ {i = 1} ^ {n} x _ {i}}{\beta + n} = \frac {n}{\beta + n} \bar {x} + \frac {\beta}{\beta + n} \frac {\alpha}{\beta}
\]
It is a weighted average of the sample mean \(\bar{x}\) and the mean \(\frac{\alpha}{\beta}\) of the prior distribution \(\Gamma(\alpha, \beta)\).
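The weighted-average form can be confirmed with a few lines of code. The sketch below is illustrative only: the function names, the prior \(\Gamma(3, 2)\) (shape 3, rate 2), and the three counts are assumptions.

```python
def poisson_gamma_posterior_mean(xs, alpha, beta):
    """Posterior mean of lambda: (alpha + sum x_i) / (beta + n)."""
    return (alpha + sum(xs)) / (beta + len(xs))

def weighted_average_form(xs, alpha, beta):
    """Same estimate written as a weighted average of xbar and alpha / beta."""
    n = len(xs)
    xbar = sum(xs) / n
    return n / (beta + n) * xbar + beta / (beta + n) * (alpha / beta)

# Illustrative counts and prior parameters.
xs = [1, 2, 3]
alpha, beta = 3.0, 2.0
direct = poisson_gamma_posterior_mean(xs, alpha, beta)
weighted = weighted_average_form(xs, alpha, beta)
print(direct, weighted)
```

Here \(\bar{x} = 2\) and the prior mean is \(1.5\); the weights \(3/5\) and \(2/5\) give \(1.8\), matching \((3 + 6)/(2 + 3)\). As \(n\) grows, the weight on \(\bar{x}\) approaches 1 and the prior matters less.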