Reinforcement Learning Homework (0)

1

Among all probability distributions over $[a, b] \subset \mathbb{R}$, which distribution has the highest variance? How large is that variance?

$$P(x) = \begin{cases} \frac{1}{2} & x \in \{a, b\} \\ 0 & \text{otherwise} \end{cases}$$

then

$$\operatorname{Var}(x) = \mathbb{E}\left[\left(x-\frac{a+b}{2}\right)^2\right] = \left(\frac{b-a}{2}\right)^2.$$
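As a quick numerical sanity check, the variance of this two-point distribution can be compared with, say, the uniform distribution on the same interval (the endpoints $a = 2$, $b = 5$ below are arbitrary illustrations):

```python
import numpy as np

a, b = 2.0, 5.0                      # arbitrary endpoints for illustration

# The extremal distribution: mass 1/2 on each endpoint.
points = np.array([a, b])
mean = points.mean()                 # (a + b) / 2
var_two_point = ((points - mean) ** 2).mean()

# For comparison: the uniform distribution on [a, b] has variance (b - a)^2 / 12.
var_uniform = (b - a) ** 2 / 12

print(var_two_point, ((b - a) / 2) ** 2)  # both equal 2.25
print(var_uniform)                        # 0.75
```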

2

Let $X, Y$ be two random variables following some joint distribution over $\mathcal{X} \times \mathbb{R}$. Let $f: \mathcal{X} \rightarrow \mathbb{R}$ be a real-valued function. Prove that

$$\mathbb{E}\left[(Y-f(X))^2\right]-\mathbb{E}\left[(f(X)-\mathbb{E}[Y \mid X])^2\right]=\mathbb{E}\left[(Y-\mathbb{E}[Y \mid X])^2\right].$$

Proof

It suffices to prove

$$\mathbb{E}\left[(Y-f(X))^2 - (f(X)-\mathbb{E}[Y \mid X])^2 - (Y-\mathbb{E}[Y \mid X])^2\right] = 0,$$

which, after expanding the squares, reduces to

$$\mathbb{E}\big[\left(\mathbb{E}[Y \mid X]-Y\right)\left(\mathbb{E}[Y \mid X]-f(X)\right)\big]=0.$$

Since $\mathbb{E}[Y \mid X]$ is a function of $X$, let $g(X) := \mathbb{E}[Y \mid X] - f(X)$; then it suffices to prove

$$\mathbb{E}\big[\mathbb{E}[Y \mid X]\, g(X)\big] = \mathbb{E}\big[Y g(X)\big].$$

Then, writing the expectations as sums over a discrete joint distribution,

$$\begin{aligned} \text{LHS} &= \sum_{x} \mathbb{E}[Y \mid X=x]\, g(x) P_X(x) \\ &= \sum_{x} g(x) P_X(x) \sum_{y} y \frac{P_{X,Y}(x,y)}{P_X(x)} \\ &= \sum_{x} \sum_{y} g(x)\, y\, P_{X,Y}(x,y) \\ &= \text{RHS}. \qquad\blacksquare \end{aligned}$$
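The identity can be sanity-checked on a small discrete joint distribution (the probability table `P` and the estimator `f` below are arbitrary illustrations):

```python
import numpy as np

# A small, hypothetical joint distribution over X in {0, 1} and Y in {0, 1, 2}.
P = np.array([[0.1, 0.2, 0.1],    # P(X=0, Y=y)
              [0.3, 0.1, 0.2]])   # P(X=1, Y=y)
ys = np.array([0.0, 1.0, 2.0])

P_X = P.sum(axis=1)                        # marginal of X
cond_mean = (P * ys).sum(axis=1) / P_X     # E[Y | X = x]

# An arbitrary estimator f(X), chosen for illustration.
f = np.array([0.5, 1.5])

# The three expectations in the identity, computed by direct summation.
lhs = (P * (ys[None, :] - f[:, None]) ** 2).sum()          # E[(Y - f(X))^2]
mid = (P_X * (f - cond_mean) ** 2).sum()                   # E[(f(X) - E[Y|X])^2]
rhs = (P * (ys[None, :] - cond_mean[:, None]) ** 2).sum()  # E[(Y - E[Y|X])^2]

print(lhs - mid, rhs)   # the two printed values agree
```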

Notes

Let $f: \mathcal{X} \rightarrow \mathbb{R}$ be an estimator of $Y$ from $X$. This equation shows that the squared error ($\ell_2$ loss) $\mathbb{E}\left[(Y-f(X))^2\right]$ is at least $\mathbb{E}\left[(Y-\mathbb{E}[Y \mid X])^2\right]$ for every $f$, and thus cannot be made arbitrarily small.

3

Let $A \in \mathbb{R}^{n \times n}$ be a positive-definite real symmetric matrix, and $b \in \mathbb{R}^n$ a vector. Let $\lambda$ be the largest eigenvalue of $A$, that is,

$$\lambda = \max_{z:\| z\|_2=1} \| Az\|_2. \qquad (1)$$

Let $x^\star$ be the solution to $x^\star = Ax^\star + b$. Define $x_0 = 0$ and, for $t > 0$, $x_t := Ax_{t-1} + b$. Prove that $\| x_t - x^\star\|_2 \leq \lambda^t \| x^\star\|_2$.

(Hint: show that $\| x_t - x^\star\|_2 \leq \lambda \| x_{t-1} - x^\star\|_2$. Also, you do not need any additional properties of the largest eigenvalue; the proof is elementary given Eq. (1).)

Proof

Substitute

$$b = x^\star - Ax^\star,$$

then

$$x_t = Ax_{t-1} + b = Ax_{t-1} + x^\star - Ax^\star,$$

and it suffices to prove

$$\| x_t - x^\star \|_2 \leq \lambda \| x_{t-1} - x^\star \|_2, \quad\text{i.e.}\quad \| Ax_{t-1} - Ax^\star\|_2 \le \lambda \| x_{t-1} - x^\star \|_2.$$

With Equation (1), which gives $\| Av\|_2 \le \lambda \| v\|_2$ for every $v \in \mathbb{R}^n$,

$$\| A(x_{t-1} - x^\star)\|_2 \le \lambda \| x_{t-1} - x^\star \|_2.$$

Iterating this bound from $x_0 = 0$, for which $\| x_0 - x^\star\|_2 = \| x^\star\|_2$, yields $\| x_t - x^\star\|_2 \le \lambda^t \| x^\star\|_2$. $\blacksquare$
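A numerical sketch of the iteration, with an arbitrary symmetric positive-definite $A$ rescaled so that $\lambda = 0.9$ ($\lambda < 1$ also makes the iteration converge, though the bound itself holds for any $\lambda$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Build an arbitrary symmetric positive-definite A, rescaled so that its
# largest eigenvalue is 0.9.
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                      # symmetric, positive-definite
A *= 0.9 / np.linalg.eigvalsh(A).max()
lam = np.linalg.eigvalsh(A).max()            # = 0.9 up to rounding

b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)   # fixed point of x = Ax + b

x = np.zeros(n)                              # x_0 = 0
for t in range(1, 30):
    x = A @ x + b                            # x_t = A x_{t-1} + b
    assert np.linalg.norm(x - x_star) <= lam**t * np.linalg.norm(x_star) + 1e-9
print("bound verified for t = 1..29")
```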

4

Prove that $\gamma^{\frac{\log(1/\epsilon)}{1-\gamma}} \le \epsilon$ when $\gamma, \epsilon \in (0, 1)$.

(Hint: use the fact that $(1-1/x)^x < 1/e$ when $x > 1$.)

Proof

Lemma

$$(1-1/x)^x < \frac{1}{e} \quad\text{when}\quad x > 1.$$

Taking logarithms, it suffices to prove

$$x\log\left(1-\frac{1}{x}\right) < -1.$$

Substitute $u := 1-1/x$, so that $x = \frac{1}{1-u}$ with $u \in (0, 1)$; the claim becomes

$$\log(u) < u-1,$$

which holds for all $u \in (0, 1)$.
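The lemma is easy to spot-check numerically (the grid of $x$ values below is an arbitrary illustration, not a proof):

```python
import numpy as np

xs = np.linspace(1.001, 1000.0, 100000)
vals = (1.0 - 1.0 / xs) ** xs            # approaches 1/e from below as x grows
print(bool((vals < 1.0 / np.e).all()))   # True
```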

For the original proposition, substitute $u := \frac{1}{1-\gamma}$, so that $\gamma = 1 - \frac{1}{u}$ and $u > 1$; then it suffices to prove

$$\left( 1 - \frac{1}{u} \right)^{u \log(1/\epsilon)} \le \epsilon.$$

By the lemma, since the exponent $\log(1/\epsilon)$ is positive,

$$\left( 1 - \frac{1}{u} \right)^{u \log(1/\epsilon)} < \left( \frac{1}{e} \right)^{\log (1/\epsilon)} = \epsilon. \qquad\blacksquare$$
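A grid check of the final inequality over the open unit square (a sketch only; the grid spacing is arbitrary):

```python
import numpy as np

gammas = np.linspace(0.01, 0.99, 99)
epsilons = np.linspace(0.01, 0.99, 99)
G, E = np.meshgrid(gammas, epsilons)

# gamma ** (log(1/eps) / (1 - gamma)) should never exceed eps.
lhs = G ** (np.log(1.0 / E) / (1.0 - G))
print(bool((lhs <= E).all()))  # True
```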


Author: Zifan Ying · Posted on February 3, 2024
https://yzzzf.xyz/2024/02/03/reinforcement-learning-homework-0/