1
Among all probability distributions over $[a,b]\subset\mathbb{R}$, which distribution has the highest variance? How large is that variance?
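The variance is maximized by placing half of the mass on each endpoint:
$$P(x)=\begin{cases}\frac{1}{2}, & x=a \text{ or } x=b,\\ 0, & \text{otherwise.}\end{cases}$$
Then $E[x]=\frac{a+b}{2}$ and
$$\operatorname{Var}(x)=E\left[\left(x-\frac{a+b}{2}\right)^2\right]=\left(\frac{b-a}{2}\right)^2.$$
No distribution on $[a,b]$ does better: every $x\in[a,b]$ satisfies $\left|x-\frac{a+b}{2}\right|\le\frac{b-a}{2}$, and the mean minimizes the expected squared deviation, so
$$\operatorname{Var}(x)\le E\left[\left(x-\frac{a+b}{2}\right)^2\right]\le\left(\frac{b-a}{2}\right)^2,$$
with equality exactly when the mass is split equally between $a$ and $b$.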
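As a quick sanity check (not part of the argument), the sketch below computes exact variances of a few discrete distributions on an example interval using Python's `fractions` module. The interval $[-1,3]$ and the candidate distributions are arbitrary choices made for illustration; only the two-point distribution on $\{a,b\}$ should reach the bound $\left(\frac{b-a}{2}\right)^2$.

```python
from fractions import Fraction

def variance(support, probs):
    """Variance of a finite discrete distribution given as parallel lists."""
    mean = sum(p * x for x, p in zip(support, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(support, probs))

a, b = Fraction(-1), Fraction(3)  # example interval [a, b] (arbitrary choice)

# Hypothetical candidate distributions supported inside [a, b].
candidates = {
    "two-point on {a, b}": ([a, b], [Fraction(1, 2), Fraction(1, 2)]),
    "lopsided on {a, b}": ([a, b], [Fraction(1, 4), Fraction(3, 4)]),
    "three-point incl. midpoint": ([a, (a + b) / 2, b], [Fraction(1, 3)] * 3),
    "uniform on 5 grid points": ([a + k * (b - a) / 4 for k in range(5)], [Fraction(1, 5)] * 5),
}

bound = ((b - a) / 2) ** 2
for name, (xs, ps) in candidates.items():
    v = variance(xs, ps)
    assert v <= bound  # no candidate exceeds ((b - a)/2)^2
    print(f"{name}: {v}")
print("bound ((b - a)/2)^2 =", bound)
```

Exact rational arithmetic is used so that the equality case is checked without floating-point noise.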
2
Let $X,Y$ be two random variables that follow some joint distribution over $\mathcal{X}\times\mathbb{R}$. Let $f:\mathcal{X}\to\mathbb{R}$ be a real-valued function. Prove that
$$E[(Y-f(X))^2]-E[(f(X)-E[Y\mid X])^2]=E[(Y-E[Y\mid X])^2].$$
Proof
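It suffices to prove
$$E\left[(Y-f(X))^2-(f(X)-E[Y\mid X])^2-(Y-E[Y\mid X])^2\right]=0.$$
Writing $Y-f(X)=(Y-E[Y\mid X])+(E[Y\mid X]-f(X))$ and expanding the square, the expression inside the expectation equals $2(Y-E[Y\mid X])(E[Y\mid X]-f(X))$, so this is equivalent to
$$E\left[(E[Y\mid X]-Y)(E[Y\mid X]-f(X))\right]=0.$$
Since $E[Y\mid X]$ is a function of $X$, let $g(X):=E[Y\mid X]-f(X)$; then it suffices to prove
$$E\left[E[Y\mid X]\,g(X)\right]=E\left[Y g(X)\right].$$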
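Assuming $X$ and $Y$ are discrete (the continuous case is identical with sums replaced by integrals), with the outer sum over $x_i$ such that $P_X(x_i)>0$,
$$\text{LHS}=\sum_{x_i}E[Y\mid X=x_i]\,g(x_i)\,P_X(x_i)=\sum_{x_i}g(x_i)P_X(x_i)\sum_{y_j}y_j\frac{P_{X,Y}(x_i,y_j)}{P_X(x_i)}=\sum_{x_i}\sum_{y_j}g(x_i)\,y_j\,P_{X,Y}(x_i,y_j)=\text{RHS}.\quad\blacksquare$$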
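The discrete computation above can be checked exactly on a small example. The sketch below uses a hypothetical joint pmf and an arbitrary hypothetical predictor `f`; it computes $E[Y\mid X=x]=\sum_y y\,P_{X,Y}(x,y)/P_X(x)$ directly and compares both sides of the identity with rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P_{X,Y}(x, y) on a small finite support (entries sum to 1).
joint = {
    (0, -1): F(1, 8), (0, 2): F(1, 8),
    (1, 0): F(1, 4), (1, 3): F(1, 8),
    (2, 1): F(3, 8),
}

def expect(h):
    """E[h(X, Y)] under the joint pmf."""
    return sum(p * h(x, y) for (x, y), p in joint.items())

# Marginal P_X and conditional mean E[Y | X = x] = sum_y y P(x, y) / P_X(x).
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
cond_mean = {x: sum(p * y for (xx, y), p in joint.items() if xx == x) / p_x[x] for x in p_x}

def f(x):
    """An arbitrary (hypothetical) predictor f: X -> R."""
    return F(x) * 2 - 1

lhs = expect(lambda x, y: (y - f(x)) ** 2) - expect(lambda x, y: (f(x) - cond_mean[x]) ** 2)
rhs = expect(lambda x, y: (y - cond_mean[x]) ** 2)
print(lhs, rhs, lhs == rhs)  # prints: 21/16 21/16 True
```

Any other choice of `f` should leave `rhs` unchanged and keep `lhs == rhs`.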
Notes
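Let $f:\mathcal{X}\to\mathbb{R}$ be an estimator of $Y$ from $X$. Since the second term on the left-hand side is nonnegative, this identity shows that the squared error ($\ell_2$ loss) $E[(Y-f(X))^2]$ is at least $E[(Y-E[Y\mid X])^2]$ for every $f$, with equality when $f(X)=E[Y\mid X]$; thus the loss cannot be made arbitrarily small.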
3
Let $A\in\mathbb{R}^{n\times n}$ be a positive-definite real symmetric matrix, and $b\in\mathbb{R}^n$ be a vector. $\lambda$ is the largest eigenvalue of $A$, that is,
$$\lambda=\max_{z:\|z\|_2=1}\|Az\|_2.\tag{1}$$
Let $x^\star$ be the solution to $x^\star=Ax^\star+b$. Define $x_0=0$ and, for $t>0$, $x_t:=Ax_{t-1}+b$. Prove that $\|x_t-x^\star\|_2\le\lambda^t\|x^\star\|_2$.
(Hint: show that $\|x_t-x^\star\|_2\le\lambda\|x_{t-1}-x^\star\|_2$. Also, you do not need to know any additional properties of the largest eigenvalue of a matrix; the proof is elementary given Eq. (1).)
Proof
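Substitute
$$b=x^\star-Ax^\star;$$
then
$$x_t=Ax_{t-1}+b=Ax_{t-1}+x^\star-Ax^\star,$$
and it suffices to prove
$$\|x_t-x^\star\|_2\le\lambda\|x_{t-1}-x^\star\|_2,\quad\text{i.e.}\quad\|Ax_{t-1}-Ax^\star\|_2\le\lambda\|x_{t-1}-x^\star\|_2.$$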
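With Eq. (1): if $v:=x_{t-1}-x^\star\ne0$, then $z=v/\|v\|_2$ is a unit vector, so $\|Av\|_2=\|v\|_2\,\|Az\|_2\le\lambda\|v\|_2$; the case $v=0$ is trivial. Hence
$$\|A(x_{t-1}-x^\star)\|_2\le\lambda\|x_{t-1}-x^\star\|_2.$$
Applying this inequality $t$ times and using $x_0=0$,
$$\|x_t-x^\star\|_2\le\lambda^t\|x_0-x^\star\|_2=\lambda^t\|x^\star\|_2.\quad\blacksquare$$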
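A minimal numerical sketch of the bound, assuming a hypothetical $2\times2$ example so that everything fits in plain Python: $\lambda$ comes from the closed-form largest eigenvalue of a symmetric $2\times2$ matrix, and $x^\star$ is obtained by solving $(I-A)x^\star=b$ with Cramer's rule. The matrix is chosen with $\lambda<1$ so the bound actually shrinks; the inequality itself only needs $x^\star$ to exist.

```python
import math

def matvec(A, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def norm2(v):
    """Euclidean norm of a length-2 vector."""
    return math.hypot(v[0], v[1])

def largest_eig_sym2(A):
    """Largest eigenvalue of a symmetric 2x2 matrix via the closed form."""
    (p, q), (_, r) = A
    return (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)

def solve_fixed_point(A, b):
    """Solve x = A x + b, i.e. (I - A) x = b, by Cramer's rule (2x2 only)."""
    m00, m01, m10, m11 = 1 - A[0][0], -A[0][1], -A[1][0], 1 - A[1][1]
    det = m00 * m11 - m01 * m10
    return [(b[0] * m11 - m01 * b[1]) / det, (m00 * b[1] - m10 * b[0]) / det]

A = [[0.5, 0.2], [0.2, 0.3]]   # symmetric positive definite (example values)
b = [1.0, 2.0]                 # example right-hand side
lam = largest_eig_sym2(A)
x_star = solve_fixed_point(A, b)

x = [0.0, 0.0]                 # x_0 = 0
errors = []
for t in range(1, 21):
    Ax = matvec(A, x)
    x = [Ax[0] + b[0], Ax[1] + b[1]]            # x_t = A x_{t-1} + b
    err = norm2([x[0] - x_star[0], x[1] - x_star[1]])
    bound = lam ** t * norm2(x_star)            # lambda^t * ||x*||_2
    assert err <= bound + 1e-12                 # the claimed inequality
    errors.append((t, err, bound))

for t, err, bound in errors[:5]:
    print(f"t={t:2d}  error={err:.6f}  bound={bound:.6f}")
```

Each step contracts the error by at most $\lambda$, which is exactly the hint's one-step inequality iterated.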
4
Prove that $\gamma^{\frac{\log(1/\epsilon)}{1-\gamma}}\le\epsilon$ when $\gamma,\epsilon\in(0,1)$.
(Hint: use the fact that $(1-1/x)^x<1/e$ when $x>1$.)
Proof
Lemma
$$\left(1-\frac{1}{x}\right)^x<\frac{1}{e}\quad\text{when } x>1.$$
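It suffices to prove (after taking logarithms)
$$x\log\left(1-\frac{1}{x}\right)<-1.$$
Substitute $u:=1-1/x$. Since $x>1$, we have $u\in(0,1)$ and $x=\frac{1}{1-u}$; multiplying both sides by $1-u>0$, the inequality becomes
$$\log(u)<u-1,$$
which holds for every $u\in(0,1)$ because $\log$ is strictly concave and $u-1$ is its tangent line at $u=1$.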
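A quick floating-point spot check of the lemma and of the intermediate inequality $\log(u)<u-1$, assuming a handful of arbitrary sample points $x>1$ (a check, not a proof):

```python
import math

# Arbitrary sample points x > 1 (illustrative only).
xs = [1.001, 1.5, 2.0, 10.0, 100.0, 1e4]
for x in xs:
    val = (1 - 1 / x) ** x
    assert val < 1 / math.e          # lemma: (1 - 1/x)^x < 1/e
    u = 1 - 1 / x                    # substitution used in the proof, u in (0, 1)
    assert math.log(u) < u - 1       # equivalent form: log(u) < u - 1
    print(f"x={x:g}: (1-1/x)^x = {val:.6f} < 1/e = {1 / math.e:.6f}")
```

As $x$ grows, $(1-1/x)^x$ approaches $1/e$ from below, so the gap becomes small but stays positive.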
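For the original proposition, substitute $u:=\frac{1}{1-\gamma}>1$, so that $\gamma=1-\frac{1}{u}$. Then it suffices to prove
$$\left(1-\frac{1}{u}\right)^{u\log(1/\epsilon)}\le\epsilon.$$
By the lemma, and since $\log(1/\epsilon)>0$ for $\epsilon\in(0,1)$,
$$\left(1-\frac{1}{u}\right)^{u\log(1/\epsilon)}=\left[\left(1-\frac{1}{u}\right)^{u}\right]^{\log(1/\epsilon)}<\left(\frac{1}{e}\right)^{\log(1/\epsilon)}=\epsilon.\quad\blacksquare$$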
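To see the inequality numerically, the sketch below evaluates $\gamma^{\log(1/\epsilon)/(1-\gamma)}$ on an arbitrary grid of $(\gamma,\epsilon)$ values in $(0,1)$ and checks that it never exceeds $\epsilon$. The grid is an illustrative assumption; the ratio gets closest to $1$ when $\gamma$ is near $1$, matching the fact that $(1-1/u)^u\to1/e$ as $u\to\infty$.

```python
import math

def lhs(gamma, eps):
    """gamma ** (log(1/eps) / (1 - gamma)), the left-hand side of the claim."""
    return gamma ** (math.log(1 / eps) / (1 - gamma))

grid = [0.01, 0.1, 0.5, 0.9, 0.99, 0.999]  # arbitrary sample values in (0, 1)
worst_ratio = 0.0
for gamma in grid:
    for eps in grid:
        value = lhs(gamma, eps)
        assert value <= eps                      # the claimed inequality
        worst_ratio = max(worst_ratio, value / eps)
print(f"largest observed lhs/eps on the grid: {worst_ratio:.7f}")
```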