ECE313 Course Notes

De Morgan's law

$A^c \cap B^c = (A\cup B)^c$

probability space $(\Omega, \mathcal{F}, P)$

$\Omega$ - sample space

$\mathcal{F}$ - collection of events

$P$ - probability measure on $\mathcal{F}$

power set denoted as $2^\Omega$

$|2^\Omega| = 2^{|\Omega|}$

$A, B$ are mutually exclusive: $A \cap B = \emptyset$ (not $P(A\cap B)=0$)

  • example

    If the sample space describes a fair coin toss, the sample outcomes in Ω are "heads" or "tails". The events are:

    • {heads}, with probability 0.5.
    • {tails}, with probability 0.5.
    • { } = ∅, neither heads nor tails, with probability 0.
    • {heads, tails}, either heads or tails, which is Ω, with probability 1.

Axioms

event axioms

  • $\Omega \in \mathcal{F}$
  • $A\in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$
  • $A, B \in \mathcal{F} \Rightarrow A \cup B \in \mathcal{F}$
  • can prove:
    • $\emptyset \in \mathcal{F}$
    • $A,B \in \mathcal{F} \Rightarrow AB \in \mathcal{F}$

probability axioms

  • $P(A) \ge 0 \quad \forall A \in \mathcal{F}$
  • $AB = \emptyset \Rightarrow P(A\cup B) = P(A) + P(B)$
  • $P(\Omega)=1$
  • can prove:
    • $P(A^c) = 1 - P(A)$
    • $P(A) = P(AB) + P(AB^c)$
    • $P(A\cup B) = P(A) + P(B) - P(AB)$

Principle of counting: multiply


Counting

permutation - factorials

binomial coefficient
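These counting quantities are available directly in Python's standard library; a quick check (the numbers below are arbitrary examples, not from the notes):

```python
# Counting building blocks: factorial, permutations, and the binomial coefficient.
import math

n, k = 5, 2
print(math.factorial(n))  # 5! = 120 orderings of 5 distinct items
print(math.perm(n, k))    # 5!/(5-2)! = 20 ordered selections of 2 out of 5
print(math.comb(n, k))    # C(5,2) = 10 unordered selections (binomial coefficient)
```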


Karnaugh map (K-map)

Random Variables

for a probability space $(\Omega, \mathcal{F}, P)$

a r.v. is a real-valued function on $\Omega$:

$X: \Omega \to \mathbb{R}$


Variance and Standard Deviation

$$Var(X) := E[(X-E[X])^2], \quad \sigma_X = \sqrt{Var(X)}$$

Properties of Variance

$Var(aX+b) = a^2 Var(X)$

$$\begin{aligned} Var(X) &= E[(X-E[X])^2] \\&=E[X^2-2XE[X]+E[X]^2] \\&=E[X^2] - 2E[X]E[X] + E[X]^2 \\&= E[X^2] - E[X]^2 \end{aligned}$$

e.g. 4 people pick 4 cards labeled with their names at random; find the expected number of people who pick the card with their own name.

$$X_i = \begin{cases}1 & \text{person } i \text{ chooses own name} \\ 0 & \text{otherwise}\end{cases}$$

then

$X = \sum_{i=1}^4 X_i$

$E[X] = \sum_{i=1}^4 E[X_i] = \sum_{i=1}^4 \frac{1}{4} = 1$

Variance:

because

$X_i^2 = X_i$

therefore

$E[X_i^2] = \frac{1}{4}$

$$X_iX_j = \begin{cases}1 & \text{both } i, j \text{ choose their own names} \\ 0 & \text{otherwise}\end{cases}$$

and

$E[X_iX_j] = \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{12}$

$$\begin{aligned} E[X^2] &= \sum_{i=1}^4 E[X_i^2] + \sum_{i\ne j} E[X_iX_j] \\ &= 4 \cdot \frac{1}{4} + 12 \cdot \frac{1}{12} = 2 \end{aligned}$$

$Var(X) = 2 - 1^2 = 1$
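A quick Monte Carlo check of this example (not part of the original notes; the trial count is arbitrary) confirms that both the mean and the variance of the number of matches are 1:

```python
# Simulate 4 people picking 4 name cards uniformly at random and count self-matches.
import random

def num_matches(n=4):
    perm = random.sample(range(n), n)           # random assignment of cards to people
    return sum(perm[i] == i for i in range(n))  # people who got their own card

trials = 200_000
samples = [num_matches() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)  # both close to 1
```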

Conditional Probability

Conditional Probability of B given A is defined as

$$P(B|A) = \begin{cases} \frac{P(AB)}{P(A)} &\text{if } P(A) > 0 \\ \text{undefined} & \text{otherwise}\end{cases}$$


Independent events

$P(B|A) = P(B)$

Independent random variables

for any $A, B \subset \mathbb{R}$, $\{X \in A\}$ and $\{Y \in B\}$ are independent events.

this implies $P_{XY}(x,y) = P_X(x)P_Y(y)$

Properties of r.v

  • independent: $E[XY]=E[X]E[Y]$
  • not necessarily independent: $E[X+Y]=E[X]+E[Y]$
  • independent: $Var(X+Y)=Var(X)+Var(Y)$

Bernoulli Distribution

A r.v. $X$ with range $\mathcal{X} = \{0,1\}$,

$p_X(1) = p, \quad p_X(0) = 1-p$

$E[X] = p, \quad Var(X) = p(1-p)$

The Binomial Distribution

Let r.v. $Y$ have a Bernoulli distribution with parameter $p$.

Run $n$ independent experiments, each distributed as $Y$.

Let $X$ be the r.v. giving the number of ones that occur.

pmf of $X$:

$$p_X(k) = \binom{n}{k} p^k(1-p)^{n-k}$$

$E[X] = np, \quad Var(X) = np(1-p)$

Geometric Distribution

Sequence of independent experiments, each having a Bernoulli distribution with parameter $p$; $L$ is the number of experiments until the first outcome equal to 1.

$$\begin{aligned} p_L(k) & =(1-p)^{k-1} p \quad \text{ for } k \geq 1, \\ P\{L>k\} & =(1-p)^k \quad \text{ for } k \geq 0. \end{aligned}$$

$E[L] = \frac{1}{p}, \quad Var(L) = \frac{1-p}{p^2}$
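As a sanity check (a sketch with an arbitrary $p$, not from the notes), the geometric distribution can be simulated by counting Bernoulli trials until the first success:

```python
# Count Bernoulli(p) trials until the first success and compare with E[L] and Var(L).
import random

p = 0.3

def first_success(p):
    k = 1
    while random.random() >= p:  # failure occurs with probability 1 - p
        k += 1
    return k

trials = 100_000
samples = [first_success(p) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, 1 / p)            # ~3.33
print(var, (1 - p) / p ** 2)  # ~7.78
```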

Poisson Distribution

For r.v.s $X_1, X_2, \cdots$, let

$C_j := \sum_{k=1}^j X_k$


$$p(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \lambda = np$$

$E[Y]=\lambda, \quad Var(Y)=\lambda$

Here $\lambda = np$ is the expectation (and also the variance), not $p$ itself; in the Poisson process, $\lambda$ is the arrival rate.
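A quick numerical comparison (values of $n$ and $p$ chosen arbitrarily for illustration, not from the notes) shows how close the $\operatorname{Bin}(n,p)$ pmf is to the Poisson pmf with $\lambda = np$ when $n$ is large and $p$ is small:

```python
# Compare the binomial pmf with its Poisson approximation (lambda = n*p).
from math import comb, exp, factorial

n, p = 1000, 0.003
lam = n * p
for k in range(8):
    binom_pmf = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson_pmf = lam**k * exp(-lam) / factorial(k)
    print(k, round(binom_pmf, 5), round(poisson_pmf, 5))
```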

Negative Binomial Distribution

number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes (denoted r) occurs.

$$f(k; r, p) \equiv \Pr(X = k) = \binom{k+r-1}{r-1} p^r(1-p)^k$$

$E[X] = r\,\frac{1-p}{p}$

$Var(X) = r\,\frac{1-p}{p^2}$

Maximum Likelihood Parameter Estimation

$f$ is the distribution of $x$:

$$L: \theta \mapsto f(x \mid \theta), \qquad \hat{\theta}_{\mathrm{ML}}(x) = \argmax_{\theta} f(x \mid \theta)$$
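As an illustration (assuming a binomial observation model, which the notes do not specify here), the ML estimate can be found by a brute-force search over $\theta$ and compared with the closed-form answer $x/n$:

```python
# Maximum likelihood estimation of the success probability theta from x successes in n trials.
from math import comb

def likelihood(theta, x, n):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, x = 20, 7
grid = [i / 1000 for i in range(1, 1000)]
theta_ml = max(grid, key=lambda t: likelihood(t, x, n))
print(theta_ml)  # close to the closed-form ML estimate x/n = 0.35
```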


Markov inequality

if $Y$ is a nonnegative random variable, then for $c>0$,

$$\Pr\{Y\ge c\} \le \frac{E[Y]}{c}$$

Proof.

$$\begin{aligned}\textrm{E}(X) &= \int_{-\infty}^{\infty}x f(x)\,dx \\&= \int_{0}^{\infty}x f(x)\,dx \\&\geqslant \int_{a}^{\infty}x f(x)\,dx \\&\geqslant \int_{a}^{\infty}a f(x)\,dx \\&= a\int_{a}^{\infty} f(x)\,dx \\&= a\,\textrm{P}(X\geqslant a).\end{aligned}$$

Chebyshev’s Inequality

$$\Pr(|X-\textrm{E}(X)| \geq a) \leq \frac{\textrm{Var}(X)}{a^2}$$

Proof.

$$\operatorname{Var}(X) = \operatorname{E}[(X - \operatorname{E}(X))^2].$$

use Markov inequality,

$$\Pr\left((X - \operatorname{E}(X))^2 \ge a^2\right) \le \frac{\operatorname{Var}(X)}{a^2},$$

Confidence Interval

With r.v. $\hat p = \frac{X}{n}$ (the sample fraction) and $\mu = E[\hat p] = p$ (the true $p$),

then

$Var(\hat p) = \frac{Var(X)}{n^2} = \frac{p(1-p)}{n}$

and

$$\begin{array}{c}\operatorname{Pr}\left\{\left|\hat{p}-\mu\right|<\sigma b\right\}>1-\frac{1}{b^{2}} \\\Rightarrow \operatorname{Pr}\{\mu \in(\hat{p}-\sigma b, \hat{p}+\sigma b)\}>1-\frac{1}{b^{2}} \\\Rightarrow \operatorname{Pr}\left\{p \in\left(\hat{p}-b \sqrt{\frac{p(1-p)}{n}}, \hat{p}+b \sqrt{\frac{p(1-p)}{n}}\right)\right\}>1-\frac{1}{b^{2}}\end{array}$$

Using the bound $\sqrt{p(1-p)} \le \frac{1}{2}$,

$$\operatorname{Pr}\left\{p \in\left(\hat{p}-\frac{b}{2 \sqrt{n}}, \hat{p}+\frac{b}{2 \sqrt{n}}\right)\right\}>1-\frac{1}{b^{2}}$$

$1-\frac{1}{b^2}$ is the confidence level and $\frac{b}{2 \sqrt{n}}$ is the half-width of the confidence interval.
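For a concrete feel (a sketch; the sample size and observed fraction below are hypothetical), the level and half-width can be computed directly:

```python
# Chebyshev-style confidence interval: half-width b/(2*sqrt(n)) at level 1 - 1/b^2.
from math import sqrt

n = 1000        # number of Bernoulli samples
b = 5           # gives confidence level 1 - 1/b^2 = 0.96
p_hat = 0.42    # observed sample fraction (hypothetical)

half_width = b / (2 * sqrt(n))
print(1 - 1 / b**2)                            # confidence level
print(p_hat - half_width, p_hat + half_width)  # interval containing p with prob > 0.96
```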


Conditional mean

$E[X|A] = \sum_i u_i P(X=u_i|A)$

LOTUS (Law of the unconscious statistician)

$E[g(X)|A] = \sum_i g(u_i)P(X=u_i|A)$

Total Probability Theorem

$$P(B) = \sum_i{P(A_iB)} = \sum_i{P(B|A_i)P(A_i)}$$

Response of Stochastic System

Bayes’ theorem

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}, \qquad P(A_i|B) = \frac{P(A_i)P(B|A_i)}{\sum_j P(B|A_j)P(A_j)}$$

MAP vs. ML

ML

$$L:\theta \mapsto P(x|\theta), \qquad \hat \theta =\argmax_\theta P(x|\theta)$$

MAP

$$L:\theta \mapsto P(\theta|x), \qquad \hat \theta =\argmax_\theta P(\theta|x) = \argmax_\theta \frac{P(x|\theta)P(\theta)}{\sum_{\theta'}P(x|\theta')P(\theta')}$$


Network Probability

edge $i$ has a probability $p_i$ of failing


Boole’s Inequality (Union Bound)

$$P\left(\bigcup_{i} A_i\right) \le \sum_i P(A_i)$$


Binary hypothesis testing with discrete-type observations

Likelihood matrix


Either hypothesis $H_0$ or $H_1$ is true. If $H_0$ is true, $X$ has pmf $p_0$; if $H_1$ is true, $X$ has pmf $p_1$, as shown in the likelihood matrix.

Decision rule


As mentioned above, a decision rule specifies, for each possible observation, which hypothesis is declared.

  • false alarm: $H_0$ is true, $H_1$ is declared.
  • miss: $H_1$ is true, $H_0$ is declared.

$$p_{\text{false alarm}} = P(\text{declare } H_1 \text{ true} \mid H_0), \qquad p_{\text{miss}} = P(\text{declare } H_0 \text{ true} \mid H_1)$$

Maximum likelihood (ML) decision rule

ML decision rule


another way: likelihood ratio test (LRT)

Define the likelihood ratio

$$\Lambda(k) = \frac{p_1(k)}{p_0(k)}$$

$$\Lambda(X) \begin{cases}>1 & \text{ declare } H_1 \text{ is true} \\ <1 & \text{ declare } H_0 \text{ is true.}\end{cases}$$

or

$$\Lambda(X) \begin{cases}>\tau & \text{ declare } H_1 \text{ is true} \\ <\tau & \text{ declare } H_0 \text{ is true}\end{cases}$$

The ML decision rule corresponds to $\tau = 1$.

  • increasing $\tau$: more misses
  • decreasing $\tau$: more false alarms
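A minimal sketch of the LRT for a discrete observation (the two pmfs below are made-up stand-ins for the rows of a likelihood matrix):

```python
# Likelihood ratio test: declare H1 if Lambda(k) > tau, H0 if Lambda(k) < tau.
p0 = {0: 0.6, 1: 0.3, 2: 0.1}  # pmf of X under H0 (hypothetical values)
p1 = {0: 0.1, 1: 0.3, 2: 0.6}  # pmf of X under H1 (hypothetical values)

def lrt(k, tau=1.0):
    """tau = 1 gives the ML rule; tau = pi0/pi1 gives the MAP rule."""
    return "H1" if p1[k] / p0[k] > tau else "H0"

print(lrt(2))                 # ML rule: declares H1
print(lrt(2, tau=0.8 / 0.2))  # MAP rule with priors pi0 = 0.8, pi1 = 0.2
```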

Maximum a posteriori probability (MAP) decision rule

make use of the prior probabilities $P(H_0)$ and $P(H_1)$ and use joint probabilities such as $P(\{X=1\}\cap H_1)$.

denote

$P(H_i) = \pi_i$

the joint probabilities are

$P(H_i, X=k) = \pi_i p_i(k)$


joint probability matrix with $\pi_0 = 0.8, \pi_1 = 0.2$

The MAP rule is equivalent to an LRT with threshold

$\tau = \frac{\pi_0}{\pi_1}$

'uniform' prior means $\pi_0 = \pi_1$, in which case the MAP rule reduces to the ML rule

average error probability $p_e$

$$p_{e}=\pi_{0}\, p_{\text{false alarm}}+\pi_{1}\, p_{\text{miss}}$$

minimizing $p_e$ $\iff$ maximizing:

$$\begin{aligned}& \pi_0 P(\text{declare } H_0|H_0) + \pi_1 P(\text{declare } H_1|H_1) \\ =&\sum_k \pi_0 P(X=k|H_0)P(\text{declare } H_0|X=k) + \pi_1 P(X=k|H_1)P(\text{declare } H_1|X=k) \end{aligned}$$

therefore,

$$i_{\text{declare}} = \argmax_i \;\pi_i P(X=k|H_i)$$

So the MAP decision rule is the one that minimizes $p_e$. The MAP rule is also called the minimum probability of error rule.


cdf

what to do if $\Omega$ is uncountable?

the usual event algebra in this case is the Borel $\sigma$-algebra $\mathcal{B}$.

it is the smallest algebra closed under formation of countably many unions and intersections that contains all finite intervals in $\mathbb{R}$

$\epsilon = (\mathbb{R}, \mathcal{B}, P)$

$F_X(c) = \operatorname{Pr}(X\le c)$

$F_X$ determines the probability measure on semi-infinite intervals $\rightarrow$ this suffices to determine it on all intervals.

Size of the jump of the cdf at a point $c$:

$\Delta F_X(c) = F_X(c) - F_X(c-)$

$P(X \in (a,b]) = F_X(b) - F_X(a), \quad a < b$

$P(X < c) = F_X(c-)$

$P(X = c) = \Delta F_X(c)$


pdf

$1 = \int_{-\infty}^{+\infty}f_X(u)\,du$

$E[X] = \int_{-\infty}^{\infty}u f_X(u)\,du$

$E[X^2] = \int_{-\infty}^{\infty}u^2 f_X(u)\,du$

$\operatorname{Var}(X) = E[X^2] - E[X]^2$



if $f_X$ is continuous at $c$,

$F_X'(c) = f_X(c)$

The Exponential Distribution

cdf

$$F_{X}(t)=\begin{cases}1-e^{-\lambda t} & t \geq 0 \\ 0 & t<0\end{cases}$$

pdf

$$f_{X}(t)=\begin{cases}\lambda e^{-\lambda t} & t \geq 0 \\ 0 & t<0\end{cases}$$

$E[X] = \frac{1}{\lambda}, \qquad E[X^n] = \frac{n!}{\lambda^n}$

$Var(X) = \frac{1}{\lambda^2}$

The Poisson process

Poisson process - Wikipedia, the free encyclopedia

Different distribution in one graph


$N_t$ are counting variables

$$P\left\{N_{t}=k\right\}=\frac{[\lambda t]^{k}}{k!} e^{-\lambda t}$$

$U_i$ are delay variables

have Exponential Distributions

$$f_{U_{i}}(t)=\begin{cases}\lambda e^{-\lambda t} & t \geq 0 \\ 0 & t<0\end{cases}$$

$T_k$ are time variables

$T_r = \sum_{i=1}^r U_i$

have Erlang Distributions


The Erlang Distribution

In a Poisson process with arrival rate $\lambda$, the time at which the $k$-th event happens has an Erlang distribution:

$$f(x; k,\lambda)={\lambda^k x^{k-1} e^{-\lambda x} \over (k-1)!}\quad\text{for } x, \lambda \geq 0$$
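A short simulation (rate, horizon, and trial count chosen arbitrarily, not from the notes) ties these pieces together: summing Exponential($\lambda$) inter-arrival delays and counting arrivals in $[0, t]$ reproduces the Poisson mean $\lambda t$:

```python
# Simulate a Poisson process by accumulating exponential inter-arrival times.
import random

lam, t, trials = 2.0, 5.0, 20_000

def count_arrivals(lam, t):
    time, n = 0.0, 0
    while True:
        time += random.expovariate(lam)  # U_i ~ Exponential(lam)
        if time > t:
            return n
        n += 1

counts = [count_arrivals(lam, t) for _ in range(trials)]
print(sum(counts) / trials, lam * t)  # both approximately 10
```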

Linear Scaling of pdfs

$Y = aX+b$

is a linear transformation of $X$.

cdf (assuming $a > 0$):

$F_Y(y) = F_X\!\left(\frac{y-b}{a}\right)$

pdf

$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{1}{a}f_X\!\left(\frac{y-b}{a}\right)$


The Uniform Distribution

$$X \sim \mathcal{U}_{[a,b]}, \qquad E[X] = \frac{a+b}{2}, \qquad Var(X) = \frac{1}{12}(b-a)^2$$

The Gaussian (Normal) Distribution

$N(\mu,\sigma^2)$:

$$f(x)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)$$


pdf of the standard normal $N(0,1)$:

$$f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$$

cdf

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x}\exp\left(-\frac{u^2}{2}\right) du$$

$Q(x) = P(S>x)=1-\Phi(x)$

$N(\mu,\sigma^2)$ can be obtained from the standard distribution via $Y=\sigma X+\mu$:

$E[Y] = E[\sigma X+\mu] = \mu$

$Var(Y) = Var(\sigma X+\mu) = \sigma^2$

The Central Limit Theorem

Let $S_{n,p} \sim \operatorname{Bin}(n,p)$, and define the standardized version (with $\mu = np$, $\sigma = \sqrt{np(1-p)}$):

$$\hat S_{n,p} := \frac{S_{n,p} - \mu}{\sigma}$$

then

$$\lim_{n \rightarrow \infty} P\left\{\hat{S}_{n, p} \leq c\right\}=\Phi(c)$$

The Central Limit Thm Continuity Correction


when bounded from above

$$\begin{aligned} P\left\{S_{n, p} \leq L\right\} \rightarrow\; &P\left\{S_{n, p}-\frac{1}{2} \leq L\right\} \\ =\;&P\left\{\frac{S_{n, p}-\mu-\frac{1}{2}}{\sigma} \leq \frac{L-\mu}{\sigma}\right\} \\ =\;&P\left\{\hat{S}_{n, p} \leq \frac{L-\mu+\frac{1}{2}}{\sigma}\right\} \\ \approx\; &\Phi\left(\frac{L-\mu+\frac{1}{2}}{\sigma}\right). \end{aligned}$$

when bounded from below

$$P\left\{S_{n, p} \geq L\right\} \rightarrow P\left\{S_{n, p}+\frac{1}{2} \geq L\right\}$$
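A numeric check (with $n$, $p$, $L$ picked arbitrarily for illustration) of the upper-bound case against the exact binomial probability:

```python
# Normal approximation of P{S <= L} with the continuity correction, S ~ Bin(n, p).
from math import comb, erf, sqrt

n, p, L = 100, 0.3, 25
mu, sigma = n * p, sqrt(n * p * (1 - p))

def phi(c):  # standard normal cdf via the error function
    return 0.5 * (1 + erf(c / sqrt(2)))

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(L + 1))
approx = phi((L - mu + 0.5) / sigma)
print(exact, approx)  # the two values agree closely
```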


Generating a r.v. with a specified distribution

$U$ is a r.v. **uniformly** distributed on $[0,1]$; find $g$ such that $g(U)$ has distribution $F$.

Let $g(u) = F^{-1}(u)$,

and let $X=g(U)$; then

$F_X(g(u))=F_U(u)=u$

which suffices.
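For example (a sketch using the exponential distribution, where $F^{-1}(u) = -\ln(1-u)/\lambda$), inverse-CDF sampling turns uniform samples into exponential ones:

```python
# Generate Exponential(lam) samples from Uniform[0,1] via g(u) = F^{-1}(u).
import random
from math import log

lam, trials = 2.0, 100_000
samples = [-log(1 - random.random()) / lam for _ in range(trials)]
print(sum(samples) / trials, 1 / lam)  # sample mean is approximately 1/lam
```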

Failure Rate Functions

For a r.v. $T$ describing the lifetime of a system, the failure rate function is:

$$h(t) = \lim_{\epsilon \to 0}\frac{P\{t<T<t + \epsilon \mid T>t\}}{\epsilon}$$

then, given that the system has not failed up to time $t$, the probability that it will fail within the next $\epsilon$-length interval is approximately

$$P\{T \in (t,t+\epsilon) \mid T>t\} \approx h(t)\,\epsilon$$

$h(t)$ can be expressed as

$$\begin{aligned}h(t):=\lim _{\epsilon \rightarrow 0} \frac{P\{t<T<t+\epsilon \mid T>t\}}{\epsilon} & =\lim _{\epsilon \rightarrow 0} \frac{P\{t<T<t+\epsilon\}}{\epsilon P\{T>t\}} \\& =\lim _{\epsilon \rightarrow 0} \frac{F_{T}(t+\epsilon)-F_{T}(t)}{\epsilon\left(1-F_{T}(t)\right)} \\& =\frac{F_{T}^{\prime}(t)}{1-F_{T}(t)}=\frac{f_{T}(t)}{1-F_{T}(t)}.\end{aligned}$$

notice that

$$-h(t)=[\ln(1-F_T(t))]' \\ -\int_0^t h(s)\,ds=\ln(1-F_T(t))-\ln(1-F_T(0))=\ln(1-F_T(t)) \\ \text{therefore, } F_T(t) = 1-\exp\left(-\int_0^t h(s)\,ds\right).$$

Hypothesis Testing

see also: Binary hypothesis testing with discrete-type observations

A random variable $X$ has pdf $f_1$ (hypothesis $H_1$) or $f_0$ (hypothesis $H_0$).

For an observation $u$, form the likelihood ratio of pdfs:

$$\Lambda(u) = \frac{f_1(u)}{f_0(u)}$$

ML rule

$$\begin{cases}\Lambda(u) > 1 & \text{declare } H_1 \\ \Lambda(u) < 1 & \text{declare } H_0\end{cases}$$

MAP rule

with prior probabilities $\pi_0$ and $\pi_1$,

$$\begin{cases}\Lambda(u) > \frac{\pi_0}{\pi_1} & \text{declare } H_1 \\ \Lambda(u) < \frac{\pi_0}{\pi_1} & \text{declare } H_0\end{cases}$$

error

$$p_{e}=P\left(\text{Declare } H_{1} \mid H_{0}\right) \pi_{0}+P\left(\text{Declare } H_{0} \mid H_{1}\right) \pi_{1}$$

Jointly Distributed Random Variables

joint CDF

$$F_{XY}(u, v) = P\{X\le u, Y\le v\}.$$

$$P\{(X, Y) \in (a,b]\times (c,d]\}=F_{X, Y}(b, d)-F_{X, Y}(a, d)-F_{X, Y}(b, c)+F_{X, Y}(a, c).$$

Is a function a valid joint CDF?

To be a valid joint CDF for two random variables $U$ and $V$, a function must satisfy several properties:

  1. Non-decreasing: $F(u, v)$ must be non-decreasing in both $u$ and $v$. That is, if $u_1 \leq u_2$ and $v_1 \leq v_2$, then $F(u_1, v_1) \leq F(u_2, v_2)$.

  2. Right-continuous: $F(u, v)$ must be right-continuous in both $u$ and $v$.

  3. Limits at infinity:

    • $\lim_{u \to -\infty} F(u, v) = 0$ for all $v$
    • $\lim_{v \to -\infty} F(u, v) = 0$ for all $u$
    • $\lim_{u \to \infty} F(u, v) = F(\infty, v)$ for all $v$
    • $\lim_{v \to \infty} F(u, v) = F(u, \infty)$ for all $u$
    • $\lim_{u \to \infty, v \to \infty} F(u, v) = 1$
  4. Non-negativity: $F(u, v) \geq 0$ for all $u, v$.

  5. Joint Probability: For any $u_1 < u_2$ and $v_1 < v_2$, the joint probability should be non-negative:
    $F(u_2, v_2) - F(u_2, v_1) - F(u_1, v_2) + F(u_1, v_1) \geq 0$

Joint PMF

$p_{XY}(x,y) = P\{X=x, Y=y\}$

Marginal pdf

$$f_X(x)=\int_{-\infty}^{\infty}f_{XY}(x,y)\,dy$$

The Marginal cdfs

$$F_{X}(u)=\lim_{n \rightarrow \infty} F_{X, Y}(u, n) \quad \text{ and } \quad F_{Y}(v)=\lim_{n \rightarrow \infty} F_{X, Y}(n, v)$$

The Conditional pdf

$$f_{Y|X}(v|u) = \begin{cases}\frac{f_{XY}(u,v)}{f_X(u)} & \text{if } f_X(u)\ne 0 \\ \text{undefined} & \text{if } f_X(u) = 0\end{cases}$$

$E[Y|X=u] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y|u)\,dy$

Sum of r.v.s

given $Z = X + Y$

$E[Z]=E[X]+E[Y]$

$$Var(Z) = Var(X) + Var(Y) + 2 Cov(X, Y) \\ Var(X-Y) = Var(X) + Var(Y) - 2 Cov(X, Y)$$

$Cov(X,Y)=E[(X-E[X])(Y-E[Y])]$

If $X$ and $Y$ are independent, then $Cov(X, Y) = 0$, and the pmf/pdf of $Z$ is the convolution:

$$\begin{array}{c}p_{Z}(u)=p_{X}(u) * p_{Y}(u) \\ p_{Z}(u)=\sum_{k=-\infty}^{\infty} p_{X}(k) p_{Y}(u-k)=\sum_{k=-\infty}^{\infty} p_{X}(u-k) p_{Y}(k)\end{array}$$

$$\begin{array}{c}f_{Z}(u)=f_{X}(u) * f_{Y}(u) \\ f_{Z}(u)=\int_{-\infty}^{\infty} f_{X}(\tau) f_{Y}(u-\tau) d\tau=\int_{-\infty}^{\infty} f_{X}(u-\tau) f_{Y}(\tau) d\tau\end{array}$$
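A small illustration of the pmf convolution (two fair six-sided dice, an example not in the notes):

```python
# The pmf of Z = X + Y for independent X, Y is the convolution of their pmfs.
import numpy as np

p_x = np.full(6, 1 / 6)      # pmf of X on values 1..6
p_y = np.full(6, 1 / 6)      # pmf of Y on values 1..6
p_z = np.convolve(p_x, p_y)  # pmf of Z on values 2..12

for value, prob in zip(range(2, 13), p_z):
    print(value, round(prob, 4))  # peaks at 7 with probability 6/36
```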

Linear Transformation of Joint R.V.s

$$(X, Y) \mapsto (W, Z) \quad \text{where} \quad W = aX+bY, \quad Z = cX+dY$$

$$\begin{pmatrix} W \\ Z \end{pmatrix} = A \begin{pmatrix} X \\ Y \end{pmatrix} \quad \text{where} \quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

Proposition

$$f_{W Z}(w, z)=\frac{1}{|\operatorname{det} A|} f_{X Y}\left(A^{-1}\begin{pmatrix} w \\ z \end{pmatrix}\right)$$

Correlation and Covariance

For r.v.s $X$, $Y$:

correlation

$E[XY]$

covariance

$Cov(X, Y) := E[(X-E[X])(Y-E[Y])]$

$$Cov(X, Y) = E[XY] - E[X]E[Y], \qquad Cov(X, X) = Var(X)$$

correlation coefficient

$$\rho_{XY}=\frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)} \sqrt{\operatorname{Var}(Y)}}=\frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

if $E[X] = E[Y] = 0$,

$Cov(X, Y) = E[XY]$

$X$, $Y$ are uncorrelated if

$Cov(X, Y) = 0$

$X$, $Y$ are positively correlated if

$Cov(X, Y) > 0$

and

$$Cov(X+Y, U+V) = Cov(X, U) + Cov(X, V) + Cov(Y, U) + Cov(Y, V)$$

$Cov(aX+b, cY+d) = ac\,Cov(X, Y)$

Cauchy-Schwarz Inequality

$$|E[XY]| \leq \sqrt{E\left[X^2\right] E\left[Y^2\right]}$$

$$|\operatorname{Cov}(X, Y)| \leq \sqrt{\operatorname{Var}(X) \operatorname{Var}(Y)}$$

Minimum Mean Square Error Estimation

flowchart LR
    A[Experiment] -->|Y| B[Noise]
    B -->|X| C[Guess]
    C -->|"g(X)=Y"| ...

An estimator is unbiased if

$E[\hat{T}] = E[Y]$

Constant estimators

To minimize the MSE with a constant estimator $\delta$,

$$\begin{aligned} E\left[(Y-\delta)^2\right] & = E\left[Y^2\right]-2 \delta E[Y]+\delta^2 \\ & = E[Y^2] - E[Y]^2 + E[Y]^2 - 2\delta E[Y] + \delta^2 \\ & = Var(Y) + (E[Y]-\delta)^2 \end{aligned}$$

therefore, the MSE is minimized iff $\delta = E[Y]$, and the minimum MSE is $Var(Y)$.

Unconstrained estimators

We wish to estimate $Y$ based on an observation $X$.

Goal: Minimize the mean-square error (MSE)

$$E[(Y-g(X))^2]$$

The resulting estimator $g^*(X)$ is called the unconstrained optimal estimator of $Y$ based on $X$ (because no constraints are placed on the function $g$).

For an observation $X=u$, similar to constant estimators,

$$g^*(u)=E[Y \mid X=u]=\int_{-\infty}^{\infty} v f_{Y \mid X}(v \mid u)\, dv$$

and

$$\begin{aligned} MSE & = E[(Y-E[Y|X])^2] \\ &= E[Y^2] - E[(E[Y|X])^2] \\ &= E[Y^2] - E[g^*(X)^2] \end{aligned}$$

Linear estimators

Because $E[Y|X=u]=\int_{-\infty}^{\infty} v f_{Y|X}(v|u)\,dv$ may be difficult to compute, use a linear estimator

$L(X) = aX + b$

as $g(X)$.

For a given $a$, the optimal linear estimator is

$\mu_Y + a(X-\mu_X)$

then

$$\begin{aligned} MSE & =E\left[\left(Y-\mu_Y-a\left(X-\mu_X\right)\right)^2\right] \\ & =\operatorname{Var}(Y-a X) \\ & =\operatorname{Cov}(Y-a X, Y-a X) \\ & =\operatorname{Var}(Y)-2 a \operatorname{Cov}(Y, X)+a^2 \operatorname{Var}(X). \end{aligned}$$

To minimize the MSE, set $0 = \frac{d}{da}MSE$, which gives

$$a^* = \frac{Cov(Y,X)}{Var(X)}$$

i.e.

$$\begin{aligned} L^*(X) & =\mu_Y+\left(\frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}\right)\left(X-\mu_X\right) \\ & =\mu_Y+\sigma_Y \rho_{X, Y}\left(\frac{X-\mu_X}{\sigma_X}\right). \end{aligned}$$

and the minimum MSE is

$$\sigma_Y^2-\frac{\operatorname{Cov}(X, Y)^2}{\operatorname{Var}(X)}=\sigma_Y^2\left(1-\rho_{X, Y}^2\right)$$
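In practice $a^*$ can be estimated from data; a sketch with synthetic samples standing in for real observations (the linear-plus-noise relation below is an assumption made only for this demo):

```python
# Estimate the optimal linear (LMMSE) estimator L*(X) = mu_Y + a*(X - mu_X) from samples.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 10_000)
y = 2.0 * x + rng.normal(0.0, 0.5, 10_000)  # hypothetical relation between X and Y

a_star = np.cov(y, x, bias=True)[0, 1] / np.var(x)  # Cov(Y, X) / Var(X)
L_star = lambda u: y.mean() + a_star * (u - x.mean())
print(a_star)       # close to 2.0
print(L_star(1.0))  # linear MMSE prediction of Y when X = 1
```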

Ordering

$$\underbrace{E\left[\left(Y-g^*(X)\right)^2\right]}_{\text{MSE for } g^*(X)=E[Y \mid X]} \leq \underbrace{\sigma_Y^2\left(1-\rho_{X, Y}^2\right)}_{\text{MSE for } L^*(X)=\widehat{E}[Y \mid X]} \leq \underbrace{\sigma_Y^2}_{\text{MSE for } \delta^*=E[Y]}$$

for jointly Gaussian $X$, $Y$: $g^*(X)=L^*(X)$

Law of Large Numbers

recall Chebyshev’s Inequality

Let $X_1, X_2, \cdots, X_n$ be independent r.v.'s, each having the same mean $\mu$ and variance $\sigma^2$; consider the sample mean:

$$\overline{X_n} = \frac{1}{n} \sum_{k=1}^{n} X_k$$

Law of Large Numbers

for any $s>0$,

$$P\left\{ |\overline{X_n} - \mu| \ge s \right\} \le \frac{\sigma^2}{ns^2} \to 0$$

Central Limit Theorem

Let

$$Z_n=\frac{\overline{X_n}-\mu}{\sigma/\sqrt{n}}$$

then $E[Z_n] = 0$ and $Var(Z_n) = 1$, and $Z_n$ converges in distribution to a standard Gaussian:

$$\lim_{n\to \infty} F_{Z_n}(c) = \Phi(c)$$

The Central Limit Theorem Continuity Correction

  • Replace $\sum_{i=1}^n X_i \rightarrow \sum_{i=1}^n X_i-\frac{1}{2}$ when bounded from above.
  • Replace $\sum_{i=1}^n X_i \rightarrow \sum_{i=1}^n X_i+\frac{1}{2}$ when bounded from below.

$$P\left\{\sum_{i=1}^n X_i \leq L\right\} \to P\left\{\sum_{i=1}^n X_i-\frac{1}{2} \leq L\right\} =P\left\{Z_n \leq \frac{L-n \mu+\frac{1}{2}}{\sqrt{n} \sigma}\right\} \approx \Phi\left(\frac{L-n \mu+\frac{1}{2}}{\sqrt{n} \sigma}\right)$$

$$P\left\{\sum_{i=1}^n X_i \geq L\right\} \to P\left\{\sum_{i=1}^n X_i+\frac{1}{2} \geq L\right\} =P\left\{Z_n \geq \frac{L-n \mu-\frac{1}{2}}{\sqrt{n} \sigma}\right\} \approx 1 - \Phi\left(\frac{L-n \mu-\frac{1}{2}}{\sqrt{n} \sigma}\right)$$

Joint Gaussian Random Variables

Sums of jointly Gaussian random variables are also Gaussian random variables.

For independent standard Gaussian $W$ and $Z$:

$$f_{W Z}(w, z)=f_W(w) f_Z(z)=\frac{1}{2 \pi} \exp\left(-\frac{w^2+z^2}{2}\right)$$

Linear Transformation of Joint R.V.s

$$(W,Z) \mapsto (X, Y), \qquad \begin{pmatrix} X \\ Y \end{pmatrix} = A \begin{pmatrix} W \\ Z \end{pmatrix}+\mathbf{b} \quad \text{where}\; A= \begin{pmatrix} a & b\\ c & d \end{pmatrix}$$

$$\begin{pmatrix} W \\ Z \end{pmatrix} = A^{-1} \left(\begin{pmatrix} X \\ Y \end{pmatrix}-\mathbf{b}\right) \quad \text{where}\; A^{-1}= \frac{1}{\det A} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix}$$

Proposition: If $A$ is invertible, then

$$f_{X Y}(x, y)=\frac{1}{|\det A|} f_{W Z}\left(A^{-1}\left(\begin{pmatrix} x \\ y \end{pmatrix}-\mathbf{b}\right)\right).$$

Let $\sigma_X > 0,\, \sigma_Y > 0,\, \mu_X,\, \mu_Y,\, -1<\rho<1$; then the linear transformation with

$$A=\left(\begin{array}{cc} \sigma_X \sqrt{(1+\rho) / 2} & -\sigma_X \sqrt{(1-\rho) / 2} \\ \sigma_Y \sqrt{(1+\rho) / 2} & \sigma_Y \sqrt{(1-\rho) / 2} \end{array}\right) \quad b=\left(\begin{array}{c} \mu_X \\ \mu_Y \end{array}\right)$$

i.e. $\left(\begin{array}{c} W \\ Z \end{array}\right) \rightarrow\left(\begin{array}{l} X \\ Y \end{array}\right)=A\left(\begin{array}{c} W \\ Z \end{array}\right)+b$, gives a bivariate Gaussian pdf:

$$f_{X Y}(x, y)=\frac{1}{2 \pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{\left(\frac{x-\mu_X}{\sigma_X}\right)^2+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2-2 \rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)}{2\left(1-\rho^2\right)}\right)$$

Or, with normalized $\tilde{x} = \frac{x-\mu_X}{\sigma_X},\; \tilde{y} = \frac{y-\mu_Y}{\sigma_Y}$:

$$f_{X Y}(x, y)=\frac{1}{2 \pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{(\tilde{y}-\rho \tilde{x})^2}{2\left(1-\rho^2\right)}\right) \exp\left(-\frac{\tilde{x}^2}{2}\right)$$
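A quick way to see this construction in action (parameters chosen arbitrarily for illustration) is to push independent standard normals through $A$ and $b$ and check the empirical correlation:

```python
# Generate correlated jointly Gaussian (X, Y) from independent standard normals (W, Z).
import numpy as np

mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 1.5, 0.7
A = np.array([[sigma_x * np.sqrt((1 + rho) / 2), -sigma_x * np.sqrt((1 - rho) / 2)],
              [sigma_y * np.sqrt((1 + rho) / 2),  sigma_y * np.sqrt((1 - rho) / 2)]])
b = np.array([mu_x, mu_y])

rng = np.random.default_rng(0)
wz = rng.standard_normal((100_000, 2))        # independent N(0,1) pairs (W, Z)
xy = wz @ A.T + b                             # (X, Y)^T = A (W, Z)^T + b
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])  # approximately rho = 0.7
```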

Properties

$$f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\,dy =\frac{1}{\sqrt{2 \pi \sigma_X^{2}}} \exp\left(-\frac{(x-\mu_X)^{2}}{2 \sigma_X^{2}}\right)$$

$$\begin{aligned} f_{Y \mid X}(y \mid x)&=\frac{1}{\sqrt{2 \pi}\, \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{1}{2} \frac{\left(y-\mu_Y-\rho \sigma_Y\left(x-\mu_X\right) / \sigma_X\right)^2}{\sigma_Y^2\left(1-\rho^2\right)}\right) \\ &=N\left(\mu_Y+\rho \sigma_Y\left(\frac{x-\mu_X}{\sigma_X}\right),\; \sigma_Y^2\left(1-\rho^2\right)\right) \\ &= N\left(E[Y|X=x],\; Var(Y|X=x)\right) \end{aligned}$$

Or, normalized:

$$f_{Y \mid X}(y \mid x)=\frac{1}{\sqrt{2 \pi} \sqrt{1-\rho^2}} e^{-\frac{1}{2} \frac{(y-\rho x)^2}{1-\rho^2}} = N(\rho x,\, 1 - \rho^2)$$

Unconstrained estimator $g^*$:

$$g^*(x) = L^*(x)=E[Y|X=x]=\mu_Y + \rho\sigma_Y\left(\frac{x-\mu_X}{\sigma_X}\right)$$

