Z-Test Z-Test For Population Proportion: You can use the Z-test to test hypotheses about population proportions. For example, you can test whether the proportion of a certain category in one population differs… 2024-09-18 Course Notes > Statistics
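The comparison this entry describes can be sketched with a pooled two-proportion Z statistic — a minimal sketch; the function name and inputs are illustrative, not from the notes:

```python
from math import sqrt

def two_prop_ztest(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z statistic for H0: the two population proportions are equal.

    x1, x2: success counts; n1, n2: sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled proportion under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # standard error under H0
    return (p1 - p2) / se
```

Compare the resulting statistic against a standard normal critical value (e.g. 1.96 for a two-sided test at the 5% level).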
Reinforcement Learning (19) Policy Gradient Given policy $\pi_\theta$, optimize $J(\pi_\theta) := \mathbb{E}_{s \sim d_0}\left[V^{\pi_\theta}(s)\right]$ where $d_0$… 2024-04-12 Course Notes > Reinforcement Learning
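The objective $J(\pi_\theta) = \mathbb{E}_{s \sim d_0}[V^{\pi_\theta}(s)]$ can be estimated by rolling the policy out from initial states drawn from $d_0$ and averaging discounted returns — a minimal sketch assuming generic `env_reset`/`env_step`/`policy` callables (illustrative interfaces, not a specific library):

```python
def estimate_J(env_reset, env_step, policy, gamma=0.99, episodes=100, horizon=200):
    """Monte Carlo estimate of J(pi) = E_{s ~ d0}[ V^pi(s) ]."""
    total = 0.0
    for _ in range(episodes):
        s = env_reset()                  # s ~ d0
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)
            s, r, done = env_step(s, a)  # assumed to return (next state, reward, done)
            ret += discount * r          # accumulate the discounted return
            discount *= gamma
            if done:
                break
        total += ret
    return total / episodes
```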
Distributed System (13) Paxos Concepts A proposal contains a value and a number. Proposal number: a globally unique “id” of the proposal. Proposal value: the “content” of the proposal. 3 types of roles: Proposers propose values to a… 2024-04-02 Course Notes > Distributed Systems
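The proposal structure and the acceptor's side of the numbering rule can be sketched roughly as follows (class and method names are illustrative; this omits quorums, learners, and the rest of the protocol):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass(order=True)
class Proposal:
    number: int                                       # globally unique "id"
    value: Any = field(compare=False, default=None)   # the "content"

class Acceptor:
    """Minimal sketch of an acceptor's promise/accept bookkeeping."""

    def __init__(self):
        self.promised: int = -1                # highest proposal number promised
        self.accepted: Optional[Proposal] = None

    def prepare(self, number: int) -> bool:
        # Promise to ignore proposals numbered below `number`.
        if number > self.promised:
            self.promised = number
            return True
        return False

    def accept(self, proposal: Proposal) -> bool:
        # Accept only if no higher-numbered promise was made since.
        if proposal.number >= self.promised:
            self.promised = proposal.number
            self.accepted = proposal
            return True
        return False
```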
Hexo + GitHub Actions Setup GitHub repositories: First, fork https://github.com/hexojs/hexo-starter — this repository is the file-structure template that hexo init uses when creating a Hexo site. Below it is called the blog repository; it holds the site’s source (markdown). Then create a <username>.github.io repository for GitHub Pages static hosting (html); below it is called the hosting… 2024-04-02 Website
Reinforcement Learning (18) Application in contextual bandit (CB) The data point is a tuple $(x, a, r)$. The function of interest is $(x, a, r) \mapsto r$. The distribution of interest is $x \sim d_0,\ a \sim \pi$… 2024-03-24 Course Notes > Reinforcement Learning
Reinforcement Learning (17) A Question $\mathbb{E}_{s,r,s'}\left[\left(V_\theta(s) - r - \gamma V_\theta(s')\right)^2\right]$ We do $V_\theta(s) \leftarrow V_\theta(s) + \alpha(r - \gamma V\ldots$… 2024-03-23 Course Notes > Reinforcement Learning
Reinforcement Learning (16) Q-learning Update rule: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left(r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right)$ 2024-03-22 Course Notes > Reinforcement Learning
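The update rule in this entry, as a tabular sketch (the dictionary layout and hyperparameter values are illustrative):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]
```

A `defaultdict(float)` keyed by `(state, action)` serves as the Q-table, so unseen pairs start at 0.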
Reinforcement Learning (15) Recall the Bellman Equation: $(T^\pi f)(s, a) = R(s, a) + \gamma\, \mathbb{E}_{s' \sim P(s, a)}\left[f(s', \pi)\right] = \mathbb{E}\left[r + \gamma \cdot f(s', \pi) \mid s, a\right]$ 2024-03-22 Course Notes > Reinforcement Learning
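One backup of the operator $T^\pi$ on a finite MDP can be sketched as below (the dictionary-based representation of $R$, $P$, $\pi$, and $f$ is an assumption for illustration):

```python
def bellman_backup(f, s, a, R, P, pi, gamma):
    """(T^pi f)(s, a) = R(s, a) + gamma * E_{s' ~ P(s, a)}[ f(s', pi(s')) ].

    P[(s, a)] maps next states to probabilities; pi maps states to actions.
    """
    expected_next = sum(p * f[(s2, pi[s2])] for s2, p in P[(s, a)].items())
    return R[(s, a)] + gamma * expected_next
```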
Reinforcement Learning (14) Value Prediction with Function Approximation Tabular representation vs. function approximation: function approximation can handle infinite state spaces (you can’t enumerate all states). Linear func… 2024-03-20 Course Notes > Reinforcement Learning
Reinforcement Learning (13) TD( $\lambda$ ): Unifying TD(0) and MC 1-step bootstrap (TD(0)): $r_i + \gamma V(s_{i+1})$ 2-step bootstrap: $r_i + \gamma r_{i+1} + \gamma^2 V(s_{i+2})$ 2024-03-20 Course Notes > Reinforcement Learning
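The $n$-step bootstrapped target generalizing both cases in this entry can be sketched as:

```python
def n_step_target(rewards, V, s_n, gamma, n):
    """n-step bootstrapped target:
    r_i + gamma * r_{i+1} + ... + gamma^{n-1} * r_{i+n-1} + gamma^n * V(s_{i+n}).

    rewards: the n rewards starting at step i; s_n: the state n steps ahead.
    """
    g = sum(gamma**k * rewards[k] for k in range(n))   # discounted reward sum
    return g + gamma**n * V(s_n)                       # bootstrap from V
```

With `n=1` this is the TD(0) target; letting `n` run to the episode end recovers the Monte Carlo return.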