Z-Test Z-Test For Population Proportion: You can use the Z-test to test hypotheses about population proportions. For example, you can test whether the proportion of a certain category in one population differs… 2024-09-18 Course Notes > Statistics
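The comparison this entry describes can be sketched with a pooled two-proportion Z statistic — a minimal sketch; the function name and inputs are illustrative, not from the notes:

```python
from math import sqrt

def two_prop_ztest(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z statistic for H0: the two population proportions are equal.

    x1, x2: success counts; n1, n2: sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled proportion under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # standard error under H0
    return (p1 - p2) / se
```

Compare the resulting statistic against a standard normal critical value (e.g. 1.96 for a two-sided test at the 5% level).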
Reinforcement Learning (19) Policy Gradient Given policy $\pi_\theta$, optimize $J(\pi_\theta) := \mathbb{E}_{s \sim d_0}\left[V^{\pi_\theta}(s)\right]$ where $d_0$… 2024-04-12 Course Notes > Reinforcement Learning
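The objective $J(\pi_\theta) = \mathbb{E}_{s \sim d_0}[V^{\pi_\theta}(s)]$ can be estimated by rolling the policy out from initial states drawn from $d_0$ and averaging discounted returns — a minimal sketch assuming generic `env_reset`/`env_step`/`policy` callables (illustrative interfaces, not a specific library):

```python
def estimate_J(env_reset, env_step, policy, gamma=0.99, episodes=100, horizon=200):
    """Monte Carlo estimate of J(pi) = E_{s ~ d0}[ V^pi(s) ]."""
    total = 0.0
    for _ in range(episodes):
        s = env_reset()                  # s ~ d0
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)
            s, r, done = env_step(s, a)  # assumed to return (next state, reward, done)
            ret += discount * r          # accumulate the discounted return
            discount *= gamma
            if done:
                break
        total += ret
    return total / episodes
```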
Distributed System (13) Paxos Concepts A proposal contains a value and a number. Proposal number: a globally unique “id” of the proposal. Proposal value: the “content” of the proposal. 3 types of roles: Proposers propose values to a… 2024-04-02 Course Notes > Distributed Systems
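The proposal structure and the acceptor's side of the numbering rule can be sketched roughly as follows (class and method names are illustrative; this omits quorums, learners, and the rest of the protocol):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass(order=True)
class Proposal:
    number: int                                       # globally unique "id"
    value: Any = field(compare=False, default=None)   # the "content"

class Acceptor:
    """Minimal sketch of an acceptor's promise/accept bookkeeping."""

    def __init__(self):
        self.promised: int = -1                # highest proposal number promised
        self.accepted: Optional[Proposal] = None

    def prepare(self, number: int) -> bool:
        # Promise to ignore proposals numbered below `number`.
        if number > self.promised:
            self.promised = number
            return True
        return False

    def accept(self, proposal: Proposal) -> bool:
        # Accept only if no higher-numbered promise was made since.
        if proposal.number >= self.promised:
            self.promised = proposal.number
            self.accepted = proposal
            return True
        return False
```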
Hexo + GitHub Actions Setup GitHub repositories: First, fork https://github.com/hexojs/hexo-starter — this repository is the file-structure template that hexo init uses when creating a Hexo site. Below it is called the blog repository; it holds the site’s source (markdown). Then create a <username>.github.io repository for GitHub Pages static hosting (html); below it is called the hosting… 2024-04-02 Website
Reinforcement Learning (18) Application in contextual bandit (CB) The data point is a tuple $(x, a, r)$. The function of interest is $(x, a, r) \mapsto r$. The distribution of interest is $x \sim d_0,\ a \sim \pi$… 2024-03-24 Course Notes > Reinforcement Learning
Reinforcement Learning (17) A Question $\mathbb{E}_{s,r,s'}\left[\left(V_\theta(s) - r - \gamma V_\theta(s')\right)^2\right]$ We do $V_\theta(s) \leftarrow V_\theta(s) + \alpha(r - \gamma V\ldots$… 2024-03-23 Course Notes > Reinforcement Learning
Reinforcement Learning (16) Q-learning Update rule: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left(r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right)$ 2024-03-22 Course Notes > Reinforcement Learning
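The update rule in this entry, as a tabular sketch (the dictionary layout and hyperparameter values are illustrative):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]
```

A `defaultdict(float)` keyed by `(state, action)` serves as the Q-table, so unseen pairs start at 0.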
Reinforcement Learning (15) Recall the Bellman Equation: $(T^\pi f)(s, a) = R(s, a) + \gamma\, \mathbb{E}_{s' \sim P(s, a)}\left[f(s', \pi)\right] = \mathbb{E}\left[r + \gamma \cdot f(s', \pi) \mid s, a\right]$ 2024-03-22 Course Notes > Reinforcement Learning
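One backup of the operator $T^\pi$ on a finite MDP can be sketched as below (the dictionary-based representation of $R$, $P$, $\pi$, and $f$ is an assumption for illustration):

```python
def bellman_backup(f, s, a, R, P, pi, gamma):
    """(T^pi f)(s, a) = R(s, a) + gamma * E_{s' ~ P(s, a)}[ f(s', pi(s')) ].

    P[(s, a)] maps next states to probabilities; pi maps states to actions.
    """
    expected_next = sum(p * f[(s2, pi[s2])] for s2, p in P[(s, a)].items())
    return R[(s, a)] + gamma * expected_next
```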
Reinforcement Learning (14) Value Prediction with Function Approximation Tabular representation vs. function approximation: function approximation can handle infinite state spaces (you can’t enumerate all states). Linear func… 2024-03-20 Course Notes > Reinforcement Learning
Reinforcement Learning (13) TD( $\lambda$ ): Unifying TD(0) and MC 1-step bootstrap (TD(0)): $r_i + \gamma V(s_{i+1})$ 2-step bootstrap: $r_i + \gamma r_{i+1} + \gamma^2 V(s_{i+2})$ 2024-03-20 Course Notes > Reinforcement Learning
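The $n$-step bootstrapped target generalizing both cases in this entry can be sketched as:

```python
def n_step_target(rewards, V, s_n, gamma, n):
    """n-step bootstrapped target:
    r_i + gamma * r_{i+1} + ... + gamma^{n-1} * r_{i+n-1} + gamma^n * V(s_{i+n}).

    rewards: the n rewards starting at step i; s_n: the state n steps ahead.
    """
    g = sum(gamma**k * rewards[k] for k in range(n))   # discounted reward sum
    return g + gamma**n * V(s_n)                       # bootstrap from V
```

With `n=1` this is the TD(0) target; letting `n` run to the episode end recovers the Monte Carlo return.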