Reinforcement Learning (12) Every-visit Monte-Carlo Suppose we have a continuing task. What if we cannot set the starting state arbitrarily? I.e., we have a single long trajectory of length $N$: $s_1, a_1, r_1, s_2, a_2, r_2, s_3, a_3, r_3, \ldots$ 2024-03-18 Course Notes > Reinforcement Learning
Communication Networks (11) TCP reliable data transfer TCP ACK generation Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segme 2024-03-06 Course Notes > Communication Networks
Communication Networks (12) TCP reliable data transfer TCP ACK generation Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segme 2024-03-06 Course Notes > Communication Networks
Communication Networks (10) Reliable Data Transfer: Intuition Selective Repeat receiver individually acknowledges all correctly received pkts sender only resends pkts for which ACK not received sender window Selective repeat 2024-03-05 Course Notes > Communication Networks
Communication Networks (9) Reliable Data Transfer: Intuition rdt reliable data transfer protocol udt unreliable data transfer protocol Reliable Channel channel is perfectly reliable: no bit errors no loss of packets Channe 2024-03-05 Course Notes > Communication Networks
Reinforcement Learning (11) Model-based RL with a sampling oracle (Certainty Equivalence) Cont’d To find $Q^\star_{\hat{M}}$ with empirical $\hat{R}$ and $\hat{P}$: $f_0 \in \mathbb{R}^{SA}, \quad f_k \leftarrow \hat{T} f_{k-1}.$ 2024-02-25 Course Notes > Reinforcement Learning
Reinforcement Learning (10) The Learning Setting Planning and learning. Planning: given an MDP model, how to compute the optimal policy; the MDP model is known. Learning: the MDP model is unknown; collect data from the MDP: $(s, a, r, s')$ 2024-02-24 Course Notes > Reinforcement Learning
Distributed System (11) Consensus Each process proposes a value. All processes must agree on one of the proposed values. Required Properties Termination: Eventually each process sets its decision variable. Liveness Agr 2024-02-23 Course Notes > Distributed Systems
Distributed System (10) Leader Election Any process can call for an election. A process can call for at most one election at a time. Multiple processes are allowed to call an election simultaneously. All of them together m 2024-02-23 Course Notes > Distributed Systems
Distributed System (9) Mutual Exclusion Ricart-Agrawala’s Algorithm enter() at process $P_i$: set state to Wanted; multicast “Request” $<T_i, P_i>$ to all processes, where $T_i =$ current La 2024-02-23 Course Notes > Distributed Systems