flayed blog

Reinforcement Learning (12)

Every-visit Monte-Carlo Suppose we have a continuing task. What if we cannot set the starting state arbitrarily? i.e. we have a single long trajectory with length $N$: $s_1, a_1, r_1, s_2, a_2, r_2, s_3, a_3, r_3, \ldots$

2024-03-18

Course Notes > Reinforcement Learning

Communication Networks (11)

TCP reliable data transfer TCP ACK generation Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segme

2024-03-06

Course Notes > Communication Networks

Communication Networks (12)

TCP reliable data transfer TCP ACK generation Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segme

2024-03-06

Course Notes > Communication Networks

Communication Networks (10)

Reliable Data Transfer: Intuition Selective Repeat receiver individually acknowledges all correctly received pkts sender only resends pkts for which ACK not received sender window Selective repeat

2024-03-05

Course Notes > Communication Networks

Communication Networks (9)

Reliable Data Transfer: Intuition rdt reliable data transfer protocol udt unreliable data transfer protocol Reliable Channel channel is perfectly reliable: no bit errors no loss of packets Channe

2024-03-05

Course Notes > Communication Networks

Reinforcement Learning (11)

Model-based RL with a sampling oracle (Certainty Equivalence) Cont’d To find $Q^\star_{\hat{M}}$ with empirical $\hat{R}$ and $\hat{P}$: $f_0 \in \mathbb{R}^{SA}, \quad f_k \leftarrow \hat{T} f_{k-1}.$

2024-02-25

Course Notes > Reinforcement Learning

Reinforcement Learning (10)

The Learning Setting planning and learning Planning: given MDP model, how to compute optimal policy The MDP model is known Learning: MDP model is unknown collect data from the MDP: $(s, a, r, s')$

2024-02-24

Course Notes > Reinforcement Learning

Distributed System (11)

Consensus Each process proposes a value. All processes must agree on one of the proposed values. Required Properties Termination: Eventually each process sets its decision variable. Liveness Agr

2024-02-23

Course Notes > Distributed Systems

Distributed System (10)

Leader Election Any process can call for an election. A process can call for at most one election at a time. Multiple processes are allowed to call an election simultaneously. All of them together m

2024-02-23

Course Notes > Distributed Systems

Distributed System (9)

Mutual Exclusion Ricart-Agrawala’s Algorithm enter() at process $P_i$ set state to Wanted multicast “Request” $\langle T_i, P_i \rangle$ to all processes where $T_i =$ current La

2024-02-23

Course Notes > Distributed Systems