Distributed Systems (1-2)
https://courses.grainger.illinois.edu/ece428/sp2024//assets/slides/lect2-after.pdf
Key aspects of a distributed system
- Processes must communicate with one another to coordinate actions. Communication time is variable.
- Different processes (on different computers) have different clocks.
- Processes and communication channels may fail.
Relationship between processes
- Client-server
- Peer-to-peer
Two ways to model a distributed system
Synchronous distributed systems:
- Known upper and lower bounds on the time taken by each step in a process.
- Known bounds on message passing delays.
- Known bounds on clock drift rates.
Asynchronous distributed systems:
- No bounds on process execution speeds.
- No bounds on message passing delays.
- No bounds on clock drift rates.
Types of failure
- Crash
- Fail-stop
- Communication omission
> A fail-stop failure is one in which the failing component halts and stays halted, and other processes can detect that it has stopped.
{: .prompt-tip }
Detect a crashed process
```mermaid
flowchart LR
p -->|Periodic ping| q
q -->|ack| p
```
p sends pings to q every $T$ seconds. $\Delta_1$ is the timeout value at p. If $\Delta_1$ time has elapsed after sending a ping and no ack has arrived, p reports q as crashed.
- If synchronous, $\Delta_1 = 2 \times (\text{max network delay})$
- If asynchronous, $\Delta_1 = k \times (\text{max observed round-trip time})$
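The ping-ack rule can be sketched as a small piece of state kept at p. This is a minimal sketch, not a networked implementation: `PingAckDetector` and `sync_delta1` are hypothetical names, time is assumed to be an integer number of milliseconds supplied by the caller, and the actual sending of pings and acks is left out.

```python
from typing import Optional


class PingAckDetector:
    """Ping-ack failure detector state kept at process p (hypothetical sketch).

    p pings q every T ms; if no ack arrives within delta1 ms of sending
    a ping, p reports q as crashed.
    """

    def __init__(self, T: int, delta1: int) -> None:
        self.T = T
        self.delta1 = delta1
        self.unacked_since: Optional[int] = None  # send time of the pending ping

    def send_ping(self, now: int) -> None:
        # Keep the earliest unanswered ping's send time so that repeated
        # pings cannot postpone the timeout.
        if self.unacked_since is None:
            self.unacked_since = now

    def on_ack(self) -> None:
        self.unacked_since = None

    def suspects_crash(self, now: int) -> bool:
        return (self.unacked_since is not None
                and now - self.unacked_since >= self.delta1)


def sync_delta1(max_network_delay: int) -> int:
    # Synchronous model: the ping and the ack each take at most
    # max_network_delay, so a live q always answers within twice that.
    return 2 * max_network_delay


if __name__ == "__main__":
    d = PingAckDetector(T=1000, delta1=sync_delta1(100))  # delta1 = 200 ms
    d.send_ping(now=0)
    print(d.suspects_crash(now=150))  # False: still within the timeout
    print(d.suspects_crash(now=250))  # True: no ack after 200 ms
```

The detector only answers "suspected or not" for a given time; a real process would call `suspects_crash` from a timer and stop pinging once it reports q.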
```mermaid
flowchart RL
q --->|heartbeat| p
```
q sends heartbeats to p every $T$ seconds. $(T + \Delta_2)$ is the timeout value at p. If $(T + \Delta_2)$ time has elapsed since the last heartbeat, p reports q as crashed.
- If synchronous, $\Delta_2 = \text{max network delay} - \text{min network delay}$
- If asynchronous, $\Delta_2 = k \times (\text{observed delay})$
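A matching sketch for heartbeating, under the same assumptions (hypothetical names, integer milliseconds, no real networking). The synchronous helper also spells out why the slack is max minus min: two heartbeats sent $T$ apart can arrive up to $T$ plus that difference apart.

```python
class HeartbeatDetector:
    """Heartbeat failure detector state kept at process p (hypothetical sketch).

    q sends a heartbeat every T ms; p reports q as crashed once T + delta2 ms
    pass without a new heartbeat arriving.
    """

    def __init__(self, T: int, delta2: int, start: int = 0) -> None:
        self.T = T
        self.delta2 = delta2
        self.last_heartbeat = start  # arrival time of the newest heartbeat

    def on_heartbeat(self, now: int) -> None:
        self.last_heartbeat = now

    def suspects_crash(self, now: int) -> bool:
        return now - self.last_heartbeat >= self.T + self.delta2


def sync_delta2(max_network_delay: int, min_network_delay: int) -> int:
    # Heartbeats sent T apart can arrive up to T + (max - min) apart,
    # since each one's delay lies somewhere in [min, max].
    return max_network_delay - min_network_delay


def async_delta2(observed_delay: int, k: int) -> int:
    # No true bound exists, so pad an observed delay by a factor k.
    return k * observed_delay


if __name__ == "__main__":
    d = HeartbeatDetector(T=1000, delta2=sync_delta2(150, 50))  # delta2 = 100 ms
    d.on_heartbeat(now=1050)
    print(d.suspects_crash(now=2100))  # False: 1050 ms < 1100 ms
    print(d.suspects_crash(now=2200))  # True: 1150 ms >= 1100 ms
```

Unlike ping-ack, p sends nothing here; it only remembers when the last heartbeat arrived.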
Correctness of failure detection
- Completeness: Every failed process is eventually detected.
- Accuracy: Every detected failure corresponds to a crashed process (no mistakes).
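In an asynchronous system, it is impossible to achieve both completeness and accuracy: a crashed process cannot be distinguished from one that is merely slow or whose messages are delayed.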
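To see how accuracy breaks when delays are unbounded, the hypothetical helper below replays the heartbeats of a q that never crashes, using assumed per-message delays in milliseconds, and returns every moment a heartbeat-style detector with period `T` and slack `delta2` would wrongly give up on q. It is a sketch for intuition only: while delays stay within the range `delta2` was tuned for, no mistakes occur, but one unexpectedly slow message produces a false suspicion, and with unbounded delays no finite `delta2` rules that out.

```python
from typing import List


def wrong_suspicions(delays: List[int], T: int, delta2: int) -> List[int]:
    """Times at which a heartbeat detector at p suspects a q that never crashed.

    q is alive throughout and sends heartbeat i at time i * T; that heartbeat
    reaches p delays[i] ms later. Every returned time is an accuracy violation.
    """
    # Sorting handles heartbeats that overtake each other in the network.
    arrivals = sorted(i * T + d for i, d in enumerate(delays))
    mistakes = []
    for prev, nxt in zip(arrivals, arrivals[1:]):
        deadline = prev + T + delta2  # p gives up on q at this time
        if nxt > deadline:
            mistakes.append(deadline)
    return mistakes


if __name__ == "__main__":
    # delta2 = 100 ms matches delays that stay within [50, 150] ms ...
    print(wrong_suspicions([50, 150, 120, 60], T=1000, delta2=100))  # []
    # ... but one heartbeat delayed to 400 ms makes p wrongly suspect q.
    print(wrong_suspicions([50, 150, 400, 60], T=1000, delta2=100))  # [2250]
```

Raising `delta2` removes this particular mistake but makes p slower to notice real crashes, which is the trade-off the detection-time metric measures.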
Metrics for failure detection
Worst-case failure detection time
- Ping-ack: $T + \Delta_1 - \Delta$, where $\Delta$ is the time taken for the last ping from p to reach q
- Heartbeat: $\Delta + T + \Delta_2$, where $\Delta$ is the time taken for the last message from q to reach p
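The worst-case detection times can be checked with a quick calculation. The function names are hypothetical and times are assumed to be in milliseconds; the comments restate the crash timing that makes each case the worst one.

```python
def ping_ack_worst_case(T: int, delta1: int, delta: int) -> int:
    # Worst case: q crashes right after acking a ping that took `delta` to
    # reach it. p's next ping leaves T after the previous one and times out
    # delta1 later, so detection happens T + delta1 - delta after the crash.
    return T + delta1 - delta


def heartbeat_worst_case(T: int, delta2: int, delta: int) -> int:
    # Worst case: q crashes right after sending a heartbeat that takes `delta`
    # to reach p. p times out T + delta2 after that arrival, so detection
    # happens delta + T + delta2 after the crash.
    return delta + T + delta2


if __name__ == "__main__":
    # Synchronous setting with delays in [50, 150] ms and T = 1000 ms.
    print(ping_ack_worst_case(T=1000, delta1=2 * 150, delta=50))     # 1250
    print(heartbeat_worst_case(T=1000, delta2=150 - 50, delta=150))  # 1250
```

With the synchronous timeouts, ping-ack is worst when the last ping took the minimum delay and heartbeating is worst when the last heartbeat took the maximum delay, so both come out to $T + 2 \cdot \text{max delay} - \text{min delay}$ (1250 ms with these numbers).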
Bandwidth usage
- Ping-ack: 2 messages every $T$ units
- Heartbeat: 1 message every $T$ units
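The bandwidth difference can be turned into a period choice. In this sketch (hypothetical names, periods in milliseconds), a fixed message budget per second lets heartbeating use half the period that ping-ack needs, so q is checked twice as often for the same cost.

```python
PING_ACK_MSGS_PER_PERIOD = 2   # one ping (p -> q) plus one ack (q -> p)
HEARTBEAT_MSGS_PER_PERIOD = 1  # one heartbeat (q -> p)


def messages_per_second(msgs_per_period: int, T_ms: int) -> float:
    """Average message rate when msgs_per_period messages go out every T_ms ms."""
    return msgs_per_period * 1000 / T_ms


def period_for_budget(msgs_per_period: int, budget_per_second: float) -> float:
    """Shortest period T (in ms) whose message rate stays within the budget."""
    return msgs_per_period * 1000 / budget_per_second


if __name__ == "__main__":
    print(messages_per_second(PING_ACK_MSGS_PER_PERIOD, 500))   # 4.0
    print(messages_per_second(HEARTBEAT_MSGS_PER_PERIOD, 500))  # 2.0
    # With a budget of 4 messages/s, heartbeating can afford half the period.
    print(period_for_budget(PING_ACK_MSGS_PER_PERIOD, 4.0))   # 500.0
    print(period_for_budget(HEARTBEAT_MSGS_PER_PERIOD, 4.0))  # 250.0
```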
https://yzzzf.xyz/2024/01/23/distributed-systems-1-2/