Distributed Systems (1-2)

https://courses.grainger.illinois.edu/ece428/sp2024//assets/slides/lect2-after.pdf

Key aspects of a distributed system

  • Processes must communicate with one another to
    coordinate actions. Communication time is variable.
  • Different processes (on different computers) have different clocks.
  • Processes and communication channels may fail.

Relationship between processes

  • Client-server
  • Peer-to-peer

Two ways to model

Synchronous distributed systems:

  • Known upper and lower bounds on time taken by each step in a
    process.
  • Known bounds on message passing delays.
  • Known bounds on clock drift rates.

Asynchronous distributed systems:

  • No bounds on process execution speeds.
  • No bounds on message passing delays.
  • No bounds on clock drift rates.

Types of failure

  • Crash
  • Fail-stop
  • Communication omission

A fail-stop failure is a failure that causes the affected component to stop operating.

Detect a crashed process

flowchart LR
    p -->|Periodic ping| q
    q -->|ack| p

p sends pings to q every $T$ seconds. $\Delta_1$ is the timeout value at p. If $\Delta_1$ time has elapsed after p sent a ping and no ack has arrived,
p reports q as crashed.

  • If synchronous, $\Delta_1$ = 2 × (max network delay)
  • If asynchronous, $\Delta_1$ = (max observed round trip time)

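The ping-ack rule above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the names `PingAckDetector`, `sync_ping_timeout`, and `async_ping_timeout` are hypothetical, and timestamps are passed in explicitly (rather than read from a real clock or socket) so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def sync_ping_timeout(max_network_delay: float) -> float:
    """Synchronous case: Delta_1 = 2 * (max network delay)."""
    return 2 * max_network_delay


def async_ping_timeout(observed_rtts: Iterable[float]) -> float:
    """Asynchronous case: Delta_1 = max observed round trip time (an estimate)."""
    return max(observed_rtts)


@dataclass
class PingAckDetector:
    """State kept at p while it monitors q (hypothetical helper)."""
    timeout: float                         # Delta_1
    ping_sent_at: Optional[float] = None   # time of the outstanding ping, if any

    def send_ping(self, now: float) -> None:
        self.ping_sent_at = now

    def receive_ack(self, now: float) -> None:
        # The outstanding ping has been answered.
        self.ping_sent_at = None

    def suspects_crash(self, now: float) -> bool:
        # Report q as crashed if a ping is unanswered for longer than Delta_1.
        if self.ping_sent_at is None:
            return False
        return now - self.ping_sent_at > self.timeout


detector = PingAckDetector(timeout=sync_ping_timeout(0.05))
detector.send_ping(now=0.0)
detector.receive_ack(now=0.08)
answered = detector.suspects_crash(now=0.2)    # ack arrived, so no suspicion
detector.send_ping(now=1.0)
unanswered = detector.suspects_crash(now=1.2)  # no ack within Delta_1
```

In the asynchronous case the timeout is only an estimate from past observations, so a later, slower round trip can still cause a live q to be suspected.
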
flowchart RL
    q --->|heartbeat| p

q sends heartbeats to p every $T$ seconds. $(T + \Delta_2)$ is the timeout value at p. If $(T + \Delta_2)$ time has elapsed since the last heartbeat, p reports q as crashed.

  • If synchronous, $\Delta_2$ = (max network delay) - (min network delay)
  • If asynchronous, $\Delta_2$ = $k$ × (observed delay), for a chosen multiplier $k$
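A matching sketch for the heartbeat rule is given here. The synchronous slack follows from the worst case where one heartbeat arrives after the minimum delay and the next after the maximum delay, so consecutive arrivals are at most $T$ + (max delay) - (min delay) apart. The names `HeartbeatDetector`, `sync_heartbeat_slack`, and `async_heartbeat_slack` are hypothetical, and times are supplied by the caller rather than taken from a real clock.

```python
from dataclasses import dataclass


def sync_heartbeat_slack(max_delay: float, min_delay: float) -> float:
    """Synchronous case: Delta_2 = (max network delay) - (min network delay)."""
    return max_delay - min_delay


def async_heartbeat_slack(k: float, observed_delay: float) -> float:
    """Asynchronous case: Delta_2 = k * (observed delay); k is a tuning choice."""
    return k * observed_delay


@dataclass
class HeartbeatDetector:
    """State kept at p while it listens for heartbeats from q (hypothetical helper)."""
    period: float                    # T, the heartbeat interval at q
    slack: float                     # Delta_2
    last_heartbeat_at: float = 0.0   # arrival time of the latest heartbeat

    def receive_heartbeat(self, now: float) -> None:
        self.last_heartbeat_at = now

    def suspects_crash(self, now: float) -> bool:
        # Report q as crashed once more than T + Delta_2 has passed
        # since the last heartbeat arrived.
        return now - self.last_heartbeat_at > self.period + self.slack


hb = HeartbeatDetector(period=1.0, slack=sync_heartbeat_slack(0.5, 0.25))
hb.receive_heartbeat(now=10.0)
on_time = hb.suspects_crash(now=11.0)   # 1.0 elapsed, within T + Delta_2 = 1.25
overdue = hb.suspects_crash(now=11.5)   # 1.5 elapsed, beyond T + Delta_2
```
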

Correctness of failure detection

  • Completeness: Every failed process is eventually detected.
  • Accuracy: Every detected failure corresponds to a crashed process (no mistakes).

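In an asynchronous system, it is impossible to guarantee both completeness and accuracy, because a slow process or a delayed message cannot be distinguished from a crash.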

Metrics for failure detection

Worst case failure detection time

Ping-ack:

$T + \Delta_1 - \Delta$, where $\Delta$ is the time taken for the last ping from p to reach q

Heartbeat:

$\Delta + T + \Delta_2$, where $\Delta$ is the time taken for the last message from q to reach p
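The two worst-case formulas can be checked with a small calculation. The reasoning assumed here: for ping-ack, q crashes just after receiving (and answering) a ping at time $\Delta$ after it was sent, so p only notices when the next ping, sent at $T$, times out at $T + \Delta_1$. For heartbeat, q crashes just after sending a heartbeat, which arrives $\Delta$ later, and p then waits a further $T + \Delta_2$. The function names and sample values below are illustrative only.

```python
def ping_ack_worst_case(T: float, delta_1: float, delta: float) -> float:
    """Time from q's crash to detection at p: T + Delta_1 - Delta."""
    return T + delta_1 - delta


def heartbeat_worst_case(T: float, delta_2: float, delta: float) -> float:
    """Time from q's crash to detection at p: Delta + T + Delta_2."""
    return delta + T + delta_2


# Illustrative numbers (not from the lecture): T = 4 s, one-way delay 0.25 s.
ping_ack_time = ping_ack_worst_case(T=4.0, delta_1=1.0, delta=0.25)
heartbeat_time = heartbeat_worst_case(T=4.0, delta_2=0.5, delta=0.25)
```
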

Bandwidth usage

  • Ping-ack: 2 messages every $T$ units
  • Heartbeat: 1 message every $T$ units
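As a rough illustration of the bandwidth comparison, the message rate per monitored process is the message count divided by the period. The helper name `messages_per_unit_time` and the sample period are assumptions made for this sketch.

```python
def messages_per_unit_time(messages_per_period: int, T: float) -> float:
    """Average number of messages sent per time unit for one monitored process."""
    return messages_per_period / T


# With T = 4 time units: ping-ack sends a ping and an ack, heartbeat sends one message.
ping_ack_rate = messages_per_unit_time(2, T=4.0)
heartbeat_rate = messages_per_unit_time(1, T=4.0)
```

For the same period $T$, heartbeat uses half the messages of ping-ack.
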

https://yzzzf.xyz/2024/01/23/distributed-systems-1-2/
Author: Zifan Ying
Posted on: January 23, 2024