
The Synchronous Data Center
Tian Yang, Robert Gifford, Andreas Haeberlen, Linh Thi Xuan Phan
Department of Computer and Information Science, University of Pennsylvania
HotOS XVII (May 14, 2019)

If trains were asynchronous…
- Station clocks would be at most loosely synchronized
- Congestion would appear at unpredictable times
- Trains would have arbitrary delays and would often be lost entirely
- Station stops could take an arbitrary amount of time

The asynchrony assumption
- System designers typically assume that:
  - Clocks are at most loosely synchronized
  - Network latencies are unpredictable
  - Packets are often dropped in the network
  - We don't know much about node speeds
- This is often a good idea!
  - Sometimes we really don't know
    - Example: a system with multiple administrative domains
  - It is a nicely conservative assumption
    - If the system works in this model, it almost certainly will work on the actual hardware
  - This is the "default"; it is rarely questioned

Asynchrony can be expensive
- But: no time bounds can be given on anything
- This makes many things very difficult!
  - Examples: congestion control, fault detection, consistency, fighting tail latencies

It doesn't have to be that way!
- The train network is not asynchronous
  - Single administrative domain (like a data center!)
  - Carefully scheduled; speeds and timings are (mostly) known
- Not all distributed systems are, either! Example: cyber-physical systems (CPS)
  - Clocks are closely in sync
  - Network traffic is scheduled; hard latency bounds are known
  - No congestion losses! (And transmission losses are rare)
  - Node speeds and execution times are known exactly
- CPS are mostly synchronous (out of necessity)!

So what?
- Synchrony helps in two ways:
  - Hard latency bounds -> we know how long we need to wait!
  - The absence of a message at a particular time means something
- How does that help us?
  - No (surprising) congestion anymore
  - Fault detection would be much easier (see the sketch below)
  - Consistency would be easier to get
  - Long latency tails would disappear
  - Many algorithms become simpler, or even trivial ("boring")
  - Workloads with timing requirements can be supported
    - Example: inflate the airbag when sensors detect a collision
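To make the fault-detection point concrete, here is a minimal sketch (in Python, not from the talk) of a failure detector that exploits a hard latency bound: if a heartbeat has not arrived by its deadline, its sender must have failed. The constants and the node id are illustrative assumptions, not values from the paper.

    T_MAX = 0.002             # assumed hard bound on one-way network latency (2 ms)
    HEARTBEAT_PERIOD = 0.010  # assumed heartbeat interval (10 ms); both values illustrative

    last_heartbeat = {}       # node id -> send timestamp of its latest heartbeat

    def on_heartbeat(node_id, send_time):
        # Record the sender's timestamp; clocks are assumed to be closely synchronized.
        last_heartbeat[node_id] = send_time

    def suspected_nodes(now):
        # With a hard latency bound, silence is meaningful: if a node's latest
        # heartbeat is older than one period plus T_MAX, a newer one should
        # already have arrived, so the node (or its link) must have failed.
        deadline = now - (HEARTBEAT_PERIOD + T_MAX)
        return [n for n, t in last_heartbeat.items() if t < deadline]

    # Example: a node last heard from at t=0.0 is suspected at t=0.1.
    on_heartbeat("node-7", 0.0)
    print(suspected_nodes(0.1))   # -> ['node-7']

Under asynchrony, no such deadline exists, which is why failure detectors there can only ever be "eventually" accurate.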

Could a data center be synchronous?
- At first glance, absolutely not! Some objections:
  - "The network is shared, so packet delays are unpredictable!" But: Fastpass (SIGCOMM '14)
  - "Who knows how long anything takes under Linux?" But: real-time operating systems
  - "Clocks can't be synchronized closely enough!" But: Spanner (OSDI '12)
- Our claim: Most of the asynchrony in today's data centers is avoidable!

Outline
- Goal: a synchronous data center
- How could it be done?
  - Network layer
  - Synchronized clocks
  - Building blocks
  - Hardware
  - Software
  - Scheduling

The How: Network layer
- Why is latency so unpredictable?
  - Cross-traffic and queueing!
- Inspiration: Fastpass (SIGCOMM '14)
  - Machines must ask an "arbiter" for permission before sending
  - The arbiter schedules packets (at >2 Tbit/s on eight cores!)
  - Result: (almost) no queueing in the network!
- Fastpass makes no attempt to control end-to-end timing
  - But we see no reason why this couldn't be added! (A toy arbiter sketch follows below.)
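As an illustration of the arbiter idea, here is a toy, Fastpass-inspired timeslot allocator, not the actual Fastpass algorithm or code: a sender asks for a slot, and the arbiter returns the earliest slot in which neither the source nor the destination is already busy, so no queue can build up inside the network. All identifiers are made up for the sketch.

    from collections import defaultdict

    busy = defaultdict(set)   # timeslot -> endpoints already sending or receiving in it

    def allocate_timeslot(src, dst, earliest_slot):
        slot = earliest_slot
        while src in busy[slot] or dst in busy[slot]:
            slot += 1                       # try the next timeslot
        busy[slot].update((src, dst))       # reserve both endpoints for this slot
        return slot                         # the sender transmits exactly in this slot

    # Example: two flows to the same destination get distinct slots.
    print(allocate_timeslot("A", "C", 0))   # -> 0
    print(allocate_timeslot("B", "C", 0))   # -> 1

Because every transmission is admitted in advance, the arbiter also knows, for each packet, when it will leave and when it will arrive, which is the hook for adding end-to-end timing control.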

The How: Synchronized clocks
- Why are clocks so hard to synchronize?
  - It is hard in the wide area, or via NTP (with cross-traffic)
- But it can be done:
  - DTP (SIGCOMM '16) achieves nanosecond precision
    - ... with some help from the hardware
    [Figure 6(a) from the DTP paper]
  - Google Spanner (OSDI '12) keeps different data centers to within ~4 ms
    - ... with some help from atomic clocks
    [Figure 6 from the Spanner paper]
  - Having predictable network latencies should help, too!
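One common way to make bounded clock error usable, sketched below in the spirit of Spanner's TrueTime interval clocks (this is an illustrative approximation, not the TrueTime API): the clock reports an interval rather than a point, whose width is an assumed worst-case synchronization error EPSILON.

    import time

    EPSILON = 0.004   # assumed worst-case clock error (~4 ms, per the slide); illustrative

    def now_interval():
        # Return [earliest, latest] bounds on the true current time.
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def definitely_after(t_event, interval):
        # True only if the event at t_event happened before every instant
        # the local clock could currently be at.
        earliest, _ = interval
        return t_event < earliest

The tighter the synchronization (DTP-style nanoseconds rather than milliseconds), the narrower the interval and the less a system has to wait before it can order events with certainty.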

The How: Building blocks
[Diagram contrasting asynchronous and synchronous building blocks: ordering and fault detection, given a hard latency bound Tmax]

The How: Software
- Why is software timing so unpredictable?
- Reason #1: Hardware features (caches, etc.)
  - Not as bad as it seems: +/- 2% is possible (TDR, OSDI '14)
  - Emerging features, such as Intel's CAT, should help
  - Meltdown/Spectre will probably accelerate this trend
- Reason #2: OS structure
  - Linux & friends are not designed for timing stability
  - Idea from CPS: use elements from RT-OSes (see the sketch below)
  - ... but it will require deep structural changes!
    - No small "synchrony patch" for Linux!
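To give a flavor of the RT-OS mechanisms involved, the sketch below pins a latency-critical process to one core and gives it a fixed real-time priority, using Linux's standard scheduling interfaces as exposed by Python's os module (Linux only, typically requires root). The core number and priority are arbitrary example values, and this only touches the OS-scheduling side, not hardware features such as cache partitioning.

    import os

    def make_time_critical(cpu=2, priority=80):
        # Pin the current process to one core so its cache state stays warm
        # and its timing is not perturbed by migrations...
        os.sched_setaffinity(0, {cpu})
        # ...and give it a fixed real-time priority (SCHED_FIFO), so ordinary
        # best-effort work cannot preempt it.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))

A full synchronous data center would need far more than this, which is the point of the "no small synchrony patch" remark, but it shows the kind of control an RT-OS exposes by default.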

The How: Fault tolerance
- What if things (inevitably) break?
  - This could disrupt the careful synchronous "choreography"!
- Challenge #1: Telling when something breaks
  - Actually easier with synchrony!
- Challenge #2: Doing something about it
  - How to reconfigure while maintaining timing guarantees?
  - Idea from CPS: use mode-change protocols! (sketched below)
    - The system can operate in different "modes", based on observed faults
    - Transitions from one mode to another follow precomputed protocols
    - Result: timing is maintained during the transition
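A hypothetical sketch of the mode-change idea follows; the modes, task sets, and fault names are invented for illustration, and in a real system each mode's schedule and each transition would be verified offline so that deadlines are met throughout.

    # Precomputed modes: each mode's task set would be shown schedulable offline.
    MODES = {
        "normal":        {"replicas": 3, "tasks": ["serve", "replicate", "monitor"]},
        "one_node_down": {"replicas": 2, "tasks": ["serve", "replicate"]},
        "degraded":      {"replicas": 1, "tasks": ["serve"]},
    }

    # Precomputed transitions: (current mode, observed fault) -> next mode.
    TRANSITIONS = {
        ("normal", "node_failure"):        "one_node_down",
        ("one_node_down", "node_failure"): "degraded",
    }

    current_mode = "normal"

    def on_fault(fault):
        # Switch to the precomputed target mode; because both the old and the
        # new schedule were analyzed in advance, timing guarantees can be
        # preserved across the transition.
        global current_mode
        current_mode = TRANSITIONS.get((current_mode, fault), current_mode)
        return MODES[current_mode]

The key difference from ad-hoc failover is that nothing about the new configuration is decided at runtime; the fault only selects among plans that were already proven to meet their deadlines.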

The How: Scheduling
- Can you schedule an entire data center?
  - Surprisingly, we are getting pretty good at it!
    - Sparrow (SOSP '13) can schedule 100 ms tasks on 10,000s of cores
- Idea from CPS: compositional scheduling (see the sketch below)
  - Schedule smaller entities (nodes? pods?) in detail
  - Abstract and aggregate, then schedule the next-larger entity
  - Repeat until the entire system is scheduled
  - Dispatching can be done locally; so can most updates
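The compositional idea can be sketched as follows, using a deliberately simplified utilization-based interface rather than the full periodic-resource-model analysis from the real-time literature; all numbers and names are illustrative.

    def node_interface(tasks):
        # tasks: list of (worst-case execution time, period) pairs.
        # Abstract the node's whole task set into a single resource demand.
        return sum(wcet / period for wcet, period in tasks)

    def pod_schedulable(node_demands, cores_per_pod):
        # Aggregate the node-level interfaces and check them against the pod's
        # capacity; the same step repeats pod -> cluster -> data center.
        return sum(node_demands) <= cores_per_pod

    nodes = [node_interface([(2, 10), (1, 5)]),   # node 1: utilization 0.4
             node_interface([(3, 10)])]           # node 2: utilization 0.3
    print(pod_schedulable(nodes, cores_per_pod=1))  # -> True

Because each level only sees the abstracted interfaces of the level below, local changes (a task added on one node) usually only require re-checking that node and its pod, which is what makes local dispatching and local updates possible.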

Summary
- Synchronous data centers seem possible!
  - Reasons to be optimistic: Fastpass, DTP, RTOSes, …
- There are interesting benefits to be had!
  - Asynchrony creates or amplifies challenges like fault detection, congestion control, consistency, tail latencies, load balancing, performance debugging, algorithmic complexity, …
  - These problems could become simpler, or go away entirely!
- But much work remains to be done!
  - Not much existing work on DC-scale synchronous systems
  - Can we adapt some ideas from cyber-physical systems?
Questions?