Data Center Networks
Jennifer Rexford
COS 461: Computer Networks
Lectures: MW 10-10:50am in Architecture N101
http://www.cs.princeton.edu/courses/archive/spr12/cos461/

Networking Case Studies
• Data center
• Enterprise
• Backbone
• Cellular
• Wireless

Cloud Computing

Cloud Computing
• Elastic resources
  – Expand and contract resources
  – Pay-per-use
  – Infrastructure on demand
• Multi-tenancy
  – Multiple independent users
  – Security and resource isolation
  – Amortize the cost of the (shared) infrastructure
• Flexible service management

Cloud Service Models
• Software as a Service
  – Provider licenses applications to users as a service
  – E.g., customer relationship management, e-mail, …
  – Avoid costs of installation, maintenance, patches, …
• Platform as a Service
  – Provider offers platform for building applications
  – E.g., Google's App Engine
  – Avoid worrying about scalability of platform

Cloud Service Models
• Infrastructure as a Service
  – Provider offers raw computing, storage, and network
  – E.g., Amazon's Elastic Compute Cloud (EC2)
  – Avoid buying servers and estimating resource needs

Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine
• Applications run unmodified, as on a real machine
• A VM can migrate from one computer to another

Multi-Tier Applications
• Applications consist of tasks
  – Many separate components
  – Running on different machines
• Commodity computers
  – Many general-purpose computers
  – Not one big mainframe
  – Easier scaling

Multi-Tier Applications
[Figure: a front-end server, aggregators, and workers]

Data Center Network

Virtual Switch in Server

Top-of-Rack Architecture
• Rack of servers
  – Commodity servers
  – And top-of-rack switch
• Modular design
  – Preconfigured racks
  – Power, network, and storage cabling

Aggregate to the Next Level

Modularity, Modularity
• Containers
• Many containers

Data Center Network Topology
[Figure: the Internet at the top, then core routers (CR), access routers (AR), Ethernet switches (S), and racks of app. servers (A); ~1,000 servers/pod]
Key
• CR = Core Router
• AR = Access Router
• S = Ethernet Switch
• A = Rack of app. servers

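Capacity Mismatch
[Figure: the same topology (CR, AR, S, A), annotated with capacity ratios]
• ~5:1 between the racks (A) and the switches (S)
• ~40:1 at the switch (S) level
• ~200:1 between the access routers (AR) and the core routers (CR)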
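To make the mismatch concrete, the short sketch below reads the ratios on the slide as oversubscription ratios, that is, worst-case demand divided by available capacity at a layer. That reading, the 1 Gbps server NIC speed, and the function name are assumptions made for illustration; they are not stated on the slide.

```python
def worst_case_mbps(nic_mbps: float, oversubscription: float) -> float:
    """Bandwidth left per server if every server sends across this layer at once."""
    return nic_mbps / oversubscription


# Assumption: 1 Gbps server NICs. The ratios are the ones quoted on the slide.
NIC_MBPS = 1000.0
RATIOS = {"~5:1": 5, "~40:1": 40, "~200:1": 200}

for label, ratio in RATIOS.items():
    share = worst_case_mbps(NIC_MBPS, ratio)
    print(f"{label}: about {share:.0f} Mbps per server in the worst case")
```

Under these assumptions, traffic that has to cross the most oversubscribed layer gets only a small fraction of the NIC rate when all servers are busy, which is why placement of communicating servers matters.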

Data-Center Routing
[Figure: the same topology split into two regions: DC-Layer 3 (Internet, core routers, access routers) and DC-Layer 2 (Ethernet switches and racks); ~1,000 servers/pod == IP subnet]
Key
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers

Reminder: Layer 2 vs. Layer 3
• Ethernet switching (layer 2)
  – Cheaper switch equipment
  – Fixed addresses and auto-configuration
  – Seamless mobility, migration, and failover
• IP routing (layer 3)
  – Scalability through hierarchical addressing
  – Efficiency through shortest-path routing
  – Multipath routing through equal-cost multipath
• So, like in enterprises…
  – Connect layer-2 islands by IP routers
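Equal-cost multipath is commonly realized by hashing a flow's header fields and using the hash to choose among the equal-cost next hops, so that all packets of one flow follow the same path. The sketch below illustrates that idea only; the hash function (CRC32), the field order, and the router names are assumptions, not what any particular router does.

```python
import zlib


def ecmp_next_hop(flow, next_hops):
    """Pick one of several equal-cost next hops from a hash of the flow's fields.

    `flow` is a tuple such as (src_ip, dst_ip, protocol, src_port, dst_port).
    Hashing the whole tuple keeps every packet of a flow on one path.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Hypothetical flow and next hops, for illustration only.
flow = ("10.0.0.1", "10.0.1.2", 6, 40000, 80)
hops = ["AR-1", "AR-2"]
print(ecmp_next_hop(flow, hops))
```

Because the choice depends only on the flow's fields, packets are not reordered within a flow, while different flows spread over the available paths.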

Case Study: Performance Diagnosis in Data Centers
http://www.eecs.berkeley.edu/~minlanyu/writeup/nsdi11.pdf

Applications Inside Data Centers
[Figure: front-end server, aggregators, and workers]

Challenges of Datacenter Diagnosis
• Multi-tier applications
  – Hundreds of application components
  – Tens of thousands of servers
• Evolving applications
  – Add new features, fix bugs
  – Change components while the app is still in operation
• Human factors
  – Developers may not understand the network well
  – Nagle's algorithm, delayed ACK, etc.

Diagnosing in Today's Data Center
[Figure: a host (app on top of OS) with a packet sniffer, attached to a switch]
• App logs: #reqs/sec, response time (1% of requests have > 200 ms delay)
• Packet trace: filter out the trace for long-delay requests
• Switch logs: #bytes/#pkts per minute
• SNAP: diagnose net-app interactions

Problems of Different Logs
[Figure: a host (app on top of OS) with a packet sniffer, attached to a switch]
• App logs: application-specific
• Packet trace: too expensive
• Switch logs: too coarse-grained
• SNAP: generic, fine-grained, and lightweight; runs everywhere, all the time

TCP Statistics
• Instantaneous snapshots
  – #Bytes in the send buffer
  – Congestion window size, receiver window size
  – Snapshots based on random sampling
• Cumulative counters
  – #FastRetrans, #Timeout
  – RTT estimation: #SampleRTT, #SumRTT
  – RwinLimitTime
  – Calculate difference between two polls
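Cumulative counters only become useful once two polls are subtracted. The sketch below shows that step for the counters named on the slide, including the average RTT over the interval computed as diff(SumRTT)/diff(SampleRTT). The class, field names, and millisecond unit are hypothetical; the slide does not say how the counters are exposed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TcpCounters:
    """One poll of a connection's cumulative counters (hypothetical layout)."""
    fast_retrans: int  # #FastRetrans so far
    timeouts: int      # #Timeout so far
    sample_rtt: int    # #SampleRTT: number of RTT samples so far
    sum_rtt_ms: int    # #SumRTT: sum of all RTT samples so far, in ms


def diff_polls(prev: TcpCounters, curr: TcpCounters) -> dict:
    """What happened between two polls of the same connection."""
    d_samples = curr.sample_rtt - prev.sample_rtt
    d_sum = curr.sum_rtt_ms - prev.sum_rtt_ms
    # Average RTT over the interval; undefined if no new samples arrived.
    avg_rtt: Optional[float] = d_sum / d_samples if d_samples > 0 else None
    return {
        "fast_retrans": curr.fast_retrans - prev.fast_retrans,
        "timeouts": curr.timeouts - prev.timeouts,
        "avg_rtt_ms": avg_rtt,
    }


prev = TcpCounters(fast_retrans=2, timeouts=0, sample_rtt=100, sum_rtt_ms=500)
curr = TcpCounters(fast_retrans=5, timeouts=1, sample_rtt=150, sum_rtt_ms=1500)
print(diff_polls(prev, curr))
```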

Identifying Performance Problems
• Sender app
  – Not any other problems
• Send buffer
  – Send buffer is almost full
• Network
  – #Fast retransmission
  – #Timeout
• Receiver
  – RwinLimitTime
  – Delayed ACK: diff(SumRTT)/diff(SampleRTT) > MaxDelay
• Methods: sampling, direct measure, inference
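The rules on this slide can be read as a small per-interval classifier. The sketch below is one possible reading, not SNAP's actual code: the field names, both thresholds, and the order of the checks are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Per-connection statistics for one polling interval (hypothetical layout)."""
    send_buffer_full_fraction: float  # share of sampled snapshots with a nearly full buffer
    fast_retrans: int                 # new fast retransmissions in the interval
    timeouts: int                     # new timeouts in the interval
    rwin_limit_time_ms: int           # time limited by the receiver window
    avg_rtt_ms: float                 # diff(SumRTT) / diff(SampleRTT)


# Assumed thresholds; the slide names MaxDelay but gives no value.
MAX_DELAY_MS = 100.0
BUFFER_FULL_THRESHOLD = 0.5


def classify(stats: IntervalStats) -> list:
    """Return the stages that look like bottlenecks in this interval."""
    problems = []
    if stats.send_buffer_full_fraction >= BUFFER_FULL_THRESHOLD:
        problems.append("send buffer")
    if stats.fast_retrans > 0 or stats.timeouts > 0:
        problems.append("network")
    if stats.rwin_limit_time_ms > 0:
        problems.append("receiver window")
    if stats.avg_rtt_ms > MAX_DELAY_MS:
        problems.append("delayed ACK")
    # "Not any other problems": by elimination, the sender app is the limit.
    return problems or ["sender app"]


print(classify(IntervalStats(0.1, 0, 0, 0, 210.0)))
```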

SNAP Architecture
• At each host, for every connection
• Collect data
  – Direct access to OS
  – Polling per-connection statistics: snapshots (#bytes in send buffer) and cumulative counters (#FastRetrans)
  – Adaptive tuning of polling rate

SNAP Architecture
• At each host, for every connection
• Collect data
• Performance classifier
  – Classifying based on the life of data transfer
  – Algorithms for detecting performance problems
  – Based on direct measurement in the OS

SNAP Architecture
• At each host, for every connection
• Collect data
• Performance classifier
• Cross-connection correlation
  – Direct access to data center configurations
  – Input: topology, routing information; mapping from connections to processes/apps
  – Correlate problems across connections: sharing the same switch/link, app code
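Cross-connection correlation can be pictured as counting how many problem connections share each resource (a switch, a link, or an application). The sketch below shows that counting step under simplifying assumptions; the function, the `min_count` parameter, and the connection and resource names are all hypothetical.

```python
from collections import Counter


def correlate(problem_conns, conn_to_resources, min_count=2):
    """Count problem connections per shared resource (switch, link, app).

    Returns only the resources shared by at least `min_count` problem
    connections, since those are the likely common causes.
    """
    counts = Counter()
    for conn in problem_conns:
        # set() so a resource listed twice for one connection counts once.
        for resource in set(conn_to_resources.get(conn, ())):
            counts[resource] += 1
    return {res: n for res, n in counts.items() if n >= min_count}


# Hypothetical mapping derived from topology, routing, and process information.
conn_to_resources = {
    "c1": ["switch-1", "app-A"],
    "c2": ["switch-1", "app-B"],
    "c3": ["switch-2", "app-A"],
}
print(correlate(["c1", "c2"], conn_to_resources))
```

In this toy input the two problem connections belong to different applications but cross the same switch, which points at the switch rather than at application code.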

SNAP Deployment
• Production data center
  – 8K machines, 700 applications
  – Ran SNAP for a week, collected petabytes of data
• Identified 15 major performance problems
  – Operators: characterize key problems in data center
  – Developers: quickly pinpoint problems in app software, network stack, and their interactions

Characterizing Perf. Limitations
#Apps that are limited for > 50% of the time:
• Sender app: 551 apps
  – Bottlenecked by CPU, disk, etc.
  – Slow due to app design (small writes)
• Send buffer: 1 app
  – Send buffer not large enough
• Network: 6 apps
  – Fast retransmission
  – Timeout
• Receiver
  – 8 apps: not reading fast enough (CPU, disk, etc.)
  – 144 apps: not ACKing fast enough (delayed ACK)

Delayed ACK
• Delayed ACK caused significant problems
  – Delayed ACK was used to reduce bandwidth usage and server interruption
[Figure: A sends data to B. When B has data to send, the ACK rides on it (Data+ACK). When B does not have data to send, the ACK is sent after 200 ms.]
• Delayed ACK should be disabled in data centers
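The slide does not say how to disable delayed ACK. As one illustration, Linux offers the per-socket `TCP_QUICKACK` option, which asks the kernel to acknowledge immediately. The sketch below assumes a Linux host; the helper name is hypothetical, and the option is not permanent (the kernel may leave quick-ACK mode again), so applications that rely on it typically set it again after reads. System-wide settings are a separate matter not shown here.

```python
import socket


def request_quick_ack(sock) -> bool:
    """Ask the kernel to ACK immediately on this TCP socket, if supported.

    Returns False where TCP_QUICKACK is unavailable (it is Linux-specific)
    or where the kernel rejects the option.
    """
    opt = getattr(socket, "TCP_QUICKACK", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, 1)
    except OSError:
        return False
    return True
```

A receiver would call `request_quick_ack(conn)` on an accepted connection and treat a `False` result as "delayed ACK is still in effect".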

Diagnosing Delayed ACK with SNAP
• Monitor at the right place
  – Scalable, low-overhead data collection at all hosts
• Algorithms to identify performance problems
  – Identify delayed ACK with OS information
• Correlate problems across connections
  – Identify the apps with significant delayed ACK issues
• Fix the problem with operators and developers
  – Disable delayed ACK in data centers

Conclusion
• Cloud computing
  – Major trend in IT industry
  – Today's equivalent of factories
• Data center networking
  – Regular topologies interconnecting VMs
  – Mix of Ethernet and IP networking
• Modular, multi-tier applications
  – New ways of building applications
  – New performance challenges

Load Balancing

Load Balancers
• Spread load over server replicas
  – Present a single public address (VIP) for a service
  – Direct each request to a server replica
[Figure: a Virtual IP (VIP) in front of three server replicas; the address labels (192.121.… for the VIP, 10.… for the replicas) are incomplete in this copy]
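A minimal sketch of the idea: one public address stands for the service, and each request is handed to one of the replicas behind it. Round-robin is used here only because it is the simplest policy; the slide does not say which policy a real load balancer uses, and the class name and addresses (taken from documentation ranges) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Present one VIP for a service and rotate requests over its replicas."""

    def __init__(self, vip, replicas):
        self.vip = vip                 # the single public address clients see
        self._next = cycle(replicas)   # endless rotation over the replicas

    def pick(self):
        """Return the replica that should serve the next request."""
        return next(self._next)


# Hypothetical addresses, for illustration only.
lb = RoundRobinBalancer("192.0.2.1", ["198.51.100.1", "198.51.100.2", "198.51.100.3"])
for _ in range(4):
    print(lb.vip, "->", lb.pick())
```

A production balancer would usually also keep packets of an existing connection on the same replica and skip replicas that fail health checks; neither is modeled here.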

Wide-Area Network
[Figure: clients reach data centers (servers behind a router) over the Internet; a DNS server performs DNS-based site selection]

Wide-Area Network: Ingress Proxies
[Figure: clients, proxies, and data centers (servers behind a router)]