Software Defined Measurement for Data Centers Masoud Moshref


Software Defined Measurement for Data Centers
Masoud Moshref, Minlan Yu, Ramesh Govindan

Motivation
o Management policies such as
  • Traffic engineering
  • Accounting
  • Troubleshooting
o Need measurements in
  • Different time-scales
  • Multiple granularities of flows
  • Single/Multiple switches
o Can we encapsulate the measurement tasks in a controller module?
o Select the right primitive at switches
  • Counters in OpenFlow rules
  • Sketches in hash-based switches
  • Sampling (NetFlow, sFlow)
  • Programmable switches
o Based on
  • Traffic properties (stability)
  • Measurement task properties
    § Time-scale
    § Local/Global view
o Use resources efficiently based on
  • Resource/Accuracy tradeoff

Software Defined Measurement
[Figure: Controller with TE and Accounting modules above SDM; labels: Configure resources, Fetch statistics]

Hierarchical Heavy Hitters
o Definition:
  • The longest IP prefixes
  • That contribute a large amount of traffic (>threshold)
  • After excluding any HHH descendants in prefix tree
o For traffic engineering, accounting, anomaly detection
[Figure: Hierarchical heavy hitter]

Flow-based Switches
o For slowly-varying traffic and large time-scale
o At switches: Uses TCAM entries
o At controller: pick which prefixes to monitor, given a limit on the number of counters

Max-Cover algorithm
o Split the prefix with maximum traffic
o Merge siblings with total minimum traffic
o Stop if no sibling with traffic < maximum prefix

Hash-based Switches
o For variable traffic and large time-scale
o At switches: sketches:
  • Multiple hash functions
  • SRAM counters
o Hierarchical Count-Min sketch
o At controller: restricted resource allocation
[Figure: a packet hashed by H1, H2, H3, H4 into a counter array of width w and depth d]

Programmable Switches
o Because of large control traffic at small time-scales
o Give more responsibilities to the switches
  • The right division of labor between the controller and switches?
o Find heavy hitters for each IP prefix length at switches using
  • Sketches (Count-Min sketch)
  • Counting algorithms (Space Saving)
o New primitives for programmable switches
  • Heap for Space Saving counting algorithm

Discussion
o Multiple switches
  • Distribute labor on the path of flows
  • Compose measured data at the controller
o Multiple tasks
  • Distribute resources among tasks
  • Use joint information to save resources
    § Multiple time-scales
o How to do a global task?

Resource/Accuracy Tradeoff for Flow-based vs Hash-based Switches
o Flow-based: Max-Cover
o Hash-based: Count-Min sketch
o Equal switch resource cost: TCAM = 80 × SRAM
o 80x less bandwidth for flow-based
o 2x accuracy for sketch-based for small threshold
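The Hierarchical Heavy Hitters definition can be made concrete with a small exact computation. The sketch below is illustrative only: it assumes exact per-address byte counts are available (which the switch primitives on the poster only approximate), treats the hierarchy at single-bit granularity, and uses a hypothetical function name, hierarchical_heavy_hitters. It walks the prefix tree from the longest prefixes to the root, reports a prefix when its residual traffic exceeds the threshold, and passes traffic up to the parent only when the prefix was not reported.

```python
from collections import defaultdict


def hierarchical_heavy_hitters(traffic, threshold, bits=32):
    """Exact HHH over a binary prefix tree (illustrative, not the poster's code).

    traffic: dict mapping an integer address -> traffic volume.
    Returns a list of (prefix_value, prefix_length, residual_volume), where
    prefix_value holds the top prefix_length bits of the address.
    """
    level = dict(traffic)  # residual volume per prefix at the current length
    result = []
    for length in range(bits, -1, -1):
        parents = defaultdict(int)
        for prefix, volume in level.items():
            if volume > threshold:
                # Reported as an HHH: its traffic is excluded from ancestors.
                result.append((prefix, length, volume))
            elif length > 0:
                # Not heavy on its own: fold the traffic into the parent prefix.
                parents[prefix >> 1] += volume
        level = parents
    return result


# Toy example with 4-bit addresses (an assumption made to keep it readable).
example = {0b0000: 60, 0b0001: 30, 0b0010: 30,
           0b1000: 20, 0b1001: 20, 0b1100: 5}
hhh = hierarchical_heavy_hitters(example, threshold=50, bits=4)
```

In the toy data, address 0000 is heavy by itself, and prefix 00 becomes heavy only from the remaining traffic of 0001 and 0010, which shows the "after excluding any HHH descendants" rule.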
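The hash-based primitive (multiple hash functions over SRAM counters, width w and depth d) corresponds to a Count-Min sketch. The following is a minimal sketch of the plain, non-hierarchical version; the class name, the use of salted BLAKE2b as the per-row hash, and the string keys are assumptions made for illustration, not details from the poster.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min sketch: depth rows of width counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        # Every row increments exactly one counter for this key.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only add traffic, so the minimum over rows never
        # underestimates the true count.
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
sketch.add("10.0.0.1", 500)
sketch.add("10.0.0.2", 20)
estimate = sketch.estimate("10.0.0.1")  # at least 500
```

A hierarchical variant, as named on the poster, would keep one such sketch per prefix length and update each with the corresponding prefix of the packet's address.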
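The Space Saving counting algorithm named for programmable switches can be sketched as follows. This is the textbook form of the algorithm with a fixed number of counters, written under assumptions: the class and method names are hypothetical, and the minimum counter is found by a linear scan for brevity, where the poster suggests a heap as the switch primitive for that step.

```python
class SpaceSaving:
    """Space Saving with a fixed number of monitored keys (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # key -> estimated count (an upper bound)
        self.errors = {}  # key -> maximum possible overestimate

    def add(self, key, weight=1):
        if key in self.counts:
            self.counts[key] += weight
        elif len(self.counts) < self.capacity:
            self.counts[key] = weight
            self.errors[key] = 0
        else:
            # Evict the key with the smallest counter; the newcomer inherits
            # that counter value as its possible overestimate.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + weight
            self.errors[key] = floor

    def heavy_hitters(self, threshold):
        # Candidate heavy hitters: monitored keys whose estimate exceeds the threshold.
        return {k: c for k, c in self.counts.items() if c > threshold}


counter = SpaceSaving(capacity=2)
for flow in ["a", "a", "a", "b", "c"]:
    counter.add(flow)
```

With two counters, the arrival of "c" evicts "b" (the smallest counter), so "c" is stored with an estimate of 2 and a recorded error of 1, while the sum of all counters still equals the number of items seen.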