Packet-Mode Emulation of Output-Queued Switches (David Hay, CS)

Packet-Mode Emulation of Output-Queued Switches
David Hay, CS, Technion
Joint work with Hagit Attiya (CS) and Isaac Keslassy (EE)

Outline
  • Cell-Mode Scheduling vs. Packet-Mode Scheduling
  • Impossibility of an Exact Emulation
  • Speedup-RQD Tradeoff
      - Emulation with S ≈ 4
      - Emulation with S ≈ 2
  • Emulation of OQ switch w/ bounded buffer
  • Simulation Results

CIOQ Switches

Cell-Mode Scheduling

Trend towards Packet-Mode
  • Cell-mode scheduling is getting too hard
      - Fragmentation and reassembly should work very fast, at the external rate
      - Extra header for each cell ⇒ loss of bandwidth
  • For optical switches, such fragmentation and reassembly are prohibitive
  • Cell-mode schedulers are packet-oblivious
      - Degradation of the overall performance

Packet-Mode Scheduling

Packet-Mode Scheduling [Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006]
  • No need for fragmentation and reassembly
  • Must ensure contiguous packet delivery over the fabric
      - While input i delivers a packet to output j, neither input i nor output j can handle other packets.
Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers?

Output Queuing Emulation
  • OQ switches are considered optimal with respect to queuing delay and throughput
      - But too hard to implement in practice…
  • Emulation: same input traffic ⇒ same output traffic
  • How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?

Cell-Mode Emulation is Possible
Easy with speedup S = N ⇒ N scheduling decisions every time-slot:
  • In the 1st decision, forward the cell of input 1
  • In the 2nd decision, forward the cell of input 2
  • ⋮
  • In the Nth decision, forward the cell of input N

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
  • 1st Key Concept: Slackness of a cell (on the input side)
      L(C) = OC(C) - IT(C)
  • Output Cushion OC(C) ("good guys"): how many cells are queued in the output buffer of C's destination and should leave the OQ switch before C?
  • Input Thread IT(C) ("bad guys"): how many cells precede C in its input-port buffer?
  • Slackness may decrease by at most 2 in every time-slot:
      - A cell leaves the output buffer of C's destination: OC--
      - A cell arrives at C's input port and is queued before C: IT++
  • Initial slackness can be made non-negative
      - When C arrives, insert it in the OC(C)th place of its input buffer.
  • Plan: ensure that slackness always increases by 2 ⇒ slackness is never negative ⇒ all cells are delivered on time
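
The slackness definition and the push-in rule from this slide can be sketched in a few lines. This is only an illustration: the function names, the list-based input buffer, and the 0-based reading of "the OC(C)th place" are assumptions, not taken from the slides or from [Chuang et al., 1999].

```python
def slackness(output_cushion, input_thread):
    """L(C) = OC(C) - IT(C)."""
    return output_cushion - input_thread


def push_in(input_buffer, cell, output_cushion):
    """Insert `cell` so that at most `output_cushion` cells precede it.

    Returns the resulting input thread IT(C), i.e. the number of cells
    queued ahead of `cell` in this (hypothetical) list-based buffer.
    """
    position = min(output_cushion, len(input_buffer))
    input_buffer.insert(position, cell)
    return position


# A cell with output cushion 1 arrives at a buffer holding three cells.
buffer = ["a", "b", "c"]
input_thread = push_in(buffer, "C", output_cushion=1)
initial_slackness = slackness(1, input_thread)
```

Because the insertion position never exceeds OC(C), the initial slackness is non-negative under these assumptions.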

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
  • Stable Marriage (stable matching): Given two equal-size sets M, W and preference lists from every m ∈ M, w ∈ W, find a matching in which there are no two pairs (m, w), (m', w') s.t.
      - m prefers w' over w
      - w' prefers m over m'
  • Classical problem in CS
      - A stable marriage always exists
      - Many algorithms…

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
  • The Critical Cell First (CCF) algorithm performs stable marriage at each decision:
      - M is the set of inputs, W is the set of outputs
      - i prefers o1 over o2 if there is a cell for o1 that is queued before all cells for o2
      - o prefers i1 over i2 if there is a cell from i1 that should leave before all cells from i2

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
  • For each cell C from input port i to output port j, and each scheduling decision, one of the following holds:
      - C is forwarded (and we don't care about it)
      - C' was forwarded from i, and i preferred to forward it ⇒ IT--
      - C' was forwarded to j, and j preferred to receive it ⇒ OC++
  • Two scheduling decisions every time-slot ⇒ slackness always increases by 2

Cell-Mode Emulation
  • Easy with speedup S = N
  • Possible with speedup S = 2 (w/ CCF)
      - Lower bound: S ≥ 2 - 1/N is required [Chuang et al., 1999]
What is the speedup required for packet-mode emulation?

Packet-Mode Emulation is Impossible
  • Regardless of speedup
      - Even with speedup S = N

Emulation w/ Relative Queuing Delay
  • The CIOQ switch is allowed a bounded lag behind the shadow OQ switch
      - Exact same behavior as the optimal OQ switch, but with some extra delay
      - Called relative queuing delay (RQD)
Can we provide packet-mode OQ emulation with bounded RQD and small speedup?

Our Results: Speedup-RQD tradeoff
[Plot: speedup (axis marks at 2, 4, and 2Lmax) versus RQD]
  • Lmax = maximum packet size (known value)
  • Speedup 2Lmax: generalization of cell-mode scheduling with S = 2, taking each packet of size ≤ Lmax as one huge cell
  • Lower bound on RQD (even with infinite speedup)
  • Lower bound on the speedup (from cell-mode scheduling)

Intuition for Emulation Algorithms
[Diagram: Packet-Mode CIOQ, Cell-Mode CIOQ w/ S=2, Packet-Mode OQ]

PIFO Cell-Mode OQ Switch
  • FIFO = First-In First-Out
  • PIFO = Push-In First-Out
  • A FIFO Packet-Mode OQ Switch is a PIFO Cell-Mode Switch

Underlying CCF Algorithm
  • Cell-Mode CIOQ w/ CCF (and speedup S = 2) emulates any PIFO cell-mode OQ switch [Chuang et al., 1999]
  • But CCF does not maintain contiguous packet forwarding over the fabric!
[Diagram: Packet-Mode CIOQ, Cell-Mode CIOQ w/ S=2, PIFO Cell-Mode OQ = Packet-Mode OQ]

Intuition for Emulation Algorithms
[Diagram: Packet-Mode CIOQ, Cell-Mode CIOQ w/ S=2, Packet-Mode OQ]
Two sub-steps:
  1. Framing
  2. Contiguous Decomposition

Frame-Based Schedulers
Works in a pipelined, frame-based manner. Within each frame:
  • Build a demand matrix for this frame
  • Schedule the demand matrix of the previous frame

Building the Demand Matrix
  • In each frame of size T, CCF forwards at most 2T cells from each input and to each output.
[Figure: demand matrix; the entry at (1, 1) is the number of cells CCF sent from input 1 to output 1 in the last frame; every row sum and every column sum is ≤ 2T]
  • Problem: A packet may span several frames.

Building the Demand Matrix
  • Count only packets whose last cell is forwarded by the CCF in the frame
  • Each row/column in the matrix is bounded by 2T + N(Lmax - 1)
      - For each input-output pair, only cells of one additional packet can be added.
  • Translates into RQD of 2T + (Lmax - 2).
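
The counting rule above (a packet is charged to the frame in which CCF forwards its last cell) could look roughly as follows. The `ForwardedCell` record, its field names, and the half-open frame interval are hypothetical choices made for this sketch; the slides do not specify a data layout.

```python
from collections import namedtuple

# Hypothetical log entry: one cell forwarded by the underlying CCF scheduler.
ForwardedCell = namedtuple(
    "ForwardedCell", ["slot", "src", "dst", "packet_len", "is_last"]
)


def build_demand_matrix(cells, n_ports, frame_start, frame_size):
    """Demand (in cells) per input-output pair for one frame.

    A packet contributes all of its cells to the frame in which its
    last cell was forwarded, even if earlier cells fell in a previous frame.
    """
    demand = [[0] * n_ports for _ in range(n_ports)]
    frame_end = frame_start + frame_size
    for cell in cells:
        if cell.is_last and frame_start <= cell.slot < frame_end:
            demand[cell.src][cell.dst] += cell.packet_len
    return demand


# A 3-cell packet from input 0 to output 1 that straddles two frames (T = 4).
log = [
    ForwardedCell(slot=3, src=0, dst=1, packet_len=3, is_last=False),
    ForwardedCell(slot=4, src=0, dst=1, packet_len=3, is_last=False),
    ForwardedCell(slot=5, src=0, dst=1, packet_len=3, is_last=True),
]
first_frame = build_demand_matrix(log, n_ports=2, frame_start=0, frame_size=4)
second_frame = build_demand_matrix(log, n_ports=2, frame_start=4, frame_size=4)
```

The straddling packet adds up to Lmax - 1 cells that CCF forwarded in an earlier frame, which is where the extra N(Lmax - 1) term in the row/column bound comes from.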

Decomposing the Demand Matrix
  • Challenge: Decompose the matrix into permutations while maintaining contiguous packet delivery.
      - Each permutation dictates a scheduling decision.
  • First try: optimal Birkhoff-von Neumann decomposition results in 2T + N(Lmax - 1) permutations.

Contiguous Greedy Decomposition
  • To maintain contiguous packet delivery:
      - If (i, j) was matched in iteration t-1 and there are more (i, j) cells to schedule, keep it for iteration t.
  • Find a greedy matching for the rest of the matrix.
  ⇒ Speedup: S = 4 + (N(Lmax - 1))/T; RQD: 2T + Lmax - 1
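
A minimal sketch of the contiguous greedy decomposition described on this slide, assuming the demand matrix holds cell counts per input-output pair. The function name, the dictionary representation of a matching, and the lowest-index-first greedy order are assumptions for illustration.

```python
def contiguous_greedy_decomposition(demand):
    """Decompose a demand matrix (cells per pair) into matchings.

    A pair matched in one iteration stays matched in the next one while
    it still has cells left, so all (i, j) cells cross the fabric
    back to back. The remaining ports are matched greedily.
    """
    n = len(demand)
    remaining = [row[:] for row in demand]
    schedule = []
    held = {}  # input -> output matched in the previous iteration
    while any(any(row) for row in remaining):
        matching = {}
        used_outputs = set()
        # Step 1: keep pairs that still have cells to send.
        for i, j in held.items():
            if remaining[i][j] > 0:
                matching[i] = j
                used_outputs.add(j)
        # Step 2: greedy (maximal) matching on the free ports.
        for i in range(n):
            if i in matching:
                continue
            for j in range(n):
                if j not in used_outputs and remaining[i][j] > 0:
                    matching[i] = j
                    used_outputs.add(j)
                    break
        for i, j in matching.items():
            remaining[i][j] -= 1
        schedule.append(matching)
        held = matching
    return schedule


demand = [[2, 1, 0], [0, 2, 1], [1, 0, 2]]
schedule = contiguous_greedy_decomposition(demand)
```

Each dictionary in `schedule` is one scheduling decision. A greedy matching is only maximal, not maximum, which is consistent with this variant needing roughly twice the speedup of an optimal decomposition.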

Our Results: Speedup-RQD tradeoff
[Plot: speedup (axis marks at 2, 4, and 2Lmax) versus RQD]
  • S = 4 + (N(Lmax - 1))/T, RQD = 2T + Lmax - 1
  • Next…
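
To get a feel for the tradeoff, the two expressions on this slide can be evaluated for concrete parameters. The function name and the example values (N = 16, Lmax = 10, T = 144) are illustrative assumptions, not figures from the talk.

```python
def greedy_emulation_tradeoff(n_ports, l_max, frame_size):
    """Evaluate S = 4 + N(Lmax - 1)/T and RQD = 2T + Lmax - 1."""
    speedup = 4 + n_ports * (l_max - 1) / frame_size
    rqd = 2 * frame_size + l_max - 1
    return speedup, rqd


speedup, rqd = greedy_emulation_tradeoff(n_ports=16, l_max=10, frame_size=144)
```

With these values the speedup is 5 and the RQD is 297 time-slots. Increasing the frame size T pushes the speedup toward 4 while the RQD grows linearly in T.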

Emulation w/ S ≈ 2 - Framing
  • Keep a separate demand matrix for every possible packet size
  • Example: Possible packet sizes are 3, 4, 6
[Figure: three demand matrices: # of size-3 packets, # of size-4 packets, # of size-6 packets]

Emulation w/ S ≈ 2 - Framing
  • Concatenate packets of the same size into mega-packets of size k = LCM(1, …, Lmax)
  • Leftover matrix for each size m
[Figure: a mega-packets matrix (mega-packets of size k = 12) and leftover matrices for sizes 3, 4, and 6]

Emulation w/ S ≈ 2 - Framing
  • Sum of each row/column is bounded
      - For the mega-packets matrix: ≤ (2T + N(Lmax - 1))/k
      - For each leftover matrix of size m: ≤ N(k - 1)/m
[Figure: mega-packets matrix (mega-packets of size 12) and leftover matrices for sizes 3, 4, and 6, whose entries are < 12/3, < 12/4, and < 12/6 respectively]
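
The mega-packet framing for a single input-output pair can be sketched as below. Following the slides' example (sizes 3, 4, 6 and k = 12), this sketch takes k as the LCM of the packet sizes that actually occur, which is an assumption; the function name and the dictionary input format are also hypothetical.

```python
from functools import reduce
from math import gcd


def split_into_mega_packets(packet_counts):
    """Split one pair's demand into mega-packets and leftovers.

    packet_counts maps a packet size m to the number of size-m packets.
    Returns (k, number of mega-packets, leftover packet count per size).
    """
    k = reduce(lambda a, b: a * b // gcd(a, b), packet_counts)
    mega_packets = 0
    leftovers = {}
    for size, count in packet_counts.items():
        per_mega = k // size              # size-m packets per mega-packet
        mega_packets += count // per_mega
        leftovers[size] = count % per_mega
    return k, mega_packets, leftovers


k, mega_packets, leftovers = split_into_mega_packets({3: 9, 4: 7, 6: 1})
```

Every leftover entry is smaller than k/m, matching the "< 12/3, < 12/4, < 12/6" annotations in the figure.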

Emulation w/ S ≈ 2 - Decomposition
  • Optimally decompose (w/ Birkhoff-von Neumann) the mega-packets matrix and then the leftover matrices
  • Hold each permutation k times for contiguous (mega-)packet delivery
[Equation label: bound on the mega-packets matrix]
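
A compact sketch of a Birkhoff-von Neumann style decomposition followed by the "hold each permutation k times" step. It assumes a square integer matrix whose row and column sums are all equal (a matrix with merely bounded sums would first need padding, which is omitted here); the helper names and the simple augmenting-path matching are illustrative choices.

```python
def perfect_matching(support):
    """Perfect matching on the positive entries, via augmenting paths."""
    n = len(support)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def bvn_decompose(matrix):
    """Return [(weight, perm)] with sum of weight * perm equal to matrix."""
    n = len(matrix)
    remaining = [row[:] for row in matrix]
    result = []
    while any(any(row) for row in remaining):
        perm = perfect_matching([[x > 0 for x in row] for row in remaining])
        if perm is None:
            raise ValueError("row and column sums must all be equal")
        weight = min(remaining[i][perm[i]] for i in range(n))
        for i in range(n):
            remaining[i][perm[i]] -= weight
        result.append((weight, perm))
    return result


def hold_each_permutation(decomposition, k):
    """Repeat every permutation for weight * k consecutive slots."""
    slots = []
    for weight, perm in decomposition:
        slots.extend([perm] * (weight * k))
    return slots


mega_matrix = [[2, 1], [1, 2]]  # counts of mega-packets per pair
decomposition = bvn_decompose(mega_matrix)
slots = hold_each_permutation(decomposition, k=12)
```

Holding a permutation for k consecutive slots lets each matched pair send a whole mega-packet without interruption.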

Our Results: Speedup-RQD tradeoff
[Plot: speedup (axis marks at 2, 4, and 2Lmax) versus RQD]
  • S = 4 + (N(Lmax - 1))/T, RQD = 2T + Lmax - 1
  • S = 2 + (N·k·Lmax - 1)/T, RQD = 2T + Lmax - 1

Wrap-up
  • Packet-mode scheduling can be done with the same speedup as cell-mode scheduling
      - With the price of bounded RQD
  • Future work: lower bounds?

Thank You!
