Scheduling algorithms for input-queued IP routers
Emilio Leonardi
in collaboration with: P. Giaccone, M. Ajmone Marsan, A. Bianco, M. Mellia, F. Neri
Dipartimento di Elettronica, Telecommunication Network Group
http://www.tlc-networks.polito.it
Politecnico di Torino (Italy)
Adonet Spring School, Budapest, March 2006

Outline
Ø IP routers
Ø OQ routers
Ø IQ routers
  § Scheduling
  § Optimal algorithms
  § Heuristic algorithms
Ø Packet-mode algorithms
Ø Networks of routers
Ø CIOQ routers
Ø Multicast traffic
Ø Conclusions

Note
The slides marked RWP are reproduced with permission of Prof. Nick McKeown, Electrical Engineering and Computer Science Dept., Stanford University (CA, USA)


“The Internet is a mesh of routers”
[Figure: core routers, access routers and enterprise routers]

“The Internet is a mesh of routers”
Access router:
Ø high number of ports at low speed (kbps/Mbps)
Ø several access protocols (modem, ADSL, cable)
Enterprise router:
Ø medium number of ports at high speed (Mbps)
Ø several services (IP classification, filtering)
Core router:
Ø moderate number of ports at very high speed (Mbps/Gbps)
Ø very high throughput

Basic functions
Ø Routing
  § computation of the output port of an incoming packet
  § uses the routing tables computed by the routing protocols
  § can be a complex procedure:
    • very large routing tables
    • dynamic variation of routes in the Internet

Basic functions
Ø Switching
  § transfer of packets from input ports to output ports
  § resolution of the contentions for output ports
    • queueing – where to store
    • scheduling – what to transfer

Faster and faster
Ø Need for high performance routers
  § to accommodate the bandwidth demands of new users and new services
  § to support QoS
  § to reduce costs

Packet processing and link speed (RWP)
Ø Increase of electronic packet processing power cannot accommodate the increase in link speed
[Figure: packet processing power (Moore’s law, 2x / 18 months) versus link speed (fiber capacity in Gbit/s, TDM and DWDM, 2x / 7 months), 1985-2000. Source: SPEC95Int & David Miller, Stanford.]

Memory access time (RWP)
[Figure: memory access time improves 1.1x / 18 months, against Moore’s law at 2x / 18 months]

Moore’s law (RWP)
It’s hard to keep up with Moore’s law:
  § the bottleneck is memory speed
Moore’s law is too slow:
  § routers need to improve faster than Moore’s law

Router capacity exceeds Moore’s law (RWP)
Growth in capacity of commercial routers:
  § 1992 ~ 2 Gb/s
  § 1995 ~ 10 Gb/s
  § 1998 ~ 40 Gb/s
  § 2001 ~ 160 Gb/s
  § 2003 ~ 640 Gb/s
Average growth rate: 2.2x / 18 months

Single packet processing
Ø The time to process one packet is becoming shorter and shorter
  § worst case: 40-byte packets (ACKs) travelling over the Internet
    • 3.2 µs at 100 Mbps
    • 320 ns at 1 Gbps
    • 32 ns at 10 Gbps
    • 3.2 ns at 100 Gbps
    • 320 ps at 1 Tbps
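The figures above follow from dividing the packet size in bits by the link rate. A minimal sketch of that arithmetic is given below; the function name `transmission_time_ns` is only an illustrative choice, not something taken from the slides.

```python
def transmission_time_ns(packet_bytes: int, rate_bps: float) -> float:
    """Time needed to receive one packet at the given link rate, in nanoseconds."""
    return packet_bytes * 8 / rate_bps * 1e9


# Worst case considered in the slide: 40-byte packets (TCP ACKs).
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9),
                    ("100 Gbps", 100e9), ("1 Tbps", 1e12)]:
    print(f"{label}: {transmission_time_ns(40, rate):g} ns")
```

The processing budget per packet shrinks linearly with the link rate, which is why the per-packet work must be done in hardware at the highest speeds.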

Hardware architecture
[Figure: physical structure and logical structure]

Hardware architecture
Main elements
Ø line cards
  § support input/output transmissions
  § store packets
  § adapt packets to the internal format of the switching fabric
  § support data link protocols
  § classify packets
  § schedule packets
  § support security
Ø switching fabric
  § transfers packets from input ports to output ports

Hardware architecture
Main elements
Ø control processor/network processor
  § runs routing protocols
  § computes routing tables
  § manages the overall system
Ø forwarding engines
  § compute the packet destination (lookup)
  § inspect packet headers
  § rewrite packet headers

Interconnections among main elements - I
[Figure: control processor, forwarding engine, switching fabric, line cards 1 to N]

Interconnections among main elements - II
[Figure: control processor, switching fabric, line cards with forwarding engine 1 to N]

Cell-based routers
[Figure: packets enter ISM 1 to N, cells cross the cell switch (fabric), ORM 1 to N deliver packets]
Ø packet: variable-size data unit
Ø cell: fixed-size data unit
Ø ISM: Input-Segmentation Module
Ø ORM: Output-Reassembly Module

Switching fabric
Ø Our assumptions:
  § bufferless
    • to reduce internal hardware complexity
  § non-blocking
    • it is always possible to transfer in parallel from input to output ports any non-conflicting set of cells

Switching fabric
[Figure: 4 inputs and 4 outputs connected through the fabric]
Ø Examples:
  § crossbar
  § rearrangeable Clos network
  § Benes network
  § Batcher-Banyan network (self-routing)
Ø Switching constraints
  § at most one cell for each input and for each output can be transferred

Switching fabric
Ø We do not discuss switching fabrics with internal buffers
  § e.g.: crossbars with a buffer at each crosspoint

Generic switching architecture
[Figure: inputs 1 to N with input queues (Sin), switching fabric, output queues (Sout) towards outputs 1 to N]

Speedup
Ø The speedup determines the switch performance:
  § Sin = reading speed from input queues
  § Sout = writing speed to output queues
Ø maximum speedup factor: S = max(Sin, Sout)

Performance comparison
Ø The performance of different switching systems can be studied
  § with analytical models
    • introducing simplifying assumptions, but obtaining general results
  § with simulation models
    • obtaining more detailed results

Traffic description
Ø $A_{ij}(n) = 1$ if a packet arrives at time n at input i, with destination reachable through output j
Ø $\lambda_{ij} = E[A_{ij}(n)]$
Ø An arrival process is admissible if:
  § $\sum_i \lambda_{ij} < 1$
  § $\sum_j \lambda_{ij} < 1$
    • that is, no input and no output are overloaded on average
    • note that OQ switches exhibit finite delays only for admissible traffic
Ø traffic matrix: $\Lambda = [\lambda_{ij}]$
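The admissibility condition can be checked directly on a traffic matrix. The sketch below assumes the strict form of the condition (every row sum and every column sum below 1); the names `is_admissible` and `uniform_traffic` are illustrative and do not come from the slides.

```python
def is_admissible(rates):
    """rates[i][j]: average arrival rate from input i to output j, in cells per slot.

    Assumption: admissible means every row sum (input load) and every
    column sum (output load) is strictly below 1.
    """
    n = len(rates)
    inputs_ok = all(sum(rates[i]) < 1 for i in range(n))
    outputs_ok = all(sum(rates[i][j] for i in range(n)) < 1 for j in range(n))
    return inputs_ok and outputs_ok


def uniform_traffic(n, load):
    """Uniform traffic matrix: each input spreads `load` evenly over the n outputs."""
    return [[load / n] * n for _ in range(n)]


print(is_admissible(uniform_traffic(4, 0.9)))      # every port is loaded at 0.9
print(is_admissible([[0.6, 0.3], [0.5, 0.2]]))     # output 0 receives 1.1 on average
```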

Traffic scenarios
Ø Uniform traffic
  § Bernoulli i.i.d. arrivals
  § usual testbed in the literature
    • “easy to schedule”
Ø Diagonal traffic
  § Bernoulli i.i.d. arrivals
  § critical to schedule, since only two matchings are good

Traffic scenarios
Ø LogDiagonal traffic
  § Bernoulli i.i.d. arrivals
  § more critical than uniform, less than diagonal traffic


Output Queued (OQ) switches
Ø Sin = 1
Ø Sout = N
Ø used for low bandwidth routers
  § no coordination among ports
  § work-conserving
    • best average delays
  § complete control of delays
    • support of QoS scheduling

Output Queued (OQ) switch
[Figure: inputs 1 to N, switching fabric with speedup N, outputs 1 to N]

OQ performance
[Plot: OQ performance under uniform traffic]
Note: OQ is optimal from the point of view of average delay and throughput


Simple Input Queued (IQ) switches
Ø Sin = 1
Ø Sout = 1
Ø 1 FIFO queue for each input port
Ø throughput limitations
  § due to head of the line (HOL) blocking
Ø scheduling
  § to solve contentions for the same output
[Figure: inputs 1 to N, switching fabric, outputs 1 to N]

Head of the Line (HOL) Blocking (RWP)
[Figure: head of the line blocking]

Simple IQ switch performance
[Plot: simple IQ versus OQ under uniform traffic]

Improving simple IQ switches
Ø Window/bypass schedulers
  § the first w cells of each queue contend for outputs
  § HOL blocking is reduced, not eliminated
  § w = 1 means FIFO at each input
  § higher complexity
    • the scheduler deals with wN cells
    • non-FIFO queues

Improving IQ switches
Ø Virtual output queueing (VOQ)
  § one queue for each input/output pair
    • N queues at each input
    • $N^2$ queues in the whole switch
  § eliminates HOL blocking
  § used in high-bandwidth routers
    • scheduling implemented in hardware at very high speed

IQ switches with VOQ
[Figure: N virtual output queues at each of the inputs 1 to N, switching fabric, outputs 1 to N, scheduler; input constraints and output constraints]
Note: from now on, we always assume VOQ at the switch inputs


Scheduling in IQ switches
Ø Scheduling can be modeled as a matching problem in a bipartite graph
  § the edge from node i to node j refers to packets at input i and directed to output j
  § the weight of the edge can be
    • binary (not empty/empty queue)
    • queue length
    • HOL cell waiting time, or cell age
    • some other metric indicating the priority of the HOL cell to be served

Scheduling in IQ switches
[Figure: the scheduler maps the request graph between inputs and outputs to a matching (or permutation)]

Scheduling in IQ switches
[Figure: the scheduler maps the request matrix to a permutation matrix]

Implementing schedulers
Ø Scheduling is a complex task
  § a scheduling algorithm can be implemented in hardware if:
    • it shows good performance for a wide range of traffic patterns
    • it can be efficiently parallelized
    • it can be efficiently pipelined
    • it requires few iterations (or clock cycles)
    • it requires limited control information

Scheduling uniform traffic (RWP)
Ø A number of algorithms give 100% throughput when traffic is uniform
  § For example:
    • TDM and a few variants
    • iSLIP (see later)
[Figure: example of TDM for a 4 x 4 switch]
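A TDM scheduler for uniform traffic can be sketched as a fixed cycle of N matchings in which every input visits every output once. The cyclic-shift rule and the name `tdm_matching` below are assumptions made for illustration; the slide does not say which TDM frame is used.

```python
def tdm_matching(n, slot):
    """Matching used in time slot `slot`: input i is connected to output (i + slot) mod n.

    Assumed TDM frame: a cyclic shift that repeats every n slots.
    """
    return [(i + slot) % n for i in range(n)]


# One TDM frame for a 4 x 4 switch: each row lists the output assigned to inputs 0..3.
for slot in range(4):
    print(slot, tdm_matching(4, slot))
```

Each input/output pair is served once every N slots, so any pair loaded at less than 1/N is served fast enough, which is exactly the uniform admissible case.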

Birkhoff - von Neumann theorem
Any doubly stochastic matrix $\Lambda$ can be expressed as a convex combination of permutation matrices $\pi_n$:
$\Lambda = \sum_n a_n \pi_n$ with $a_n \ge 0$, $\sum_n a_n = 1$

Scheduling non-uniform traffic
Ø If the traffic is known and admissible, 100% throughput can be achieved by a TDM using:
  § for a fraction of time $a_1$ matching $M_1$ ($\pi_1$)
  § for a fraction of time $a_2$ matching $M_2$ ($\pi_2$)
  § for a fraction of time $a_k$ matching $M_k$ ($\pi_k$)
Ø thanks to the Birkhoff - von Neumann theorem
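The decomposition behind this TDM can be sketched with a greedy procedure: repeatedly find a perfect matching on the positive entries, use it for a fraction of time equal to its smallest entry, and subtract. This is only one possible construction, written under the assumption that the input matrix is exactly doubly stochastic; the helper names are hypothetical, and exact fractions are used to avoid rounding problems.

```python
from fractions import Fraction


def perfect_matching(support):
    """Kuhn's augmenting-path algorithm on a boolean N x N support matrix.

    Returns match[i] = output assigned to input i, or None if no perfect matching exists.
    """
    n = len(support)
    owner = [None] * n  # owner[j] = input currently matched to output j

    def try_input(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if owner[j] is None or try_input(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_input(i, [False] * n):
            return None
    match = [None] * n
    for j, i in enumerate(owner):
        match[i] = j
    return match


def bvn_decompose(matrix):
    """Return a list of (fraction of time, matching) pairs whose weighted sum is `matrix`."""
    n = len(matrix)
    m = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in m for x in row):
        match = perfect_matching([[x > 0 for x in row] for row in m])
        if match is None:
            raise ValueError("matrix is not doubly stochastic")
        a = min(m[i][match[i]] for i in range(n))
        for i in range(n):
            m[i][match[i]] -= a  # at least one entry becomes zero at every step
        terms.append((a, match))
    return terms
```

An admissible matrix whose row and column sums are below 1 would first have to be padded up to a doubly stochastic one; that step is not shown here.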


Maximum Size Matching
Ø Maximum Size Matching (MSM)
  § among all the possible matchings, selects the one with the highest number of edges
    • MSM is generally not unique
  § the best MSM algorithm requires $O(N^{2.5})$ iterations, and cannot be implemented efficiently, since it is based on a flow augmentation path algorithm

Instability of MSM
Ø Assume:
  § P(arrival at Q12) = $\lambda$
  § P(arrival at Q11) = P(arrival at Q22) = $1 - \lambda - \delta$
  § Q12 = B ≫ 0, Q11 = Q22 = 0
  § in case of parity, serve Q11 and/or Q22 instead of Q12
Ø Observe:
  § Q12 is served only when A11 = 0 and A22 = 0, i.e. with probability:
    P(serve Q12) = P(no arrivals at both Q11 and Q22) = $[1 - (1 - \lambda - \delta)]^2 = (\lambda + \delta)^2$
  § P(serve Q12) < P(arrival at Q12) if $\delta$ is small enough
  § Example: $\lambda = 0.5$; $\delta = 0.1$; P(serve Q12) = 0.36
[Figure: 2 x 2 switch (In 1, In 2, Out 1, Out 2) annotated with the arrival probabilities]
Note: this proof is due to I. Keslassy, Stanford Univ.
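The numerical example can be reproduced with a few lines. The symbols were lost in the extracted text, so the parameter names `lam` (arrival probability at Q12) and `delta` (slack) are assumed here, as is the function name.

```python
def p_serve_q12(lam, delta):
    """Probability that Q12 is served in a slot under the MSM tie-breaking rule.

    Q12 is served only when neither Q11 nor Q22 receives a cell, and each of
    them receives one with probability 1 - lam - delta.
    """
    p_side = 1 - lam - delta
    return (1 - p_side) ** 2


lam, delta = 0.5, 0.1
service = p_serve_q12(lam, delta)
print(f"arrival rate at Q12: {lam}, service rate: {service:.2f}")
print("Q12 unstable" if service < lam else "Q12 stable")
```

With these values the service probability stays below the arrival probability, so Q12 grows without bound even though no input or output is overloaded.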

Maximum Size Matching
Ø MSM maximizes the instantaneous throughput
Ø MSM may not yield 100% throughput
  § short term decisions can be inefficient in the long term
  § non-binary edge weights allow MWM to maximize the long-term throughput

Maximum Weight Matching
Ø Maximum Weight Matching (MWM)
  § among all the possible N! matchings, selects the one with the highest weight (sum of the edge metrics)
    • MWM is generally not unique
  § MWM is too complex to be implemented in hardware at high speed
    • the best MWM algorithm requires $O(N^3)$ iterations, and cannot be implemented efficiently, since it is based on a flow augmentation path algorithm
    • cannot be parallelized and pipelined efficiently
  § MWM has never been implemented in a commercial chipset
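The definition of MWM can be illustrated by exhaustive search over the N! matchings. This brute-force sketch is only meant to make the definition concrete for very small N; it is not the $O(N^3)$ algorithm mentioned above, and the function names are illustrative.

```python
from itertools import permutations


def matching_weight(weights, match):
    """Sum of the edge weights of a matching, where match[i] is the output of input i."""
    return sum(weights[i][j] for i, j in enumerate(match))


def mwm_brute_force(weights):
    """Return one maximum weight matching by trying all N! permutations."""
    n = len(weights)
    return max(permutations(range(n)), key=lambda m: matching_weight(weights, m))


queues = [[5, 1, 0],
          [4, 3, 0],
          [0, 2, 8]]
best = mwm_brute_force(queues)
print(best, matching_weight(queues, best))
```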

Maximum Weight Matching
Ø When the traffic is unknown, MWM is the optimal solution of the scheduling problem, provided that the weight is either the queue length or the cell age
  § achieves 100% throughput under any admissible traffic
    • also under non-Bernoulli arrival processes, satisfying the law of large numbers
  § achieves low average delays, very close to those of OQ switches
  § possible starvation for lightly loaded packet flows

MWM with pipeline and latency
Ø Let T and P be fixed
Ø Dt denotes the matching used at time t
Ø The following variations of MWM also achieve 100% throughput:
  § Dt = MWM(t-P): MWM with pipeline degree P
  § Dt = MWM(floor(t/T) • T): MWM with latency T
  § combinations of both
Ø thus, it seems easy to achieve 100% throughput!

MWM with pipeline and latency
Ø But:
  § What about throughput?
    • 100% throughput – but needs the computation of a MWM …
  § What about delays?
    • delays can be really bad!

General consideration
Ø When scheduling in IQ switches, it is very difficult to achieve simultaneously
  § high throughput
  § low delay
  § limited implementation complexity

Uniform traffic
Ø MWM and MSM behave almost identically
[Plot: mean delay versus normalized load under uniform traffic, for MWM and MSM]

LogDiagonal traffic
Ø MSM is somewhat inferior to MWM
[Plot: mean delay versus normalized load under LogDiagonal traffic, for MWM and MSM]

Diagonal traffic
Ø MSM yields much longer delays than MWM at medium/high loads
[Plot: mean delay versus normalized load under diagonal traffic, for MWM and MSM]


Approximations of MSM and MWM
Ø Motivation
  § strong interest in scheduling algorithms with
    • very low complexity
    • high performance
Ø Usually
  § implementable schedulers (low complexity): low throughput, long delays
  § theoretical schedulers (high complexity): high throughput, short delays

Some implementable algorithms
Ø Approximate MSM
  § WFA, iSLIP, 2DRR, RC, FIRM and many others
Ø Approximate MWM with $w_{ij} = X_{ij}$ (queue length)
  § iLQF, RPA, learning algorithms
Ø Approximate MWM with $w_{ij}$ = cell age
  § iOCF
Ø Approximate MWM with $w_{ij} = \sum_i X_{ij} + \sum_j X_{ij}$
  § iLPF, MUCS

APPROXIMATIONS OF MAXIMUM SIZE MATCHING

Wave Front Arbiter (RWP)
[Figure: requests and resulting match for a 4 x 4 switch]

Wave Front Arbiter (RWP)
2N-1 steps
[Figure: requests and resulting match]

Wrapped Wave Front Arbiter (RWP)
N steps instead of 2N-1
[Figure: requests and resulting match]
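The slides only show the arbiters as figures, so the following is a hedged sketch of the usual formulation: the request matrix is swept one diagonal (wave) at a time, and a request is granted if its row and column are still free. The function name, the sweep starting at cell (0, 0), and the `wrapped` flag are assumptions made for illustration.

```python
def wave_front_arbiter(requests, wrapped=False):
    """Return {input: output} for a 0/1 request matrix.

    Plain WFA sweeps the 2N-1 anti-diagonals i + j = d; the wrapped variant
    sweeps the N wrapped diagonals (i + j) mod N = d. Cells on the same wave
    never share a row or a column, so hardware can evaluate a wave in parallel.
    """
    n = len(requests)
    row_free = [True] * n
    col_free = [True] * n
    match = {}
    steps = n if wrapped else 2 * n - 1
    for d in range(steps):
        for i in range(n):
            j = (d - i) % n if wrapped else d - i
            if not 0 <= j < n:
                continue  # cell outside the matrix (plain WFA only)
            if requests[i][j] and row_free[i] and col_free[j]:
                match[i] = j
                row_free[i] = col_free[j] = False
    return match


full = [[1] * 4 for _ in range(4)]
print(wave_front_arbiter(full))                # needs 2N-1 = 7 waves
print(wave_front_arbiter(full, wrapped=True))  # needs N = 4 waves
```

Both variants return a maximal matching; a real arbiter would also rotate the starting wave over time for fairness, which is not shown here.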

iSLIP
Ø iSLIP means “iterative SLIP”
Ø iterates among the following 3 phases
  § Request
  § Grant
  § Accept

iSLIP
Ø 3 phases:
  § Request (from inputs to outputs)
    • each unmatched input sends a request to every output for which it has a cell
  § Grant (from outputs to inputs)
    • if an unmatched output receives requests, it sends a grant to one of the inputs
      – contentions solved by a round-robin mechanism
  § Accept (from inputs to outputs)
    • if an unmatched input receives grants, it selects a single output and it becomes matched to it
      – contentions solved by a round-robin mechanism
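The three phases can be sketched as follows. The slide does not state how the round-robin pointers are updated, so this sketch assumes the commonly described rule: a pointer moves one position past the matched port, and only for matches made in the first iteration. All names are illustrative.

```python
def islip(requests, grant_ptr, accept_ptr, iterations):
    """One time slot of iSLIP on a 0/1 request matrix; returns in_match[i] = output or None.

    grant_ptr[j] and accept_ptr[i] are the round-robin pointers; they are
    updated in place (assumed rule: only for matches of the first iteration).
    """
    n = len(requests)
    in_match = [None] * n
    out_match = [None] * n
    for it in range(iterations):
        # Request + Grant: each unmatched output grants the first unmatched
        # requesting input found from its pointer onwards.
        grants = {}
        for j in range(n):
            if out_match[j] is not None:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if in_match[i] is None and requests[i][j]:
                    grants.setdefault(i, []).append(j)
                    break
        # Accept: each input accepts the first granting output from its pointer onwards.
        for i, outs in grants.items():
            j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            in_match[i], out_match[j] = j, i
            if it == 0:
                grant_ptr[j] = (i + 1) % n
                accept_ptr[i] = (j + 1) % n
    return in_match


g, a = [0, 0], [0, 0]
req = [[1, 1], [1, 1]]
print(islip(req, g, a, iterations=1))  # pointers synchronized: only one pair is matched
print(islip(req, g, a, iterations=1))  # pointers desynchronized: both pairs are matched
```

The second call shows the desynchronization of the grant pointers that lets the scheduler behave like TDM under heavy uniform load.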

iSLIP
Ø The round-robin mechanism in iSLIP is designed so that, under uniform traffic, iSLIP emulates a dynamic TDM scheduler synchronized on the arrival pattern

iSLIP
Ø iSLIP is maximal
  • often, with log N iterations
  • always, with N iterations
Ø iSLIP was implemented on one chip in the Cisco 12000 router
  § http://www.cisco.com/warp/public/cc/pd/rt/12000/tech/fasts_wp.pdf

iSLIP demo from: http://tiny-tera.stanford.edu/tiny-tera/demos/index.html

APPROXIMATIONS OF MAXIMUM WEIGHT MATCHING

iLQF
Ø iLQF means “iterative Longest Queue First”
Ø iterates among the following 3 phases
  § Request
  § Grant
  § Accept

iLQF
Ø 3 phases:
  § Request (from inputs to outputs)
    • each unmatched input sends all its queue lengths as requests to the corresponding outputs
  § Grant (from outputs to inputs)
    • if an unmatched output receives requests, it sends a grant to the input corresponding to the longest queue
      – contentions solved by random choice
  § Accept (from inputs to outputs)
    • if an unmatched input receives grants, it selects the output with the longest queue
      – contentions solved by random choice

iLQF
Ø iLQF is maximal
  • often, with log N iterations
  • always, with N iterations
Ø iLQF is robust to non-uniform traffic

iLQF demo from: http://tiny-tera.stanford.edu/tiny-tera/demos/index.html

RPA
Ø RPA means “Reservation with Preemption and Acknowledgment”
Ø Two phases
  § Reservation (possibly preemptive)
  § Acknowledgement
Ø Sequential accesses to a reservation vector
  § Urgj (if set) is the urgency of the transfer from input Inj to output j
Vector Res:
  Out 1: (Urg1, In1)
  Out 2: (Urg2, In2)
  Out 3: (Urg3, In3)
  ...
  Out N: (UrgN, InN)

RPA
Ø Vector Res is sequentially accessed by all inputs
[Figure: inputs 1 to 4 accessing Res in turn]

RPA
Initially, at each round: Urgj = 0 for all j
Reservation phase
Ø when input i accesses Res
  § it computes Wj = Xij – Urgj for all j
  § finds j* such that Wj* = max{ Wj }
  § if Wj* > 0, reserves output j* and sets Urgj* = Xij*, possibly overwriting the previous reservation
  § otherwise, leaves the current reservation

RPA
Ø Acknowledgement phase
  § if input i still finds its reservation at output j, it books output j
  § otherwise, it chooses an unreserved output j and books output j
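The two phases of RPA can be sketched as below. The slides leave several details open, so this sketch assumes that inputs access Res in index order, that ties are broken towards the lowest output index, and that in the acknowledgement phase a preempted input picks the lowest-index unreserved output for which it has cells. The function name `rpa` is illustrative.

```python
def rpa(x):
    """One round of RPA on queue lengths x[i][j]; returns {input: output}."""
    n = len(x)
    urg = [0] * n        # Urg_j: urgency of the current reservation at output j
    owner = [None] * n   # In_j: input holding the reservation at output j

    # Reservation phase: inputs access Res sequentially and may preempt.
    for i in range(n):
        w = [x[i][j] - urg[j] for j in range(n)]
        j_star = max(range(n), key=lambda j: w[j])
        if w[j_star] > 0:
            urg[j_star], owner[j_star] = x[i][j_star], i

    # Acknowledgement phase: surviving reservations are booked; preempted
    # inputs fall back to an unreserved output (assumed selection rule).
    match = {}
    for i in range(n):
        if i in owner:
            match[i] = owner.index(i)
        else:
            for j in range(n):
                if owner[j] is None and x[i][j] > 0:
                    owner[j] = i
                    match[i] = j
                    break
    return match


print(rpa([[5, 2], [7, 1]]))  # input 1 preempts input 0 at output 0
```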

Uniform traffic
Ø comparison between MWM, iSLIP, iLQF, and RPA
[Plot: mean delay versus normalized load under uniform traffic, for MWM, iSLIP, iLQF and RPA]

LogDiagonal traffic
Ø iSLIP saturates close to 84% throughput
[Plot: mean delay versus normalized load under LogDiagonal traffic, for MWM, iSLIP, iLQF and RPA]

Diagonal traffic
Ø RPA achieves 98% throughput, iLQF 87%, iSLIP 83%
[Plot: mean delay versus normalized load under diagonal traffic, for MWM, iSLIP, iLQF and RPA]

LEARNING ALGORITHMS

Learning algorithms
Ø Goal: find a good compromise among throughput, delay and complexity

Learning algorithms
Ø Key observation
  § the matchings generated by MWM show limited changes from one time to another
    • remembering the matching from the past simplifies the computation of the new matching
  § the search implemented by MWM can be enhanced
    • with a randomized approach
    • by observing arrivals
    • by searching in parallel
Ø based on an extension of randomized scheduling algorithms

Simple Randomized Schemes
Ø Choose a matching at random and use it as the schedule
  § doesn’t yield 100% throughput
Ø Choose 2 matchings at random and use the heavier one as the schedule
Ø …
Ø Choose N matchings at random and use the heaviest one as the schedule
None of these can give 100% throughput!

Simple randomized algorithms
[Plot: simple randomized algorithms, 32 x 32 switch]

Bounds on Maximum Throughput
[Plot: bounds on maximum throughput]

Tassiulas’ scheme
Ø Consider the following policy
  § Rt = matching picked at random (uniformly) among all the possible N! matchings
  § Dt = arg max { W(Dt-1), W(Rt) }
Ø Complexity is very low
  § O(1) iterations
  § easy to pipeline
Ø Yields 100% throughput!
  § note the boost in throughput is due to the memory of the past matching Dt-1
Ø However, delays are very large
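The policy fits in a few lines. In this sketch the weight of a matching is assumed to be the sum of the queue lengths it serves, and the names `weight` and `tassiulas_step` are illustrative.

```python
import random


def weight(queues, match):
    """W(match): sum of the queue lengths served by the matching (assumed weight)."""
    return sum(queues[i][j] for i, j in enumerate(match))


def tassiulas_step(queues, prev_match, rng=random):
    """Return D_t, the heavier of D_{t-1} and a uniformly random matching R_t."""
    n = len(queues)
    candidate = list(range(n))
    rng.shuffle(candidate)  # R_t: uniform random permutation of the outputs
    if weight(queues, candidate) > weight(queues, prev_match):
        return candidate
    return prev_match


queues = [[5, 1, 0], [0, 4, 2], [3, 0, 6]]
d = [1, 2, 0]
for _ in range(10):
    d = tassiulas_step(queues, d)
print(d, weight(queues, d))
```

Because the previous matching is kept unless the random one is heavier, the weight never decreases for a fixed queue state; in a running switch the queue lengths change at every slot, so the comparison uses the current lengths.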

Tassiulas’ scheme
[Plot: Tassiulas’ scheme, 32 x 32 switch]

Learning approach
[Figure: Dt-1 and Mt enter the block COMP1, which outputs Dt]
Ø Properties of COMP1
  § W(Dt) ≥ W(Dt-1)
  § W(Dt) ≥ W(Mt)
Ø Examples:
  § COMP1 is the MAX among Dt-1 and Mt
  § COMP1 is the MERGE among Dt-1 and Mt

MERGE procedure
[Figure: merging matching X (W(X)=12) and matching R (W(R)=10) yields matching M (W(M)=13); the weight differences computed in the example are 3-1+2-2=2 and 2-1+2-4=-1]
Emulating MWM is O(N)
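One way to realize a MERGE of two full matchings is sketched below: the union of the two matchings splits into disjoint cycles, and within each cycle the edges of the heavier matching are kept. This is an assumed formulation consistent with the per-cycle differences shown in the figure; the weights in the usage example are hypothetical and do not reproduce the slide.

```python
def merge(weights, x, r):
    """Merge two permutations x and r (input -> output) into one at least as heavy as both."""
    n = len(x)
    r_inv = [0] * n
    for i, j in enumerate(r):
        r_inv[j] = i  # r_inv[j] = input that r connects to output j

    merged = [None] * n
    for start in range(n):
        if merged[start] is not None:
            continue
        # Follow the alternating cycle: input -> (x) output -> (r) input -> ...
        cycle, i = [], start
        while True:
            cycle.append(i)
            i = r_inv[x[i]]
            if i == start:
                break
        wx = sum(weights[i][x[i]] for i in cycle)
        wr = sum(weights[i][r[i]] for i in cycle)
        best = x if wx >= wr else r
        for i in cycle:
            merged[i] = best[i]
    return merged


w = [[3, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 2, 1],
     [0, 0, 4, 2]]
x = [0, 1, 2, 3]   # weight 9
r = [1, 0, 3, 2]   # weight 8
print(merge(w, x, r))  # keeps x on the first cycle and r on the second, weight 10
```

Each input is visited once, so the merge takes a number of steps linear in N.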

The learning approach
[Figure: Dt-1 and Mt enter the block COMP1, which outputs Dt]
Ø Properties of Mt
  § informally, Mt should be a “good” sample in the space of all possible matchings
Ø Examples:
  § Mt is a matching picked uniformly at random
  § Mt is a matching picked non-uniformly at random, with a high probability of being heavy
  § Mt is derived from the arrival vector At
  § Mt is a good “neighbor” of Dt-1

Theoretical properties
[Figure: Dt-1 and Mt enter the block COMP1, which outputs Dt]
Ø Stability
  § 100% throughput under any admissible Bernoulli traffic pattern
Ø Delay
  § the higher the weight of Mt, the smaller the queue lengths, and hence the smaller the delays

Example of practical implementation
Ø Exploiting parallel search: K-th neighbor of Dt-1
[Figure: the neighbors N1 to NK of Dt-1, together with Dt-1 and Mt, enter a MAX block that outputs Dt]
Ø This scheme is called APSARA

What is a “neighbor” of a matching?
• Example: 3 x 3 switch
[Figure: matching Dt-1 and its 3 neighbors N1, N2, N3]
• Each neighbor
  – differs from Dt-1 in ONLY TWO edges
  – can be generated very easily in hardware
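Neighbors of this kind can be generated by swapping the outputs of two inputs. The sketch below assumes exactly that construction, which yields 3 neighbors for a 3 x 3 switch as in the example; the function name is illustrative.

```python
from itertools import combinations


def neighbors(match):
    """All matchings that differ from `match` in exactly two edges.

    Assumed construction: pick two inputs and exchange their outputs.
    """
    result = []
    for a, b in combinations(range(len(match)), 2):
        nb = list(match)
        nb[a], nb[b] = nb[b], nb[a]
        result.append(nb)
    return result


for nb in neighbors([0, 1, 2]):
    print(nb)
```

In general this gives N(N-1)/2 neighbors, all of which can be evaluated in parallel before the MAX block picks the heaviest candidate.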

Max-APSARA
Ø APSARA, as described before, is not maximal
Ø Max-APSARA is a modified version of APSARA where a maximal size matching algorithm runs on the remaining unmatched inputs/outputs
  § e.g., if k inputs/outputs are unmatched,
    • run iSLIP with k iterations
    • select k random edges among the unmatched inputs/outputs

APSARA performance
[Plot: APSARA performance]


Routers and switches
Ø IP routers deal with variable-size packets
Ø Hardware switching fabrics often deal with fixed-size cells
Ø Question:
  § how to integrate a hardware switching fabric within an IP router?

Router based on an IQ cell switch: cell-mode
[Figure: ISM 1 to N, IQ cell switching fabric, ORM 1 to N]

Cell-mode scheduling
Ø Scheduling algorithms work at cell level
  § pros:
    • 100% throughput achievable
  § cons:
    • interleaving of packets at the outputs of the switching fabric

Router based on an IQ cell switch: packet-mode
Ø NO packet interleaving if packet-mode
Ø ORMs can be removed
[Figure: inputs 1 to N, each with an ISM, feed an IQ cell switching fabric; outputs 1 to N each have an ORM]

Packet-mode scheduling
Ø Rule: packets transferred as trains of cells
  § when an input starts transferring the first cell of a packet comprising k cells, it continues to transfer cells of that packet to the same output in the following k-1 time slots
Ø Pros:
  § no interleaving of packets at the outputs
  § easy extension of traditional schedulers
Ø Cons:
  § starvation due to long packets
    • inherent in packet systems without preemption
    • negligible at high line rates
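The rule above can be read as a thin wrapper around any cell-level scheduler: ports that are in the middle of a packet are hidden from the scheduler until the train of cells ends. The sketch below assumes that `voq[i][j]` is a list of packet lengths (in cells) and that `in_progress` maps an input to `(output, cells_left)`; `first_fit_scheduler` is a hypothetical stand-in for iSLIP, MWM or any other cell-level algorithm.

```python
def first_fit_scheduler(voq, free_in, free_out):
    """Hypothetical cell-level scheduler: each input takes its first usable output."""
    match, taken = {}, set()
    for i in free_in:
        for j in free_out:
            if j not in taken and voq[i][j]:
                match[i] = j
                taken.add(j)
                break
    return match


def packet_mode_slot(voq, in_progress, scheduler):
    """Run one time slot in packet mode and return the matching {input: output}."""
    n = len(voq)
    # Inputs in the middle of a packet keep their output: the train is not broken.
    matching = {i: j for i, (j, _) in in_progress.items()}
    busy_out = set(matching.values())
    free_in = [i for i in range(n) if i not in matching]
    free_out = [j for j in range(n) if j not in busy_out]
    # Only idle ports are offered to the cell-level scheduler.
    for i, j in scheduler(voq, free_in, free_out).items():
        matching[i] = j
        in_progress[i] = (j, voq[i][j].pop(0))  # a new packet of k cells starts
    # Transfer one cell on every matched edge.
    for i in list(in_progress):
        j, left = in_progress[i]
        if left == 1:
            del in_progress[i]       # last cell sent: both ports become free
        else:
            in_progress[i] = (j, left - 1)
    return matching


voq = [[[3], []],    # input 0 holds one 3-cell packet for output 0
       [[1], []]]    # input 1 holds one 1-cell packet for output 0
state = {}
for slot in range(4):
    print(slot, packet_mode_slot(voq, state, first_fit_scheduler))
```

In this run input 0 holds output 0 for three consecutive slots, and the one-cell packet at input 1 is served only in the fourth slot, which illustrates the delay a short packet can suffer behind a long one.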

Packet-mode scheduling
Ø Questions
  § can packet mode provide high throughput? YES!
  § what about delays? It depends…

Packet-mode properties
Ø Main theoretical results
  § MWM in packet-mode yields 100% throughput
  § Packet mode can provide shorter delays than cell mode, depending on the packet length distribution

Simulation scenario
Ø Router with ISMs and ORMs
Ø Uniform packet traffic
  § uniform packet load
  § uniform (1, 192) packet size distribution
Ø Spotted packet traffic
  § non-uniform packet load
  § bimodal (3, 100) packet size distribution
[Figure: 0/1 matrix P; row and column layout not recoverable]

Uniform packet traffic
Ø Packet mode and cell mode reach the same throughput
[Figure: two panels, "Uniform packet traffic for cell mode" and "Uniform packet traffic for packet mode"; mean packet delay versus normalized load for MWM, MSM, iSLIP, iLQF]

Spotted packet traffic
Ø Packet mode reaches higher throughput than cell mode
[Figure: two panels, "Spotted packet traffic for cell mode" and "Spotted packet traffic for packet mode"; mean packet delay versus normalized load for MWM, MSM, iSLIP, iLQF]

Effect of packet size distribution
Ø iSLIP delayCM/delayPM for different packet size distributions
[Figure: "Packet mode gain for iSLIP" versus normalized load for Uniform, Exponential, Trimodal and Bimodal packet sizes; regions labelled "better CM" and "better PM"; annotation: "At high load PM becomes better"]

Packet mode features
Ø Packet mode scheduling
  § is a feasible modification of schedulers
  § improves throughput
    • but it can generate some unfairness between long and short packets
      – inherent to all networks with variable-size packets and no preemption
  § may give better packet delays than cell mode
    • depends on the packet size distribution

Network of IQ routers
Ø Question:
  § given a network of IQ switches and an admissible input traffic, is the network always stable?
NO! this is quite counterintuitive… but true

Networks of IQ routers
Ø Consider the acyclic network of IQ routers in the following slide
  § derived from well-established results from adversarial queueing theory
  § a very specific scenario, but it comprises only a few switches…
    • this situation may not be common, but cannot be excluded in real networks

Pathological network of IQ switches
[Figure: network with 8 switches and 4 flows]

Instability of MWM
Ø If MWM is adopted at each IQ router, and the traffic is admissible, the system can be unstable under Bernoulli i.i.d. arrivals

Instability of MWM
Ø MWM is too greedy, in the sense that it can create traffic bursts that are amplified by each scheduler
Ø A server can be idling when large bursts (directed to it) are blocked because of contention upstream
  § the problem arises when a packet flow is subject to priority changes along its path through the network
    • it is “dangerous” to increase priority along the path

Stability in networks of routers
Ø Global policies
  § “Oldest in the network” and many others
    • problem: they require global information about the network, and perfectly synchronized clocks at the ingress of the network
Ø Local policies
  § until now, nothing really satisfying is known … (work in progress)

Stability in networks of routers
Ø Semi-local policies
  § MWM with local information about the router neighbors can achieve 100% throughput under i.i.d. Bernoulli arrivals
  § Virtual network queue
    • the weights used by MWM are:
      – w_ij = max{0, X_ij - H(X_ij)}, where H(X_ij) is the size of the upstream queue which is sending packets to X_ij
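As a small worked example of the weight formula, the sketch below computes the weights for a 2 x 2 router. Storing X and H as matrices indexed by (input, output) is an assumption made for illustration, as is the function name; the slide does not say how a router learns the upstream queue sizes.

```python
def virtual_queue_weights(X, H):
    """Return w[i][j] = max(0, X[i][j] - H[i][j]) for every VOQ.

    X[i][j]: size of the local VOQ from input i to output j.
    H[i][j]: size of the upstream queue feeding that VOQ.
    """
    return [[max(0, x - h) for x, h in zip(row_x, row_h)]
            for row_x, row_h in zip(X, H)]


X = [[5, 2],
     [0, 7]]
H = [[3, 4],
     [1, 7]]
print(virtual_queue_weights(X, H))  # [[2, 0], [0, 0]]
```

Only the VOQ that is longer than its upstream queue gets a positive weight in this example.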

CIOQ routers
[Figure: inputs 1 to N with VOQs, a switching fabric with speedup S, and queues at outputs 1 to N]

CIOQ routers
Ø Question:
  § if a low speedup S is allowed (and queues are available at both inputs and outputs), is it possible to design simple scheduling algorithms capable of achieving high throughput and low delay?
YES!

CIOQ routers with S=2
Ø If S = 2
  § it is easy to obtain 100% throughput
    • all maximal matchings work
      – based on stable marriage algorithms
  § it is less easy to obtain work conservation
    • an output never idles whenever a packet destined to it is present
    • same average delays as OQ
    • very good delay performance
    • e.g.: LOOFA
  § it is difficult to perfectly emulate OQ…

LOOFA
Ø The occupancy Cj
  § is the number of cells currently residing at the j-th output queue
  § at each time slot, it is decremented by one because of departures
Ø Basic idea of LOOFA
  § give priority to output channels with low occupancy, thereby attempting to maintain work-conservation for all outputs

LOOFA
Ø If S = 2, during each of the two phases
  § each unmatched input selects a non-empty VOQ directed to the unmatched output with the lowest occupancy, and sends a request to that output
  § each unmatched output selects one of the requests, and sends a grant to that input
  § repeat until the matching is maximal
Ø the selection at the outputs can be round robin, random, ...
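The request/grant iteration described above can be sketched as follows. The data layout is an assumption: `voq_len[i][j]` is the number of cells queued at input `i` for output `j`, and `occupancy[j]` is the occupancy Cj of output `j`. The output-side selection here picks the lowest-numbered requesting input, which is just one hypothetical choice among the options the slide allows (round robin, random, ...).

```python
def loofa_phase(voq_len, occupancy):
    """Build one maximal matching {input: output} favouring low-occupancy outputs."""
    n = len(voq_len)
    match = {}
    free_out = set(range(n))
    while True:
        # Request step: every unmatched input targets, among the unmatched
        # outputs it has cells for, the one with the lowest occupancy.
        requests = {}
        for i in range(n):
            if i in match:
                continue
            options = [j for j in free_out if voq_len[i][j] > 0]
            if options:
                target = min(options, key=lambda o: (occupancy[o], o))
                requests.setdefault(target, []).append(i)
        if not requests:
            return match  # no edge can be added: the matching is maximal
        # Grant step: every requested output accepts exactly one input.
        for j, inputs in requests.items():
            i = min(inputs)  # assumed tie-break; could be round robin or random
            match[i] = j
            free_out.discard(j)


voq_len = [[1, 1, 0],
           [1, 0, 0],
           [0, 1, 1]]
occupancy = [2, 0, 1]
print(loofa_phase(voq_len, occupancy))  # {0: 1, 1: 0, 2: 2}
```

With S = 2 this function would be called twice per time slot, with the queue lengths and occupancies updated between the two phases.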

CIOQ routers with S=2
Ø If S = 2
  § it is difficult (but possible) to perfectly emulate an OQ router in terms of packet departures
    • it is impossible to distinguish, by observing arrivals and departures, if the switching architecture is CIOQ or OQ
    • delays are perfectly controlled
      – easy to implement scheduling algorithms born for OQ (e.g.: WFQ)

CIOQ routers
Ø CIOQ routers are very promising architectures
  § many degrees of freedom in design
    • how to balance input/output buffers
    • how the buffers interact
      – e.g., by backpressure mechanisms
Ø Several currently designed architectures are supposed to be CIOQ
Ø The speedup S is becoming closer and closer to 1 in practical implementations of new switching architectures (CIOQ → IQ)

Multicast traffic
Misleading (but common) idea:
Ø observe
  1. OQ can achieve 100% throughput under any admissible unicast and multicast traffic
  2. OQ can be perfectly emulated by CIOQ with S = 2
  § then, with S = 2 it is possible to achieve 100% throughput for multicast traffic
WRONG! because observation 2 holds only for unicast traffic

Multicast traffic
Ø Question:
  § what is the minimum speedup required to achieve 100% throughput?
unknown!

Multicast traffic
Ø Possible implementations
  § copy network before the switching fabric
    • a multicast cell with f destinations is treated as f cells
    • possible bandwidth inefficiency
  § dedicated queue
    • multicast packets are treated in some specific way
[Figure: queueing structures, with labels "UC" and "UC+MC"]

Multicast traffic: optimal queueing
Ø MC-VOQ queueing
  § best throughput performance
    • avoids HOL blocking
  § 2^N - 1 queues for each input, one for each fanout set
    • re-enqueuing process → out-of-sequence problem
    • no re-enqueuing → some throughput degradation
[Figure: one input with 2^N - 1 queues carrying MC+UC traffic]
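The count of 2^N - 1 queues per input follows from enumerating every non-empty subset of the N outputs. The short sketch below (function name hypothetical) lists the fanout sets and shows how quickly their number grows with N.

```python
from itertools import combinations


def fanout_sets(n_outputs):
    """Every non-empty subset of the outputs, i.e. every possible fanout set."""
    outputs = range(n_outputs)
    return [frozenset(c)
            for size in range(1, n_outputs + 1)
            for c in combinations(outputs, size)]


for n in (2, 3, 4, 16):
    print(n, 2 ** n - 1)       # queues needed per input
print(len(fanout_sets(3)))     # 7
```

Already at N = 16 each input would need 65535 queues, which is why MC-VOQ is a reference point rather than a practical design.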

Multicast traffic: optimal scheduling
Ø The optimal scheduling for multicast traffic can be defined similarly to unicast traffic
  § it is a sort of max flow algorithm on all N(2^N - 1) queues
Ø Many heuristics can be envisaged to approximate it

Summary
Ø 3 main ingredients for IQ scheduling algorithms:
  § Weight computation
  § Matching computation
  § Contention resolution

Summary
Ø Weight computation
  § obtains the priority of each input queue
  § the metric can be related to queue length, waiting time of the cell at the HOL, …
Ø Contention resolution
  § needed whenever the selection is among candidates with equal weights
  § can be round robin, or random

Summary
Ø Matching computation
  § computes the matching, trying to maximize its total weight
  § can be based on
    • an iterative search, like in iSLIP, iOCF, iLQF
    • a matrix greedy approach, like in MUCS, WFA
    • a reservation vector, like in RPA
    • a learning approach, like in APSARA
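To make the objective "maximize the total weight" concrete, the sketch below finds a maximum weight matching by exhaustive search over all N! permutations. This is not one of the algorithms listed above and is usable only for very small N; it is an assumed reference implementation against which a heuristic could be checked. Ties are broken by taking the first permutation in lexicographic order, a trivial form of contention resolution.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Exhaustive maximum weight matching; returns a tuple p with p[i] = output of input i."""
    n = len(weights)
    return max(permutations(range(n)),
               key=lambda p: sum(weights[i][p[i]] for i in range(n)))


weights = [[1, 5, 0],
           [4, 1, 0],
           [0, 0, 3]]
print(max_weight_matching(weights))  # (1, 0, 2), total weight 12
```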

Summary
Ø Good IQ scheduling algorithms exist:
  § 100% throughput
  § short delay
  § limited complexity
Ø Performance differences are significant only close to saturation

Summary
Ø Open questions concerning IQ schedulers:
  § QoS guarantees
  § stability of networks of switches
  § multicast traffic
