Lecture 15: Busses and Networking (1)
Prof. Jan Rabaey
Computer Science 252, Spring 2000
Based on slides from Dave Patterson, John Kubiatowicz, Bill Dally, and Sonics, Inc.
JR. S 00 1

A Communication-Centric World
• Computation is getting distributed …
  – Internet, WAN, LAN, Body LAN, home networks, microprocessor peripherals, processor-memory interface, system-on-a-chip
• Efficient networking and communication are crucial
• The System-on-a-Chip implies the Network-on-a-Chip
• In the next set of lectures:
  – Busses and networks
  – But more importantly, the impact of integration

What Is a Bus?
• A bus is:
  – a shared communication link
  – a single set of wires used to connect multiple subsystems
[Figure: a bus connecting Processor (Control, Datapath), Memory, Input, and Output]
• A bus is also a fundamental tool for composing large, complex systems
  – a systematic means of abstraction

Busses

Advantages of Buses
[Figure: processor, I/O devices, and memory sharing one bus]
• Versatility:
  – New devices can be added easily
  – Peripherals can be moved between computer systems that use the same bus standard
• Low cost:
  – A single set of wires is shared in multiple ways

Disadvantages of Buses
[Figure: processor, I/O devices, and memory sharing one bus]
• A bus creates a communication bottleneck
  – The bandwidth of the bus can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
  – The length of the bus
  – The number of devices on the bus
  – The need to support a range of devices with:
    » Widely varying latencies
    » Widely varying data transfer rates

General Organization of a Bus
[Figure: a bus made up of control lines and data lines]
• Control lines:
  – Signal requests and acknowledgments
  – Indicate what type of information is on the data lines
• Data lines carry information between the source and the destination:
  – Data and addresses
  – Complex commands

Master versus Slave
[Figure: the bus master issues the command to the bus slave; data can go either way]
• A bus transaction includes two parts:
  – Issuing the command (and address): the request
  – Transferring the data: the action
• The master is the one who starts the bus transaction by:
  – Issuing the command (and address)
• The slave is the one who responds to the address by:
  – Sending data to the master if the master asks for data
  – Receiving data from the master if the master wants to send data
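The two-part transaction can be sketched in Python as a toy model. It assumes one master and memory-like slaves that each claim a fixed address range; the class names, the range-based address decode, and the "read"/"write" command strings are hypothetical, not part of any bus standard. The master's command and address reach every slave over the shared wires, exactly one slave recognizes the address, and that slave then either supplies or absorbs the data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemorySlave:
    """Hypothetical slave that claims addresses in [base, base + size)."""
    base: int
    size: int
    cells: dict = field(default_factory=dict)

    def decodes(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

    def respond(self, cmd: str, addr: int, data: Optional[int] = None) -> Optional[int]:
        if cmd == "read":
            return self.cells.get(addr, 0)  # slave sends data to the master
        self.cells[addr] = data             # slave receives data from the master
        return None


class Bus:
    """One shared set of wires: every slave sees every command and address."""

    def __init__(self, slaves):
        self.slaves = slaves

    def transaction(self, cmd: str, addr: int, data: Optional[int] = None) -> Optional[int]:
        # Part 1 (request): the master issues the command and address;
        # each slave checks whether the address belongs to it.
        targets = [s for s in self.slaves if s.decodes(addr)]
        if len(targets) != 1:
            raise ValueError(f"address {addr:#x} matched {len(targets)} slaves")
        # Part 2 (action): data moves in the direction the command implies.
        return targets[0].respond(cmd, addr, data)


bus = Bus([MemorySlave(0x0000, 0x1000), MemorySlave(0x1000, 0x1000)])
bus.transaction("write", 0x1004, 42)
print(bus.transaction("read", 0x1004))  # 42
```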

Types of Busses
• Processor-memory bus (design specific)
  – Short and high speed
  – Only needs to match the memory system
    » Maximize memory-to-processor bandwidth
  – Connects directly to the processor
  – Optimized for cache block transfers
• I/O bus (industry standard)
  – Usually long and slower
  – Needs to match a wide range of I/O devices
  – Connects to the processor-memory bus or backplane bus
• Backplane bus (standard or proprietary)
  – Backplane: an interconnection structure within the chassis
  – Allows processors, memory, and I/O devices to coexist
  – Cost advantage: one bus for all components

Example: Pentium System Organization
[Figure: Pentium system with a processor/memory bus, a PCI bus, and I/O busses]

A Computer System with One Bus: Backplane Bus
[Figure: processor, memory, and I/O devices on a single backplane bus]
• A single bus (the backplane bus) is used for:
  – Processor-to-memory communication
  – Communication between I/O devices and memory
• Advantages: simple and low cost
• Disadvantages: slow, and the bus can become a major bottleneck
• Example: IBM PC-AT

A Two-Bus System
[Figure: processor and memory on the processor-memory bus; bus adaptors connect it to I/O busses]
• I/O buses tap into the processor-memory bus via bus adaptors:
  – Processor-memory bus: mainly for processor-memory traffic
  – I/O buses: provide expansion slots for I/O devices
• Apple Macintosh II
  – NuBus: processor, memory, and a few selected I/O devices
  – SCSI bus: the rest of the I/O devices

A Three-Bus System
[Figure: a bus adaptor connects the processor-memory bus to a backplane bus; further bus adaptors connect the backplane bus to I/O busses]
• A small number of backplane buses tap into the processor-memory bus
  – The processor-memory bus is used only for processor-memory traffic
  – I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is greatly reduced

North/South Bridge architectures: separate busses
[Figure: processor with a memory bus and a "backside cache"; bus adaptors connect to a backplane bus, which connects to I/O busses]
• Separate sets of pins for different functions:
  – Memory bus
  – Caches
  – Graphics bus (for fast frame buffer)
  – I/O busses are connected to the backplane bus
• Advantages:
  – Busses can run at different speeds
  – Much less overall loading!

What defines a bus?
• Transaction protocol
• Timing and signaling specification
• Bunch of wires
• Electrical specification
• Physical / mechanical characteristics: the connectors

Synchronous and Asynchronous Bus
• Synchronous bus:
  – Includes a clock in the control lines
  – A fixed protocol for communication that is relative to the clock
  – Advantage: involves very little logic and can run very fast
  – Disadvantages:
    » Every device on the bus must run at the same clock rate
    » To avoid clock skew, a fast synchronous bus cannot be long
• Asynchronous bus:
  – It is not clocked
  – It can accommodate a wide range of devices
  – It can be lengthened without worrying about clock skew
  – It requires a handshaking protocol

Busses so far
[Figure: a bus master and several slaves sharing control lines, address lines, and data lines]
• Bus master: has the ability to control the bus; initiates the transaction
• Bus slave: module activated by the transaction
• Bus communication protocol: specification of the sequence of events and timing requirements in transferring information
• Asynchronous bus transfers: control lines (req, ack) serve to orchestrate sequencing
• Synchronous bus transfers: sequence relative to a common clock

Bus Transaction
• Arbitration: who gets the bus
• Request: what do we want to do
• Action: what happens in response

Arbitration: Obtaining Access to the Bus
[Figure: the bus master initiates requests (control) to the bus slave; data can go either way]
• One of the most important issues in bus design:
  – How is the bus reserved by a device that wishes to use it?
• Chaos is avoided by a master-slave arrangement:
  – Only the bus master can control access to the bus: it initiates and controls all bus requests
  – A slave responds to read and write requests
• The simplest system:
  – The processor is the only bus master
  – All bus requests must be controlled by the processor
  – Major drawback: the processor is involved in every transaction

Multiple Potential Bus Masters: the Need for Arbitration
• Bus arbitration scheme:
  – A bus master wanting to use the bus asserts the bus request
  – A bus master cannot use the bus until its request is granted
  – A bus master must signal the arbiter when it has finished using the bus
• Bus arbitration schemes usually try to balance two factors:
  – Bus priority: the highest-priority device should be serviced first
  – Fairness: even the lowest-priority device should never be completely locked out from the bus
• Bus arbitration schemes can be divided into four broad classes:
  – Daisy chain arbitration
  – Centralized, parallel arbitration
  – Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus
  – Distributed arbitration by collision detection: each device just "goes for it"; problems are detected after the fact
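Distributed arbitration by self-selection can be sketched as below. The slide does not say how the identity codes are compared; this sketch assumes one common scheme in which each contender drives its unique ID onto wired-OR arbitration lines and withdraws when it sees a 1 on a line where it drives a 0, examined here bit by bit from the most significant end. The function name and the ID width are hypothetical. Under this scheme the highest ID always wins, so priority is fixed by how IDs are assigned.

```python
def self_select(ids, width):
    """Return the winning ID among the contenders, or None if nobody requests.

    Models wired-OR arbitration lines examined MSB first: a contender that
    drives 0 on a line that another contender pulls to 1 stops driving.
    """
    contenders = set(ids)
    for bit in reversed(range(width)):
        line = any((i >> bit) & 1 for i in contenders)  # wired-OR of all drivers
        if line:
            # Devices driving 0 on this bit see a 1 and withdraw.
            contenders = {i for i in contenders if (i >> bit) & 1}
    # With unique IDs at most one contender survives every bit.
    return next(iter(contenders), None)


print(self_select([0b011, 0b101, 0b100], width=3))  # 5
```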

The Daisy Chain Bus Arbitration Scheme
[Figure: the bus arbiter's Grant line passes serially from Device 1 (highest priority) through Device 2 to Device N (lowest priority); Request (wired-OR) and Release are shared lines]
• Advantage: simple
• Disadvantages:
  – Cannot assure fairness: a low-priority device may be locked out indefinitely
  – The use of the daisy chain grant signal also limits the bus speed
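The grant ripple can be modelled in a few lines of Python, assuming device 0 is the one nearest the arbiter and that a requesting device simply keeps the grant instead of passing it on (the function names are hypothetical). The usage at the end shows the fairness problem from the slide: while device 0 keeps requesting, device 2 never wins.

```python
def daisy_chain_grants(requests):
    """Grant-in seen by each device; device 0 is closest to the arbiter.

    The arbiter raises Grant when the wired-OR Request line is active.
    A requesting device keeps the grant; any other device passes it on.
    """
    grant = any(requests)  # wired-OR Request seen by the arbiter
    grant_in = []
    for wants_bus in requests:
        grant_in.append(grant)
        grant = grant and not wants_bus  # a requester absorbs the grant
    return grant_in


def daisy_chain_winner(requests):
    """Index of the device that ends up holding the grant, or None."""
    for i, (wants_bus, grant) in enumerate(zip(requests, daisy_chain_grants(requests))):
        if wants_bus and grant:
            return i
    return None


# Device 0 requests every cycle, so device 2 is never served.
wins = [daisy_chain_winner([True, False, True]) for _ in range(5)]
print(wins)  # [0, 0, 0, 0, 0]
```

The loop over devices mirrors the serial path in hardware: the grant must pass through every device ahead of the requester, so its delay grows with the length of the chain, which is why the daisy-chain grant limits bus speed.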

Centralized Parallel Arbitration
[Figure: Devices 1 to N, each with its own Req and Grant lines to a central bus arbiter]
• Used in essentially all processor-memory busses and in high-speed I/O busses
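The slide does not fix the policy inside a centralized parallel arbiter. The sketch below assumes rotating (round-robin) priority, one way to address the fairness concern raised for daisy chaining: the arbiter sees all Req lines at once and asserts exactly one Grant line, starting its search just after the previous winner. The class name is hypothetical.

```python
class RoundRobinArbiter:
    """Central arbiter with one Req and one Grant line per device.

    Assumed policy: rotating priority. The search starts just after the
    previous winner, so every requesting device is served within N grants.
    """

    def __init__(self, n_devices: int):
        self.n = n_devices
        self.last = n_devices - 1  # so the first search starts at device 0

    def arbitrate(self, req):
        """req[i] is device i's Req line; returns the index given Grant, or None."""
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if req[i]:
                self.last = i
                return i
        return None


arb = RoundRobinArbiter(3)
print([arb.arbitrate([True, True, True]) for _ in range(4)])  # [0, 1, 2, 0]
```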

Simplest bus paradigm
• All agents operate synchronously
• All can source / sink data at the same rate
• => simple protocol
  – just manage the source and target

Simple Synchronous Protocol
[Timing diagram: BReq, BG, R/W, Address, and Data lines; Cmd+Addr is followed by Data 1 and Data 2]
• Even memory busses are more complex than this
  – memory (slave) may take time to respond
  – it may need to control data rate

Typical Synchronous Protocol
[Timing diagram: BReq, BG, R/W, Address, and Data lines; Cmd+Addr, then Wait, then Data 1 and Data 2]
• Slave indicates when it is prepared for data transfer
• Actual transfer goes at bus rate
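A cycle-by-cycle trace helps compare the two synchronous protocols. In this sketch the slave's readiness is given as a hypothetical per-clock list of booleans; a ready clock moves one data word at bus rate, and a not-ready clock shows up as a wait state. An all-ready list reproduces the simple protocol, and a ready signal that drops mid-burst shows the slave controlling the data rate.

```python
def sync_read_trace(slave_ready, beats):
    """Bus contents, one entry per clock, for a read burst of `beats` words.

    slave_ready[k] is the (hypothetical) ready signal the slave drives in
    clock k after the command cycle. The trace stops once all beats have
    moved or the ready list runs out.
    """
    trace = ["Cmd+Addr"]  # master drives command and address
    sent = 0
    for ready in slave_ready:
        if sent == beats:
            break
        if ready:
            sent += 1
            trace.append(f"Data {sent}")  # one word per clock at bus rate
        else:
            trace.append("Wait")          # slave not yet prepared
    return trace


print(sync_read_trace([True, True], beats=2))         # ['Cmd+Addr', 'Data 1', 'Data 2']
print(sync_read_trace([False, True, True], beats=2))  # ['Cmd+Addr', 'Wait', 'Data 1', 'Data 2']
```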

Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:
  – Address and data can be transmitted in one bus cycle if separate address and data lines are available
  – Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
  – By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  – Example: SPARCstation 20's memory bus is 128 bits wide
  – Cost: more bus lines
• Block transfers:
  – Allow the bus to transfer multiple words in back-to-back bus cycles
  – Only one address needs to be sent at the beginning
  – The bus is not released until the last word is transferred
  – Cost: (a) increased complexity, (b) longer response time for other requests, which must wait for the whole block
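The trade-offs above can be put into rough numbers with a small calculation. The function and its timing model (one clock per address, one per data word, no wait states, arbitration, or turnaround) are assumptions for illustration, not figures from any real bus; bytes per clock times MHz gives MB/s.

```python
def effective_bandwidth(words, width_bytes, clock_mhz, multiplexed, block=True):
    """Sustained MB/s for moving `words` bus-width words on an idealized bus.

    Assumed timing: one clock per address, one clock per data word, and
    no wait states, arbitration, or turnaround cycles.
    """
    # Multiplexed lines need a separate clock for the address; separate
    # address lines let the address share a clock with the first word.
    address_clocks = 1 if multiplexed else 0
    if block:
        clocks = address_clocks + words        # one address for the whole block
    else:
        clocks = words * (address_clocks + 1)  # every word sends its own address
    return words * width_bytes * clock_mhz / clocks


# 4 words of 8 bytes at a 50 MHz bus clock
print(effective_bandwidth(4, 8, 50, multiplexed=True, block=False))  # 200.0
print(effective_bandwidth(4, 8, 50, multiplexed=True))               # 320.0
print(effective_bandwidth(4, 8, 50, multiplexed=False))              # 400.0
print(effective_bandwidth(4, 16, 50, multiplexed=True))              # 640.0
```

Under these assumptions, block transfers recover much of the cost of multiplexing (320 vs. 200 MB/s), separate address lines remove it entirely, and doubling the data width doubles throughput.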

Increasing Transaction Rate on Multimaster Bus
• Overlapped arbitration
  – perform arbitration for the next transaction during the current transaction
• Bus parking
  – master holds onto the bus and performs multiple transactions as long as no other master makes a request
• Overlapped address / data phases
  – requires one of the above techniques
• Split-phase (or packet-switched) bus
  – completely separate address and data phases
  – arbitrate separately for each
  – address phase yields a tag, which is matched with the data phase
• "All of the above" in most modern memory buses
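The tag bookkeeping of a split-phase bus can be sketched as follows. The class and method names are hypothetical, and tags come from an unbounded counter for simplicity, whereas a real bus would recycle a small, fixed pool of tag values. Because the requester and address are recovered from the tag, data phases may complete out of order.

```python
import itertools


class SplitTransactionBus:
    """Split-phase bus: the address phase returns a tag, and the later
    data phase is matched to the original request by that tag."""

    def __init__(self):
        self._next_tag = itertools.count()  # assumption: unbounded tags
        self.outstanding = {}               # tag -> (master, address)

    def address_phase(self, master, address):
        tag = next(self._next_tag)
        self.outstanding[tag] = (master, address)
        return tag  # the bus is free again right after this phase

    def data_phase(self, tag, data):
        master, address = self.outstanding.pop(tag)  # KeyError for an unknown tag
        return master, address, data


bus = SplitTransactionBus()
t_a = bus.address_phase("cpu0", 0x100)
t_b = bus.address_phase("cpu1", 0x200)
# The slave answers the second request first; the tags keep the pairing straight.
print(bus.data_phase(t_b, "B"))  # ('cpu1', 512, 'B')
print(bus.data_phase(t_a, "A"))  # ('cpu0', 256, 'A')
```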

1993 CPU-Memory Bus Survey

Bus                  MBus        Summit       Challenge    XDBus
Originator           Sun         HP           SGI          Sun
Clock Rate (MHz)     40          60           48           66
Address lines        36          48           40           muxed
Data lines           64          128          256          144 (parity)
Data Sizes (bits)    256         512          1024         512
Clocks/transfer      4           5            4?
Peak (MB/s)          320 (80)    960          1200         1056
Master               Multi (all)
Arbitration          Central (all)
Slots                16          9            10
Busses/system        1           1            1            2
Length               13 inches   12? inches   17 inches
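For MBus, Summit, and XDBus, the peak figures in the table equal the data-path width in bytes times the clock rate in MHz, that is, one full-width transfer per clock. For XDBus this assumes 128 of its 144 data lines carry data and the remaining 16 carry parity, which is what reproduces the listed 1056 MB/s. Challenge's listed 1200 MB/s is below its 1536 MB/s width × clock product; the table does not say why, so the sketch simply reports both numbers. The function and dictionary names are hypothetical.

```python
def peak_mb_per_s(data_bits, clock_mhz):
    """Idealized peak: one full-width data transfer every clock."""
    return data_bits / 8 * clock_mhz


# name: (data bits excluding parity, clock MHz, peak MB/s listed in the survey)
SURVEY = {
    "MBus": (64, 40, 320),
    "Summit": (128, 60, 960),
    "Challenge": (256, 48, 1200),
    "XDBus": (128, 66, 1056),  # assumes 144 lines = 128 data + 16 parity
}

for name, (bits, mhz, listed) in SURVEY.items():
    print(f"{name:<10} width x clock = {peak_mb_per_s(bits, mhz):6.0f} MB/s, listed = {listed} MB/s")
```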

Asynchronous Handshake (4-phase): Write Transaction
[Timing diagram: Address (master asserts address, then next address), Data (master asserts data), Read, Req, and Ack lines; events t0 to t5]
• t0: Master has obtained control and asserts address, direction, and data
  – Waits a specified amount of time for slaves to decode the target
• t1: Master asserts the request line
• t2: Slave asserts ack, indicating data received
• t3: Master releases req
• t4: Slave releases ack

Read Transaction
[Timing diagram: Address (master asserts address, then next address), Data (driven by the slave), Read, Req, and Ack lines; events t0 to t5]
• t0: Master has obtained control and asserts address and direction
  – Waits a specified amount of time for slaves to decode the target
• t1: Master asserts the request line
• t2: Slave asserts ack, indicating it is ready to transmit data
• t3: Master releases req; data received
• t4: Slave releases ack
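The 4-phase handshake can be simulated with two Python generators, one per side, that each wait only on the other side's signal level; this is what makes the protocol fully interlocked and independent of any clock. The sketch assumes a single-word slave (address decode and the decode delay at t0 are omitted), and the function names and signal dictionary are hypothetical. A simple scheduler steps both sides once per time step.

```python
def master(sig, cmd, data=None, result=None):
    """Master side of one 4-phase transfer; yields once per time step."""
    sig["write"] = cmd == "write"  # t0: drive address and direction
    if sig["write"]:
        sig["data"] = data          # t0: write data is driven by the master
    sig["req"] = True               # t1 (decode delay not modelled)
    yield "t1: master asserts Req"
    while not sig["ack"]:
        yield None                  # wait for the slave
    if not sig["write"]:
        result.append(sig["data"])  # read data is valid while Ack is high
    sig["req"] = False              # t3
    yield "t3: master releases Req"
    while sig["ack"]:
        yield None                  # next transfer may start only after t4


def slave(sig, mem):
    """Slave side; `mem` is a one-word store (address decode omitted)."""
    while not sig["req"]:
        yield None
    if sig["write"]:
        mem["word"] = sig["data"]   # latch write data
    else:
        sig["data"] = mem["word"]   # drive read data
    sig["ack"] = True               # t2
    yield "t2: slave asserts Ack"
    while sig["req"]:
        yield None
    sig["ack"] = False              # t4
    yield "t4: slave releases Ack"


def run(*agents):
    """Step every agent once per time step until all have finished."""
    trace, active = [], list(agents)
    while active:
        for agent in list(active):
            try:
                event = next(agent)
                if event:
                    trace.append(event)
            except StopIteration:
                active.remove(agent)
    return trace


sig = {"req": False, "ack": False, "write": False, "data": None}
mem = {"word": 0}
for line in run(master(sig, "write", data=0x5A), slave(sig, mem)):
    print(line)
```

Whichever side the scheduler steps first, the events come out in the order t1, t2, t3, t4, because each side proceeds only after observing the other side's signal change.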

1993 Backplane/IO Bus Survey

Bus                  SBus        TurboChannel    MicroChannel        PCI
Originator           Sun         DEC             IBM                 Intel
Clock Rate (MHz)     16-25       12.5-25         async               33
Addressing           Virtual     Physical        Physical            Physical
Data Sizes (bits)    8, 16, 32   8, 16, 24, 32   8, 16, 24, 32, 64   8, 16, 24, 32, 64
Master               Multi       Single          Multi               Multi
Arbitration          Central (all)
32 bit read (MB/s)   33          25              20                  33
Peak (MB/s)          89          84              75                  111 (222)
Max Power (W)        16          26              13                  25

High Speed I/O Bus
• Examples
  – graphics
  – fast networks
• Limited number of devices
• Data transfer bursts at full rate
• DMA transfers important
  – small controller spools stream of bytes to or from memory
• Either side may need to squelch transfer
  – buffers fill up

PCI Read/Write Transactions
• All signals sampled on rising edge
• Centralized parallel arbitration
  – overlapped with previous transaction
• All transfers are (unlimited) bursts
• Address phase starts when the initiator asserts FRAME#; the initiator drives the command and address during that same cycle
• Data transfers happen when:
  – IRDY# is asserted by the master when ready to transfer data
  – TRDY# is asserted by the target when ready to transfer data
  – a transfer occurs when both are asserted on a rising edge
• FRAME# is deasserted when the master intends to complete only one more data transfer
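The data-phase rule can be expressed as a function over sampled signal values. This is a sketch under stated assumptions: the inputs are the active-low FRAME#, IRDY#, and TRDY# values already sampled on successive rising edges of the data phases, and the address phase, turnaround cycles, and other PCI signals such as DEVSEL# and STOP# are not modelled. The function name is hypothetical.

```python
def pci_data_transfers(frame_n, irdy_n, trdy_n):
    """Rising edges (indices) at which a data word moves in one PCI burst.

    Each list holds the value sampled on successive rising edges of the
    data phases. PCI control signals are active-low: 0 means asserted.
    """
    edges = []
    for edge, (frame, irdy, trdy) in enumerate(zip(frame_n, irdy_n, trdy_n)):
        if irdy == 0 and trdy == 0:  # both sides ready: one word transfers
            edges.append(edge)
            if frame == 1:           # FRAME# already deasserted: last data phase
                break
        # otherwise this edge is a wait state from whichever side is not ready
    return edges


# Target is not ready on edges 0 and 2; master drops FRAME# before the last word.
print(pci_data_transfers(frame_n=[0, 0, 0, 1],
                         irdy_n=[0, 0, 0, 0],
                         trdy_n=[1, 0, 1, 0]))  # [1, 3]
```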

PCI Read Transaction
• A turn-around cycle is required on any signal driven by more than one agent

PCI Write Transaction

The System-on-a-Chip Nightmare
[Figure: the "Board-on-a-Chip" approach: DMA, CPU, DSP, MPEG, Mem Ctrl., C, I, and O blocks connected by a system bus, a bridge, a peripheral bus, custom interfaces, and control wires]

Sonics SOC Integration Architecture
[Figure: DMA, CPU, DSP, MPEG, C, MEM, I, and O cores connected through the Open Core Protocol™ and SiliconBackplane Agent™ to the SiliconBackplane™ (patented), with a MultiChip Backplane™]

Open Core Protocol Goals
• Bus independent
• Scalable
• Configurable
• Synthesis/timing analysis friendly
• Encompass entire core/system interface needs (data, control, and test flows)

Data, Control, and Test Flows
• Data flow
  – Signals and protocols associated with moving data
  – Includes address, data, handshaking, etc.
  – Similar to services provided by traditional computer buses
• Control flow
  – Signals and protocols associated with non-data communication
  – Sideband: not synchronized to data flow (out of band)
  – Examples include interrupts, high-level flow control, etc.
• Test flow
  – Signals and protocols related to debug and manufacturing test

OCP Overview
• Point-to-point, uni-directional, synchronous
  – easy physical implementation
• Master/slave, request/response
  – well-defined, simple roles
• Extensions
  – added functionality to support cores with more complex interface requirements
• Configurability
  – pay only for the features needed for a given core

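Master vs. Slave
[Figure: IP cores attach to the on-chip bus through the Open Core Protocol. An initiator core is the OCP master and its bus interface the slave; for a target core the bus interface is the master and the core the slave. Requests go from master to slave; responses go from slave to master.]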

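Basic OCP
[Figure: Master and Slave sharing Clk; the master drives MCmd [3], MAddr [N], MData [N]; the slave drives SCmdAccept, SResp [3], SData [N]]
• Read: master presents command and address (MCmd, MAddr); slave signals command accept (SCmdAccept); slave returns response and data (SResp, SData)
• Write (posted): master presents command, address, and data (MCmd, MAddr, MData); slave signals command accept (SCmdAccept)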

Protocol Phases
• Request phase (begins transfer)
  – Master presents request (command, address, etc.) to slave
• Response phase (ends transfer)
  – Slave presents response (success/fail, read data) to master
  – Only available for read transfers (posted write model)
• Datahandshake phase (optional)
  – Allows pipelining the request ahead of write data
  – Only available for write transfers
• Phase ordering
  – Request -> Datahandshake -> Response
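A minimal model of the request and response phases follows. The Request/Response fields echo the MCmd/MAddr/MData and SResp/SData signals, but the "RD"/"WR" and "OK"/"FAIL" encodings, the class names, and the simplification that a read's response arrives in the same clock as SCmdAccept are all assumptions of this sketch, not the OCP specification. Writes are posted, so they end at command accept with no response phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:                # driven by the master (MCmd, MAddr, MData)
    cmd: str                  # "RD" or "WR" (encoding assumed for this sketch)
    addr: int
    data: Optional[int] = None


@dataclass
class Response:               # driven by the slave (SResp, SData)
    resp: str                 # "OK" or "FAIL" (hypothetical encoding)
    data: Optional[int] = None


class Slave:
    def __init__(self, busy_cycles=0):
        self.mem = {}
        self.busy = busy_cycles  # clocks before SCmdAccept is raised

    def clock(self, req):
        """One clock edge: returns (SCmdAccept, Response or None)."""
        if self.busy > 0:
            self.busy -= 1
            return False, None                 # master must hold the request
        if req.cmd == "WR":
            self.mem[req.addr] = req.data
            return True, None                  # posted write: no response phase
        if req.addr in self.mem:
            return True, Response("OK", self.mem[req.addr])
        return True, Response("FAIL")


def transfer(slave, req, max_clocks=16):
    """Master holds the request phase until the slave accepts it."""
    for clk in range(1, max_clocks + 1):
        accepted, resp = slave.clock(req)
        if accepted:
            return clk, resp
    raise TimeoutError("request never accepted")


s = Slave(busy_cycles=2)
print(transfer(s, Request("WR", 0x10, 7)))  # (3, None)
print(transfer(s, Request("RD", 0x10)))     # (1, Response(resp='OK', data=7))
print(transfer(s, Request("RD", 0x20)))     # (1, Response(resp='FAIL', data=None))
```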

OCP Extensions
• Simple extensions
  – Byte enables
  – Bursts
  – Flow control
  – Data handshake
• Complex extensions
  – Threads and connections
• Sideband signals

The Backplane: Why Not Use a Computer Bus?
[Figure: IP cores with transmit and receive FIFOs sharing an arbitrated computer bus; address and data transfers plotted over time]
• Expensive to decouple
• Not designed for real-time

Communication Buses Decouple and Guarantee Real Time
[Figure: IP cores with transmit and receive FIFOs on a TDMA communications bus; data transferred in TDMA slots over time]
• Connections are expensive
• Poor read latency

SiliconBackplane™ Employs Best of Both
• From computing:
  – Address-based selection
  – Write and read transfers
  – Pipelining
• From communications:
  – Efficient BW decoupling
  – Guaranteed BW & latency
  – Side-band signaling

Guaranteed Bandwidth Arbitration
• Independent arbitration for every cycle includes two phases:
  – Distributed TDMA
  – Round robin
• Provides fine control over system bandwidth
[Figure: current slot and arbitration per command]
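The two arbitration phases can be sketched as a slot wheel plus a round-robin fallback. The slide names the phases but not how they interact, so this sketch assumes an arrangement in which the owner of the current TDMA slot wins if it is requesting and otherwise the cycle is handed out round robin; the class name and slot table are hypothetical. Each initiator's guaranteed share of bandwidth is the fraction of slots it owns.

```python
class TdmaRoundRobinArbiter:
    """Two-level arbiter: a TDMA slot wheel first, round robin second.

    slot_table[k] names the initiator that owns cycle k of the wheel.
    If that owner is not requesting, the cycle is offered round robin to
    the other requesters instead of being wasted (assumed policy).
    """

    def __init__(self, slot_table, n_initiators):
        self.slots = slot_table
        self.n = n_initiators
        self.current = 0                 # current slot in the wheel
        self.rr_last = n_initiators - 1  # round-robin pointer

    def arbitrate(self, requests):
        owner = self.slots[self.current]
        self.current = (self.current + 1) % len(self.slots)  # wheel turns every cycle
        if requests[owner]:
            return owner                                      # guaranteed-bandwidth grant
        for offset in range(1, self.n + 1):                   # second phase: round robin
            i = (self.rr_last + offset) % self.n
            if requests[i]:
                self.rr_last = i
                return i
        return None


# Initiator 0 owns half the slots; initiators 1 and 2 own a quarter each.
arb = TdmaRoundRobinArbiter([0, 0, 1, 2], n_initiators=3)
print([arb.arbitrate([True, True, True]) for _ in range(8)])   # [0, 0, 1, 2, 0, 0, 1, 2]
print([arb.arbitrate([False, True, True]) for _ in range(4)])  # [1, 2, 1, 2]
```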

Guaranteed Latency
• Fixed latency between command/address and data/response phases
• Matches pipelined CPU model, ensuring high-performance access to on-chip resources
• Pipelined data routed through SiliconBackplane™
• Latency re-programmable in software
• Variable-latency blocks do not tie up the SiliconBackplane

Integrated Signaling Mechanism
• Dedicated SiliconBackplane™ wires (flags) support:
  – Bus-style out-of-band signaling (interrupts)
  – Point-to-point communications (flow control)
  – Dynamic point-to-point (retry mechanism)
• Same design flow, timing, and flexibility as the address/data portion of SonicsIA™

MultiChip Backplane™ Extends SonicsIA™ Between Chips
[Figure: SiliconBackplane, CPU-based ASSP, MultiChip Backplane, FPGA, ASSP]
• Seamless integration of protocols

Validation / Test
• SiliconBackplane™ highly visible for test
  – All subsystems communicate through the SiliconBackplane
• Test interfaces:
  – MultiChip Backplane: 100's of MB/sec
  – ServiceAgent: scan-based
• Each subsystem can be tested/validated stand-alone
[Figure: MultiChip Backplane™ test vectors]

Summary
• Busses are an important technique for building large-scale systems
  – Their speed is critically dependent on factors such as length, number of devices, etc.
  – Critically limited by capacitance
  – Tricks: esoteric drive technology such as GTL
• Important terminology:
  – Master: the device that can initiate new transactions
  – Slaves: devices that respond to the master
• Two types of bus timing:
  – Synchronous: bus includes clock
  – Asynchronous: no clock, just REQ/ACK strobing
• System-on-a-Chip approach invites new solutions
  – Well-defined and clear communication protocols
  – Physical layer hidden from the designer