Intelligent Interconnects for Multicore SoCs, Drew Wingard

Intelligent Interconnects for Multicore SoC’s
Drew Wingard, CTO, Sonics, Inc.
OCIN 06: December 6, 2006

Agenda
• SoC Background
• Interconnect Architecture
• Application Areas

The Goal
Create a system-on-a-chip, comprising ten million gates, that satisfies rapidly-evolving market requirements for:
  – Speed, Power, Area
  – Application Performance
  – Time to Market
Using the minimum resources with maximum predictability

SoC Architecture Trends
• Massive feature integration
  – Driven largely by Moore’s Law (supply) and convergence (demand)
• Continued movement of complexity to software
• Distributed architectures
  – Higher scalability (and independence?)
• Multiple processors
  – CPU
  – DSP
  – Special purpose (MPEG, packet, …)
• Distributed DMA
  – Removes centralized DMA bottleneck
  – Simplifies driver software integration
[Figure: System On Chip block diagram with CPU, DSP, MPEG, 3D GFX, Video I/O, MAC, Comm I/O, and DRAM Controller]

Why SoC’s Are Difficult
• Improvement by feature integration has been practiced for decades
  – Why is SoC any different?
• Benefits of an SoC
  – Higher performance
  – Smaller footprint
  – Lower power
  – Lower cost
• Size, power, and cost benefits derive from sharing
  – Control software on CPU (interrupts, etc.)
  – Memory resources (on-chip and off-chip): single external DRAM
• Building predictable systems with so much sharing is hard
  – Dozens to hundreds of interrupt sources
  – Memory bandwidth bottlenecks + lots of real-time traffic stress Unified Memory Architecture (UMA)

Limits of Tightly Coupled Design
• Tightly coupled design has been dominant
  – Assumes largely synchronous, instantaneous and free communication
  – Widely practiced, and supported in design flows
• BUT
  – Delivering clocks is problematic
  – Wire delay is dominant
  – Routing area can cost more than gates
  – Too many constraints, from too many blocks
• Cannot afford lowest common denominator design

Our Approach: Active Decoupling
• Separation
• Abstraction
• Optimization
• Independence
[Figure: SMART Interconnect. A 16-bit DMA master, a CPU master, and a 128-bit DRAM controller slave each attach through a socket to bus master / bus slave agent adapters around an internal fabric. Labels: Core Function, Communication, Socket, Network.]

Layers of Decoupling
(listed from higher to lower abstraction)
• Performance
  – Degree of blocking
  – QoS
• Identification
  – Addressing
  – Source identification
• Transfer Protocol
  – Bursting, pipelining
  – Threading
• Signaling
  – Data, address, event widths
  – Handshaking / flow control
• Electrical
  – Signal timing and capacitance
  – Clock frequency

Evolution of Design Abstractions
[Figure: level of abstraction plotted against time, in two columns. Functional: Gates, Blocks, Cores, Tiles. Interconnect: Wires, Buses, Networks. Annotations: "Most designs are here" and "Need to be here!"]
Abstraction minimizes the number of objects that the designer must manage

Abstraction Enables Higher Complexity
[Figure: Log(gate count) plotted against development time & cost, with a "Technology Push" arrow. Regions: the past, present (5M+ gates), future (100M+ gates). Abstraction labels: Blocks, Buses, Cores, Interconnects, Tiles, Networks.]

The OCP Socket
• Owned/advanced by OCP-IP (www.ocpip.org)
  – Over 150 member companies
• Interconnect-neutral
• Defines data flow, control flow, test signaling
• Highly configurable to match core needs
• Simple Master/Slave request/response protocols
  – With many options
• Fully synchronous, point-to-point
• Optional pipelining, bursting, threading
• Flexible handshaking and other flow control

Agenda
• SoC Background
• Interconnect Architecture
  – Advanced Fabrics
  – Intelligent Agents
• Application Areas

SoC Data Flow Requirements
• Connect dozens of initiators to several memories
  – While meeting a wide range of latency and throughput requirements
• Connect processors and DMA to control ports and peripherals
  – With predictably low latencies
• Example SoC requirements (interconnect view):

Metric | Mode | Type | Requirement
CPU – peripheral interrupt service | All | Worst-case | < 10 cycles
CPU – DDR cache miss latency | All | Average | As low as feasible
H.264 decode – DDR bandwidth | HD vid. | Worst-case | 1.2 GBytes/sec
H.264 decode – service jitter | HD vid. | Worst-case | < 6 macro-blocks
Video refresh – DDR bandwidth | SD/NI | Worst-case | 56 MBytes/sec
… | … | … | …
DDR data bus bandwidth | HD vid. | Sustained | > 80%
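
Requirements like those in the table above can be captured as data and checked against measured results. The following is a minimal sketch, not a Sonics tool: the `Requirement` class, the `violations` helper, and the measured values are all hypothetical, and it assumes that the bandwidth figures are minimums while the latency and jitter figures are upper bounds.

```python
import operator
from dataclasses import dataclass

# Comparison operators a requirement may use (illustrative choice).
_COMPARE = {"<": operator.lt, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Requirement:
    metric: str
    mode: str
    kind: str        # "worst-case", "average" or "sustained"
    relation: str    # how a measured value must relate to the limit
    limit: float
    unit: str

    def met_by(self, measured: float) -> bool:
        return _COMPARE[self.relation](measured, self.limit)


# Rows with numeric limits, transcribed from the slide.
# Assumption: bandwidth rows are minimums (">="), latency/jitter rows are maximums ("<").
REQUIREMENTS = [
    Requirement("CPU - peripheral interrupt service", "All", "worst-case", "<", 10, "cycles"),
    Requirement("H.264 decode - DDR bandwidth", "HD video", "worst-case", ">=", 1.2, "GBytes/sec"),
    Requirement("H.264 decode - service jitter", "HD video", "worst-case", "<", 6, "macro-blocks"),
    Requirement("Video refresh - DDR bandwidth", "SD/NI", "worst-case", ">=", 56, "MBytes/sec"),
    Requirement("DDR data bus bandwidth", "HD video", "sustained", ">", 80, "%"),
]


def violations(measurements):
    """Return the metrics whose measured value misses its requirement."""
    return [
        r.metric
        for r in REQUIREMENTS
        if r.metric in measurements and not r.met_by(measurements[r.metric])
    ]


# Hypothetical results from a performance simulation.
measured = {
    "CPU - peripheral interrupt service": 8,
    "H.264 decode - DDR bandwidth": 1.1,
    "DDR data bus bandwidth": 83,
}
print(violations(measured))
```

With these made-up numbers only the H.264 decode bandwidth falls short, since 1.1 GBytes/sec is below the 1.2 GBytes/sec requirement.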

Interconnect Fabric Options
• SoC data flow requirements must be satisfied by internal interconnect fabric
  – Big challenge in current SoC designs!
• Choices in interconnect fabric design
  – Unified vs. split transactions
  – Shared vs. separate physical links
  – Combinational vs. pipelined
  – Single vs. multiple outstanding transactions (transaction pipelining)
  – In-order vs. out-of-order completion and response
  – Blocking vs. non-blocking flow control

Blocking vs. Non-blocking Flow Control
• Sharing in SoC’s creates many opportunities for contention
  – Arbitration determines who wins
  – Flow control determines when the winner gets to go
• Blocking flow control systems allow resource shortages along some paths to prevent other paths from progressing
• Non-blocking flow control systems ensure that points of sharing never stall if any data flow could progress
• Locally non-blocking flow control (aka virtual channels) improves efficiency and allows more resource sharing
• End-to-end non-blocking flow control offers greater predictability
  – Provides basis for QoS guarantees
[Callout: Our Approach]
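
The difference between blocking and non-blocking flow control at a point of sharing can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the SonicsMX mechanism: the `simulate` function, the thread names, and the one-delivery-per-cycle rule are hypothetical. A single shared FIFO stands in for blocking flow control, and one queue per thread stands in for locally non-blocking flow control.

```python
from collections import deque


def simulate(requests, ready, per_thread_queues, cycles=8):
    """Return the list of (cycle, thread) deliveries at one shared point.

    requests: thread names in arrival order.
    ready: maps thread -> True if that thread's target can accept data.
    per_thread_queues: False models a single shared FIFO (blocking);
        True models one queue per thread (non-blocking).
    """
    if per_thread_queues:
        queues = {}
        for thread in requests:
            queues.setdefault(thread, deque()).append(thread)
    else:
        queues = {"shared": deque(requests)}

    delivered = []
    for cycle in range(cycles):
        # At most one delivery per cycle: take the first queue whose
        # head request is headed for a target that can accept it.
        for queue in queues.values():
            if queue and ready[queue[0]]:
                delivered.append((cycle, queue.popleft()))
                break
    return delivered


requests = ["dma", "cpu", "cpu"]       # the dma request arrived first
ready = {"dma": False, "cpu": True}    # the dma target is stalled

blocking = simulate(requests, ready, per_thread_queues=False)
non_blocking = simulate(requests, ready, per_thread_queues=True)
print(blocking)       # shared FIFO: cpu requests are stuck behind dma
print(non_blocking)   # per-thread queues: cpu requests still progress
```

In the shared FIFO the stalled dma request sits at the head and nothing is delivered, while with per-thread queues both cpu requests go through in the first two cycles.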

SonicsMX Basic Architecture
• Hybrid topologies
  – Full / partial cross-bar
  – Shared bus
• Pipelined, multi-threaded, non-blocking fabric
• Fully split (dual) request / response
• Distributed QoS arbiter
  – Spans cycle, frequency, and data width boundaries
  – Supports flexible thread merging tree topologies
[Figure: SMX fabrics connecting initiators (I) and targets (T); attached cores include CPU, DSP, GFX, ROM, SRAM, Flash Ctl., and DRAM Ctl.]

Global Interconnect Responsibilities
• Routing
  – Getting requests, responses and data to the desired destination
• Access control
  – Managing contention for shared resources (ensuring QoS)
  – Ensuring requested access is allowed (security and protection)
• Error management
  – Detection, reporting, and SW recovery support
• Power management
  – Activity detection, clock and voltage removal support
• Connectivity
  – Protocol conversion
  – Data width / clock frequency conversion
• Spanning distance
  – Connecting endpoints at required frequency and latency

The Intelligence is in the Agents
• Agents provide…
  – Protocol conversion
    • Agent adapts to IP core
  – Decoupling of IP cores from fabric
    • Provide local, isolated environment
  – Layered services
• Proven technology
  – Over 100 million IC’s shipped so far
• Agent services
  – Power management
  – Security management
  – Error management
  – QoS
  – Burst, width, and command conversion
[Figure: initiator sockets (I) attach to Initiator Agents (IA), which connect through the fabric to Target Agents (TA) and target sockets (T)]
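
Width conversion, one of the agent services listed above, can be sketched in a few lines. This is only an illustration under stated assumptions: `split_word` and `pack_beats` are hypothetical helpers, the 128-bit and 16-bit widths are borrowed from the Active Decoupling figure, and the least-significant-lane-first ordering is an assumption that the slides do not specify.

```python
def split_word(word, wide_bits=128, narrow_bits=16):
    """Split one wide data word into narrow beats, least-significant beat first."""
    if wide_bits % narrow_bits:
        raise ValueError("wide width must be a multiple of the narrow width")
    mask = (1 << narrow_bits) - 1
    return [(word >> shift) & mask for shift in range(0, wide_bits, narrow_bits)]


def pack_beats(beats, narrow_bits=16):
    """Reassemble narrow beats (least-significant first) into one wide word."""
    word = 0
    for index, beat in enumerate(beats):
        word |= beat << (index * narrow_bits)
    return word


# One 128-bit word as seen on the wide (e.g. DRAM controller) side.
word = 0x0123456789ABCDEF_0011223344556677

# The same data as eight 16-bit beats on the narrow (e.g. DMA) side.
beats = split_word(word)
print([hex(beat) for beat in beats])
print(pack_beats(beats) == word)
```

A real agent would also have to handle byte enables, burst boundaries, and endianness, which this sketch leaves out.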

Sonics Interconnect Products

Product | SiliconBackplane | Sonics3220 | SonicsMX
Introduction | 1999 | 2002 | 2004
Connects | Processors, memories | Peripherals, registers | Processors, memories
Socket(s) | OCP 1/2, APB | OCP 1/2, AHB, AXI
Decoupling | High | Medium | Very High
Fabric | Distributed, shared bus | Multi-branch shared bus | Hybrid cross-bar / shared bus
Applications | WLAN, HDTV, PVR, … | Handsets, consumer, … | Handsets, HDTV, DSC, OA, …
Volume Production | Over 100 million IC’s shipped 2005

Interconnect Performance
• Many standard peak measurements pretty meaningless when considering SoC interconnects
  – Bandwidth is cheap (wires are cheaper than pins)
  – Usable bandwidth mostly dependent on attached cores…
  – … and application scenario
• Sonics’ products engineered to span very wide range of operating points
  – Latencies: 0-10 clock cycles
  – Bandwidth: limited only by targets
  – Frequency: keep up with DRAM
  – IP core count: 10-100+
  – No process-specific limitations
  – Standard ASIC-style design flow

Product Deliverables
• Configuration tools
  – Aid in capture, checking, generation, and verification of Sonics’ IP
• Configured Register Transfer Language (RTL) code
  – “Virtual hardware” input to logic synthesis and place & route tools
• Configured verification environment
  – Tests customer configuration of Sonics’ IP
• Logic synthesis scripts & floorplan interface
  – Bridges to physical design domain
• Electronic System Level (ESL) models (in SystemC)
  – Helps architectural exploration and firmware development
• Analysis tools and models
  – Aids customer in building and analyzing prototypes of SoC

Design Issues Addressed By Our Approach
[Figure: design issues arranged around Scalable Fabrics, Intelligent Agents, and Methodology & Automation. Issues shown: Perf. Verification, Virtual Prototyping, Parallel IP Creation, Arch. Modeling, SW Development, Timing Closure, Complex Memory Hierarchies, Signal Integrity, Design Re-use, Variable Clock Freq., Voltage Isolation, Power Management, Error Management, Access Security, High Peripheral Count, Data Width Conversion, Distributed Processing, Guaranteed BW, QoS, Mixed Endianness, Pipelining, Protocol Conversion]

Design Timeline Benefits of Our Approach
[Figure: two design timelines, Tightly Coupled and Sonics, each with IP Core and SOC tracks. Milestones shown: Start, Emulator?, First Integration, Re-verified, “Hello World”, Core Interaction, Tapeout! Phases in the legend: Design, Functional Verif., Synth./Timing Verif., Performance Verif.]

Agenda
• SoC Background
• Interconnect Architecture
• Application Areas

Inside a Tile-based Multicore SoC
• Example: TI OMAP2420
• Heterogeneous processors
  – General-purpose CPU (e.g. ARM11)
  – DSP (e.g. TI C5x)
  – 2D/3D graphics (e.g. Imagination PowerVR MBX)
  – Video accelerator (e.g. MPEG4 codec)
• Multiple levels of shared memory
  – External DRAM and Flash
  – Internal SRAM and ROM
• Very complex peripheral subsystems
• Security management (content, service, stability)
• Aggressive power management

[Figure labels: IVA, (from OMAP5910)]
Sources: www.ti.com, www.arm.com, www.powervr.com

Multicore Architecture Advantages
[Figure label: What is needed]
Source: Avner Goren, TI, EPF 2004

Application Processor Interconnect Example
[Figure: application processor built from tiles joined by SMX fabrics and an S3220 peripheral fabric. Blocks shown: CPU Tile, 2D/3D Graphics, MPEG4 Codec, DSP Tile (DSP Core, Inst. Cache, Data Cache, X RAM, Y RAM, DMA), MMU, DMA, Embedded SRAM, 128-bit SDRAM Controller, Flash Controller, LCD Controller, Camera Interface, USB 2.0, MP3. Legend: Partial XBar Tile Fabric, Shared Bus Fabric, Simple Socket, Complex Socket, Fabric I/F, Socket I/F, Decoupling Buffer, Agent Regs; ports are marked I (initiator), T (target), and P.]

SoC Application Requirements (1/2)
[Table: markets (Application Processor, DTV/STB, Gaming, Multi-function Printers, WLAN) checked against DRAM Efficiency, Realtime, Feature-driven, Power-optimized, and Flexible Platform; the positions of the check marks did not survive extraction]

SoC Application Requirements (2/2)
[Table: markets (Cellular Baseband, DSC, PC, Automotive, Storage) checked against DRAM Efficiency, Realtime, Feature-driven, Power-optimized, and Flexible Platform; the positions of the check marks did not survive extraction]

Challenges With Textbook NoC
• SoC applications do not offer lots of network-level concurrency
  – Many are dominated by a single DRAM target
• NoC packetization and serialization overhead too high
  – Communication (wires) vs. computation (router, NI) cost trade-offs different for SoC
• NoC latency is unacceptable for current processors
  – Should improve as reality sets in…

Research Opportunities
• Sonics is interested in facilitating research in the following areas:
  – NoC benchmarking (via OCP-IP)
  – Distributed, heterogeneous cache and I/O coherence
  – Performance constraint capture
  – Static performance analysis (SPA)
  – Implications of 3D packaging on SoC architecture
• Interested?
  – Please contact me: wingard@sonicsinc.com

Summary
• Active decoupling is essential for managing heterogeneous designs
• Non-blocking interconnect fabrics offer higher efficiency and better QoS
• Intelligent agents centralize key services
• High volume multicore SoC applications need advanced on-chip interconnects now
• Please contact me for references