On-Chip Communication Architectures: Synthesis Techniques
ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 6
© 2008 Sudeep Pasricha & Nikil Dutt

Outline
• Introduction
• Topology Synthesis
• Protocol Parameter Synthesis
• Topology and Protocol Parameter Synthesis
• Physically-aware Synthesis
• Co-synthesis with Memory

Introduction
• Designing on-chip communication architectures is becoming more and more challenging
  ◦ the increasing number of components in today's systems translates into more inter-component communication
• Multi-dimensional design constraints
  ◦ ↑ performance, reliability
  ◦ ↓ power, cost, area, time-to-market
• System designers need techniques that can
  ◦ optimize for individual design goals
  ◦ allow design decisions to provide a good balance

Introduction
• Exploration and synthesis techniques can broadly be classified into 3 categories:
  ◦ static, dynamic, hybrid
• Commercial toolkits are available for standard bus architectures:
  ◦ AMBA Designer/Design Kit
  ◦ STBus Gen. Kit
  ◦ Sonics Studio
• These toolkits are not very useful for automating exploration and synthesizing communication architectures

Introduction
• Bus architecture synthesis:
  ◦ the process of designing a bus architecture topology and/or its protocol parameters to satisfy application constraints
[Figure: bus architecture synthesis explores a topology space and a parameter space (arbitration strategy, data bus widths, bus clock frequencies, buffer sizes) under constraints (performance, power, cost, area, reliability)]

Topology Synthesis
• The topology of a bus-based on-chip communication architecture determines
  ◦ the number of buses in the system
  ◦ the manner in which they are interconnected to each other
  ◦ how components are allocated to the buses
• Early work focused on allocating inter-component communication to buses for distributed real-time embedded systems
  ◦ Yen et al. [ICCAD '95] proposed techniques to estimate communication delay on a bus using static analysis for a system with periodic tasks; a PE was assigned to an existing bus, or a new bus was created, to meet task deadlines
  ◦ Ortega et al. [ICCAD '98] explored mapping of PEs onto a set of off-chip bus architecture configurations (shared or point-to-point buses)

Topology Synthesis
• Liveris et al. [DATE '04] proposed a bus topology synthesis technique to reduce bus power consumption while meeting latency constraints
  ◦ AMBA AHB bus architecture
  ◦ simple FIFO arbitration
  ◦ dynamic power reduction: switching activity α is taken as 0.5 for the data bus and a lower value for the address bus; control wire switching is ignored
  ◦ each master has a latency constraint that determines the number of cycles available to complete a communication operation
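
To make the power model concrete, here is a minimal Python sketch that assumes the common dynamic-power form P = α·C·V²·f per wire. The function names, the per-wire capacitance and the address-bus activity of 0.2 are illustrative assumptions. Only the data-bus α = 0.5 and the omission of control wires come from the slide.

```python
def wire_group_power(num_wires: int, alpha: float, c_wire_f: float,
                     vdd: float, freq_hz: float) -> float:
    """Dynamic power (W) of identical wires: alpha * C * Vdd^2 * f per wire."""
    return num_wires * alpha * c_wire_f * vdd ** 2 * freq_hz


def ahb_bus_power(data_width: int, addr_width: int, c_wire_f: float,
                  vdd: float, freq_hz: float, alpha_addr: float = 0.2) -> float:
    """Data-bus activity fixed at 0.5 as on the slide; alpha_addr is an
    assumed 'lower value' for the address bus. Control wires are ignored."""
    p_data = wire_group_power(data_width, 0.5, c_wire_f, vdd, freq_hz)
    p_addr = wire_group_power(addr_width, alpha_addr, c_wire_f, vdd, freq_hz)
    return p_data + p_addr


# Hypothetical numbers: 32-bit data/address buses, 1 pF per wire, 1.0 V, 100 MHz
print(ahb_bus_power(32, 32, 1e-12, 1.0, 100e6))
```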

Topology Synthesis
• To improve the latency response of the communication architecture and also reduce power consumption on the bus wires, Liveris et al. proposed 3 different topology transformations

Topology Synthesis
• Private slave creation
  ◦ making a slave private to a master is possible if that master is the only one accessing the slave
  ◦ removes the slave from the shared bus, which reduces the fanout by one for all signals driven by the AMBA logic
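
The following hypothetical check tests the private-slave precondition. It assumes the communication profile is available as (master, slave) access pairs.

```python
from collections import defaultdict


def private_slave_candidates(accesses):
    """Return {slave: master} for every slave that exactly one master accesses.

    `accesses` is an iterable of (master, slave) pairs taken from a
    communication profile (hypothetical input format).
    """
    masters_of = defaultdict(set)
    for master, slave in accesses:
        masters_of[slave].add(master)
    return {slave: next(iter(ms))
            for slave, ms in masters_of.items() if len(ms) == 1}


profile = [("CPU", "SRAM"), ("DMA", "SRAM"), ("CPU", "ROM"), ("DSP", "TCM")]
print(private_slave_candidates(profile))  # {'ROM': 'CPU', 'TCM': 'DSP'}
```

SRAM is shared by two masters, so it stays on the shared bus.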

Topology Synthesis
• Slave isolation
  ◦ moving a slave to another layer

Topology Synthesis
• Grouping masters
  ◦ moving masters to another layer to reduce arbitration conflicts

Topology Synthesis
• Heuristic
  ◦ initially, all masters and slaves are mapped to a single layer
  ◦ the private slave creation transformation is applied to all eligible slaves
  ◦ if a latency violation exists for a master, the slave isolation transformation is applied to the slowest slave
  ◦ if the violation persists, the grouping masters transformation is performed by transferring masters with less stringent latency requirements to a new layer
  ◦ once a solution that satisfies the latency constraints is obtained, slave isolation and grouping masters transformations are performed to reduce power
  ◦ at every iteration, the power of the current solution is calculated using probability-based formulations to estimate switching activity on the bus
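
The ordering of the heuristic can be sketched as a generic driver. Everything problem-specific is passed in as a hypothetical callable: the solution representation, the three transformations, the latency check and the power estimate. The sketch therefore shows only the control flow described above, not Liveris et al.'s implementation.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")  # any solution representation (hypothetical)


def topology_heuristic(initial: S,
                       make_private: Callable[[S], S],
                       isolate_slowest_slave: Callable[[S], S],
                       group_masters: Callable[[S], S],
                       power_moves: Callable[[S], List[S]],
                       meets_latency: Callable[[S], bool],
                       power: Callable[[S], float],
                       max_iters: int = 100) -> S:
    """Control flow of the slide's heuristic; all callables are placeholders."""
    sol = make_private(initial)  # private slave creation for eligible slaves

    # Phase 1: remove latency violations, trying slave isolation before grouping.
    for _ in range(max_iters):
        if meets_latency(sol):
            break
        candidate = isolate_slowest_slave(sol)
        if not meets_latency(candidate):
            candidate = group_masters(candidate)
        sol = candidate
    if not meets_latency(sol):
        raise RuntimeError("latency constraints not met within max_iters")

    # Phase 2: accept isolation/grouping moves only if they keep latency
    # feasible and strictly lower the estimated power.
    while True:
        feasible = [c for c in power_moves(sol) if meets_latency(c)]
        best = min(feasible, key=power, default=None)
        if best is None or power(best) >= power(sol):
            return sol
        sol = best


# Toy usage: a "solution" is just a layer count.
result = topology_heuristic(
    1,
    make_private=lambda s: s,
    isolate_slowest_slave=lambda s: s + 1,
    group_masters=lambda s: s + 1,
    power_moves=lambda s: [s - 1, s + 1],
    meets_latency=lambda s: s >= 3,
    power=lambda s: (s - 4) ** 2,
)
print(result)  # 4
```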

Topology Synthesis
• Heuristic was implemented in C and applied to
  ◦ Sobel Transform SoC: 29.6% less power

Topology Synthesis
• Murali et al. [DATE '05] proposed a methodology for STBus crossbar (matrix) synthesis
• Compared to a full crossbar, a partial crossbar has
  ◦ fewer communication components (buses, arbiters, decoders, etc.), lower area, and reduced power consumption
• Goal:
  ◦ design a minimal cost partial crossbar bus architecture for a given MPSoC application
  ◦ average and maximum packet latencies must lie within acceptable bounds of the latencies obtained for a full crossbar

Topology Synthesis
• Phase 1: SystemC simulation
  ◦ window-based traffic analysis (window size is parameterizable)
• Phase 2: Preprocessing to identify
  ◦ overlapping critical traffic streams to be mapped to separate buses
  ◦ targets with large traffic overlap in a window, to be mapped to separate buses
  ◦ the maximum number of targets to be connected to a bus (to bound maximum latency)
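
Here is one way to picture the Phase 2 preprocessing. The sketch assumes per-window traffic volumes for each target are available from the Phase 1 simulation. It measures overlap as the per-window minimum, a metric chosen here for illustration and not necessarily the one Murali et al. use.

```python
from itertools import combinations


def window_overlap(a, b):
    """Overlap of two per-window traffic profiles: sum of the per-window minimum.
    This metric is an assumption made for illustration."""
    return sum(min(x, y) for x, y in zip(a, b))


def pairs_to_separate(traffic, threshold):
    """traffic: dict target -> list of traffic volumes per analysis window.
    Returns target pairs whose overlap exceeds threshold, i.e. candidates
    for mapping to separate buses."""
    return [(t1, t2) for t1, t2 in combinations(sorted(traffic), 2)
            if window_overlap(traffic[t1], traffic[t2]) > threshold]


traffic = {"MEM1": [10, 0, 5], "MEM2": [8, 0, 0], "MEM3": [0, 7, 5]}
print(pairs_to_separate(traffic, threshold=6))  # [('MEM1', 'MEM2')]
```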

Topology Synthesis
• Applied methodology to synthetic MPSoC applications

Topology Synthesis
• Thepayasuwan et al. [DATE '04] proposed a simulated annealing (SA)-based approach to synthesize a hierarchical shared bus architecture topology
  ◦ the cost function accounts for criteria such as number of buses, communication conflict, and bus utilization
  ◦ SA-based optimization depends on the weights in the cost function
• Yoo et al. [ASPDAC '07] presented an SA-based approach for synthesizing a cascaded crossbar
• Topology synthesis for a segmented bus was presented by Guo et al. [ASPDAC '06] to
  ◦ obtain a solution with minimum wire energy
  ◦ generate a set of solutions to trade off chip area, energy, …
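
The sketch below shows the kind of weighted cost and acceptance rule an SA-based topology search uses. The specific terms and default weights are assumptions. The point is that changing the weights changes which topology SA converges to.

```python
import math
import random


def topology_cost(num_buses, conflict, utilization, weights=(1.0, 1.0, 1.0)):
    """Weighted cost; the (1 - utilization) term penalizes poorly used buses.
    The terms and weights are assumptions, not the published cost function."""
    w_bus, w_conflict, w_util = weights
    return w_bus * num_buses + w_conflict * conflict + w_util * (1.0 - utilization)


def accept_move(delta_cost, temperature, rng=random):
    """Metropolis acceptance rule used in simulated annealing."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


print(topology_cost(3, 2.0, 0.75))             # 5.25
print(topology_cost(3, 2.0, 0.75, (2, 1, 1)))  # 8.25
```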

Protocol Parameter Synthesis
• Bus-based communication architectures are characterized by several protocol parameters
  ◦ bus widths, bus clock frequencies, transaction burst sizes, arbitration schemes, buffer sizes
• Protocol parameter synthesis determines values for one or more parameters for a fixed topology
  ◦ while satisfying the constraints of the application
• Early work in protocol parameter synthesis focused on determining bus width
  ◦ Narayan et al. [DATE '94]: for a simple shared bus architecture, trade off bus width against system performance; no arbitration assumed; traffic conflict on the shared bus ignored

Protocol Parameter Synthesis
• Lahiri et al. [ICCAD '00] proposed an approach to determine bus protocol parameters as well as component mapping on buses to improve performance

Protocol Parameter Synthesis
• Step 1: Co-simulate the entire system
  ◦ assuming completely parallel (conflict-free) communication between cores
  ◦ generate execution traces
• Step 2: Save traces as a communication analysis graph (CAG)
• Step 3: Performance analysis to generate a communication graph (CG)
  ◦ represents statistics gathered by performance analysis
  ◦ a single weight is derived for each edge

Protocol Parameter Synthesis
• Step 4: Generate initial component mapping to buses
  ◦ analyze the CG
  ◦ calculate the demand each component places on the communication architecture: demand of a component = sum of the weights of its outgoing edges
  ◦ arrange components in descending order of demand
  ◦ rank buses in the communication architecture by analyzing the topology template: a higher rank is given to buses that have higher performance and are well connected to the rest of the buses
  ◦ select the highest ranked component and map it to the bus with the maximum interaction level; repeat until no components are left
• Step 5: Generate initial protocol parameters
  ◦ high arbitration priority for higher ranked components
  ◦ maximum block transfer size calculated as the weighted average of the size of transactions between components on the bus
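
Steps 4 and 5 can be sketched as follows. The sketch assumes the CG is a dict of weighted directed edges. It also assumes that "interaction level" means the total edge weight to components already mapped to a bus, which is an interpretation rather than a definition from the slides.

```python
from collections import defaultdict


def component_demand(cg):
    """cg: dict (src, dst) -> edge weight from the communication graph.
    Demand of a component = sum of the weights of its outgoing edges."""
    demand = defaultdict(float)
    for (src, dst), weight in cg.items():
        demand[src] += weight
        demand.setdefault(dst, 0.0)  # pure receivers still need a bus
    return dict(demand)


def initial_mapping(cg, ranked_buses):
    """Map components in descending demand order to the bus with the largest
    interaction level (assumed: total edge weight to components already on
    that bus). Ties go to the higher-ranked bus (earlier in ranked_buses)."""
    demand = component_demand(cg)
    mapping = {}
    for comp in sorted(demand, key=lambda c: (-demand[c], c)):
        def interaction(bus):
            return sum(w for (a, b), w in cg.items()
                       if (a == comp and mapping.get(b) == bus)
                       or (b == comp and mapping.get(a) == bus))
        mapping[comp] = max(ranked_buses, key=interaction)  # first max wins
    return mapping


def max_block_transfer_size(transactions):
    """transactions: list of (size_in_words, count); weighted average size."""
    total = sum(count for _, count in transactions)
    return sum(size * count for size, count in transactions) / total


cg = {("CPU", "MEM"): 10.0, ("DMA", "MEM"): 4.0, ("CPU", "UART"): 1.0}
print(initial_mapping(cg, ["AHB", "APB"]))
print(max_block_transfer_size([(4, 3), (16, 1)]))  # 7.0
```

Without any capacity or conflict term, this greedy rule tends to place everything on the top-ranked bus. It only produces a starting point, which the later move-based step refines.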

Protocol Parameter Synthesis
• Step 7: Generate transformations/moves to improve performance
  ◦ create a communication conflict graph (CCG), where edges between components represent communication overlap
  ◦ changed congestion levels are used to recalculate the time taken for transactions
  ◦ the move with the maximum time reduction (potential gain) is selected
  ◦ repeat until no more improvement is possible
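
Here is a minimal sketch of the Step 7 loop, with one simplification. Instead of recomputing transaction times from congestion, the cost is the total CCG overlap between components that share a bus. Each move relocates a single component.

```python
def conflict_cost(mapping, ccg):
    """ccg: dict frozenset({a, b}) -> communication overlap from the CCG.
    Cost = total overlap of component pairs sharing a bus (a stand-in for
    the congestion-driven transaction time described on the slide)."""
    return sum(w for pair, w in ccg.items()
               if len({mapping[c] for c in pair}) == 1)


def greedy_moves(mapping, buses, ccg):
    """Repeatedly apply the single-component move with the largest gain."""
    mapping = dict(mapping)
    while True:
        current = conflict_cost(mapping, ccg)
        best_gain, best_move = 0, None
        for comp, home in mapping.items():
            for bus in buses:
                if bus == home:
                    continue
                trial = {**mapping, comp: bus}
                gain = current - conflict_cost(trial, ccg)
                if gain > best_gain:
                    best_gain, best_move = gain, (comp, bus)
        if best_move is None:  # no move improves the cost
            return mapping
        comp, bus = best_move
        mapping[comp] = bus


start = {"A": "b0", "B": "b0", "C": "b0"}
ccg = {frozenset({"A", "B"}): 5, frozenset({"B", "C"}): 1}
print(greedy_moves(start, ["b0", "b1"], ccg))  # {'A': 'b0', 'B': 'b1', 'C': 'b0'}
```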

Protocol Parameter Synthesis
• Experimental results
  ◦ ATM: cell forwarding unit of an output-queued ATM switch, with a fixed topology having three buses connected by two bridges
  ◦ SYS: simple communication system with two buses connected by a single bridge

Protocol Parameter Synthesis
• Shin et al. [DATE '04] proposed a methodology to automatically determine the slot schedule for a time division multiple access (TDMA)-based arbitration scheme

Protocol Parameter Synthesis
• Objective function
  ◦ to meet throughput requirements for masters

Protocol Parameter Synthesis
• Objective function
  ◦ to meet throughput and latency requirements for masters
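
The following sketch shows a possible fitness function a GA could use to score a candidate TDMA slot table. Throughput is a master's fraction of slots times the bus bandwidth. The largest gap between a master's slots stands in for latency. Summing both penalties into one number is a deliberate simplification.

```python
import math


def tdma_metrics(slot_table, bus_bandwidth):
    """slot_table: one master id per slot of a repeating TDMA wheel.
    Returns {master: (throughput, max_gap)}: the master's share of
    bus_bandwidth and the largest distance, in slots, between two of its
    consecutive slots (a proxy for worst-case latency)."""
    n = len(slot_table)
    metrics = {}
    for master in set(slot_table):
        owned = [i for i, s in enumerate(slot_table) if s == master]
        gaps = [((owned[(k + 1) % len(owned)] - owned[k]) % n) or n
                for k in range(len(owned))]
        metrics[master] = (bus_bandwidth * len(owned) / n, max(gaps))
    return metrics


def schedule_fitness(slot_table, bus_bandwidth, min_throughput, max_gap):
    """GA fitness (higher is better; 0 means all listed constraints are met).
    Throughput shortfalls and excess gaps are summed as penalties."""
    metrics = tdma_metrics(slot_table, bus_bandwidth)
    penalty = 0.0
    for master, required in min_throughput.items():
        throughput, gap = metrics.get(master, (0.0, math.inf))
        penalty += max(0.0, required - throughput)
        if master in max_gap:
            penalty += max(0, gap - max_gap[master])
    return -penalty


table = ["A", "B", "A", "C"]
print(tdma_metrics(table, 400.0)["A"])  # (200.0, 2)
print(schedule_fitness(table, 400.0, {"A": 300.0}, {}))  # -100.0
```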

Protocol Parameter Synthesis
• Experimental results
  ◦ best results were obtained with the following GA parameters: crossover rate of 70%, mutation rate of 25%, population size of 80

Topology and Protocol Parameter Synthesis
• Unlike the previous approaches, a few approaches consider topology and protocol parameter synthesis simultaneously
  ◦ more comprehensive synthesis
• Pandey et al. [FPLA '05] proposed a technique to simultaneously synthesize a hierarchical shared bus topology and the width of data buses
  ◦ while satisfying the performance constraints
  ◦ using an integer linear programming (ILP) formulation
• Pasricha et al. [ASPDAC '05] proposed a technique to automate synthesis of a hierarchical bus topology and multiple protocol parameters
  ◦ data bus widths, bus clock speeds, OO buffer sizes, DMA burst sizes

Topology and Protocol Parameter Synthesis
• Pasricha et al. [ASPDAC '06] proposed an automated topology and parameter synthesis methodology for bus matrix architectures
• Goal: a minimal cost partial bus matrix tailored to the application
  ◦ has fewer buses (and consequently fewer arbiters, decoders, buffers)
  ◦ maximizes bus utilization
  ◦ reduces implementation cost, area and power dissipation

Topology and Protocol Parameter Synthesis
• MPSoC designs have performance constraints that can be represented in terms of data throughput constraints
• Communication Throughput Graph, CTG = G(V, A)
  ◦ incorporates SoC components and throughput constraints
• A Throughput Constraint Path (TCP) is a CTG subgraph
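
Here is a sketch of a CTG-level sanity check. It assumes each TCP lists its components and each bus is described by (width in bits, clock in MHz). It ignores contention between TCPs, so passing the check is necessary but not sufficient. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TCP:
    """Throughput Constraint Path: a CTG subgraph with a required throughput."""
    name: str
    components: list = field(default_factory=list)
    min_mb_per_s: float = 0.0


def bus_bandwidth_mb_s(width_bits, freq_mhz):
    """Peak bandwidth in MB/s, assuming one transfer per bus cycle."""
    return width_bits / 8 * freq_mhz


def tcp_optimistic_check(tcp, bus_of, bus_params):
    """Necessary (not sufficient) check: the slowest bus the TCP touches must
    offer at least the required throughput. Contention is ignored."""
    buses = {bus_of[c] for c in tcp.components}
    bottleneck = min(bus_bandwidth_mb_s(*bus_params[b]) for b in buses)
    return bottleneck >= tcp.min_mb_per_s


bus_params = {"main1": (32, 100), "periph": (16, 50)}
bus_of = {"CPU": "main1", "MEM": "main1", "UART": "periph"}
print(tcp_optimistic_check(TCP("t1", ["CPU", "MEM"], 300.0), bus_of, bus_params))   # True
print(tcp_optimistic_check(TCP("t2", ["CPU", "UART"], 300.0), bus_of, bus_params))  # False
```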

Topology and Protocol Parameter Synthesis
• Communication Parameter Constraint Set (Ψ)
  ◦ used to ensure that the approach generates a realistic communication architecture
  ◦ constraints are in the form of a discrete set of valid values for the protocol parameters to be synthesized
  ◦ e.g., specifying that the bus clock frequency for a bus can only be a multiple of 33 MHz, up to a maximum of 330 MHz
• Allows the designer to bias the synthesis process based on knowledge of the design and the technology being targeted
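
Ψ can be represented as discrete value sets, with a required value snapped up to the nearest allowed one, as sketched below. The width set is a hypothetical example; the 33 MHz step up to 330 MHz comes from the slide.

```python
def multiples(step, maximum):
    """Discrete value set such as 'multiples of 33 MHz up to 330 MHz'."""
    return list(range(step, maximum + 1, step))


def smallest_valid_at_least(required, valid_values):
    """Snap a required parameter value up to the nearest value allowed by Ψ;
    returns None when no allowed value is large enough."""
    return min((v for v in valid_values if v >= required), default=None)


psi = {"bus_clock_mhz": multiples(33, 330), "data_width_bits": [8, 16, 32, 64]}
print(psi["bus_clock_mhz"])   # [33, 66, 99, 132, 165, 198, 231, 264, 297, 330]
print(smallest_valid_at_least(100, psi["bus_clock_mhz"]))  # 132
```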

Topology and Protocol Parameter Synthesis
• B&B goal: cluster slave modules to minimize matrix cost
• Start by clustering two slave clusters at a time
  ◦ initially, each slave cluster has only one slave
• However, the total number of clustering configurations possible for n slaves is
  C(n,2) + C(n,2)·C(n−1,2) + C(n,2)·C(n−1,2)·C(n−2,2) + … + n!·(n−1)!/2^(n−1)
  ◦ an extremely large number for even medium-sized SoCs!
• To quickly prune out invalid clustering configurations and converge on an optimal solution, a powerful bounding function is used
• Bounding function
  ◦ called after every clustering operation
  ◦ uses a lookup table to discard duplicate clustering operations
  ◦ discards all non-beneficial clustering operations (i.e., no savings in the number of buses)
  ◦ discards incompatible clustering operations
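
The expression above can be evaluated directly to see how fast the search space grows. This sketch only sums the products of pairwise-merge choices; it does not implement the bounding function.

```python
from math import comb, factorial


def clustering_configurations(n):
    """Sum over merge depths of prod_{i<k} C(n - i, 2): each step picks
    2 of the clusters still remaining, as in the slide's expression."""
    total, term = 0, 1
    for i in range(n - 1):
        term *= comb(n - i, 2)
        total += term
    return total


def last_term(n):
    """Closed form of the final product: n! * (n-1)! / 2^(n-1)."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


for n in (4, 8, 12):
    print(n, clustering_configurations(n))
```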

Topology and Protocol Parameter Synthesis
• Experimental results on four MPSoC applications from the networking domain
• Significant matrix component savings
  ◦ 4.6× to 9× when compared with a full bus matrix

Topology and Protocol Parameter Synthesis
• Methodology extended by Pasricha et al. [CODES+ISSS '06] to synthesize bus matrix topology and protocol parameters
  ◦ with the incorporation of energy estimation models for bus wires and bus logic components
• Goal: generate multiple candidate bus matrix solutions on which to perform a power-performance trade-off analysis
• Methodology applied to an MPSoC application

Topology and Protocol Parameter Synthesis
• Results
  ◦ up to 20% trade-off possible in power and 40% in performance
  ◦ up to 8% trade-off possible in runtime and 15% in energy

Topology and Protocol Parameter Synthesis
• Pasricha et al. [VLSID '08] further extended this synthesis methodology by incorporating a PVT (process, voltage, temperature) variation-aware power estimation technique
• Incorporating PVT variation-awareness into the system-level bus matrix synthesis technique resulted in a set of curves for power and energy in the trade-off graph outputs
  ◦ instead of a single curve each for power and energy
• This allowed more accurate power characterization in the face of PVT variations early in the design flow
  ◦ enabling designers to make more informed decisions when selecting a bus matrix configuration

Physically-aware Synthesis
• Most synthesis approaches design the communication architecture without considering physical implementation issues that can influence performance
  ◦ such as the layout of the components on the chip, or the lengths and routing of the bus wires interconnecting components
• Physical-level information can be extremely important to guarantee that the synthesis results are reliable
• However, such physical-level information is typically available much later in the design flow
  ◦ it is challenging to abstract this information up to the early stage of the design flow where communication architecture design takes place
• A few approaches have looked at this problem

Physically-aware Synthesis
• Dick et al. [DATE '99] proposed a physically aware topology synthesis technique to ensure that hard real-time communication deadlines between components were satisfied
  ◦ used a high-level floorplanner to create a block placement and estimate global wiring delays
  ◦ a genetic algorithm (GA) was used to iterate over different bus topology configurations having low-contention task assignments on components
• Drinic et al. [ICCAD '00] and Meguerdichian et al. [DAC '01] used a high-level floorplanner to determine design feasibility during bus topology synthesis
  ◦ compared estimates of wire length with an upper bound on wire length
  ◦ does not account for the varying capacitive loads of components on a bus

Physically-aware Synthesis
• Thepayasuwan et al. [ICCD '03] proposed a topology synthesis framework that used a high-level floorplanner to obtain wire lengths
  ◦ lengths are incorporated into an SA cost function that is used to synthesize the bus topology
  ◦ SA minimizes the cost function and selects a topology solution with low total wire length
• Guo et al. [ASPDAC '06] used a high-level floorplanner during segmented bus topology synthesis
  ◦ the floorplanner aims to reduce the length of critical wires with high switching activity, to reduce wire energy consumption
• Pasricha et al. [CODES+ISSS '06] used a high-level floorplanner to obtain wire lengths for estimating wire energy
  ◦ during bus matrix topology and parameter synthesis

Physically-aware Synthesis
• Pasricha et al. [DAC '05] proposed a physically aware hierarchical bus topology and protocol parameter synthesis technique (FABSYN)
  ◦ detects and eliminates clock cycle timing violations
[Figure: SoC floorplan with ARM, ITCM, DTCM, MEM1–MEM4, IP1, IP2, DMAC, ASIC1, ASIC2]
• To meet performance constraints, the bus clock speed is set to 333 MHz (3 ns cycle time)
• After layout, the signal delay is 3.5 ns, which violates the 3 ns clock timing constraint!
  ◦ adverse effect on cost, complexity, constraint satisfiability
• To eliminate such violations, designers use repeaters and pipeline elements
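
Here is the timing arithmetic on this slide in a few lines. The allowed frequency set reuses a hypothetical Ψ of 33 MHz multiples. Lowering the clock is only one possible remedy for the violation.

```python
def cycle_time_ns(freq_mhz):
    return 1000.0 / freq_mhz


def violates_timing(freq_mhz, signal_delay_ns):
    """True when the post-layout signal delay exceeds one bus clock period."""
    return signal_delay_ns > cycle_time_ns(freq_mhz)


def fastest_safe_frequency(signal_delay_ns, allowed_mhz):
    """Highest allowed clock whose period covers the delay; None if even the
    slowest allowed clock fails."""
    ok = [f for f in allowed_mhz if not violates_timing(f, signal_delay_ns)]
    return max(ok, default=None)


print(violates_timing(333, 3.5))   # True: 3.5 ns > ~3.0 ns period
allowed = range(33, 331, 33)       # hypothetical Ψ: multiples of 33 MHz
print(fastest_safe_frequency(3.5, allowed))  # 264
```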

Physically-aware Synthesis
• Simple bus mapping [figure: bus mapping]

Physically-aware Synthesis
• Mutate topology
  ◦ create a new bus and/or migrate IPs

Physically-aware Synthesis
• If a timing violation is detected
  ◦ TCPs that have components on buses with violations are flagged
  ◦ a feedback loop is used to go back and attempt to eliminate the violations
  ◦ first, from the flagged TCPs, the TCP whose components on the violated bus have the largest load capacitance on their pins is selected, since the cumulative capacitive load of components directly contributes to increasing signal propagation delay
  ◦ the components are iteratively migrated to another existing bus, or to a new bus if migration to existing buses causes TCP constraint violations
  ◦ if there is still a violation, another flagged TCP is selected and its components are migrated away from the violated bus
  ◦ another way to eliminate clock cycle violations is to reduce the bus clock frequency
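
The TCP selection order described above can be sketched as follows. The sketch assumes pin capacitances (in pF) and the current bus assignment are known for every component; the function and argument names are hypothetical.

```python
def order_flagged_tcps(flagged, pin_cap_pf, bus_of, violated_bus):
    """flagged: dict TCP name -> list of its components.
    Returns TCP names with the largest cumulative pin capacitance on
    violated_bus first, i.e. the order in which TCPs would be picked for
    component migration."""
    def load_on_violated_bus(tcp):
        return sum(pin_cap_pf[c] for c in flagged[tcp] if bus_of[c] == violated_bus)
    return sorted(flagged, key=lambda tcp: (-load_on_violated_bus(tcp), tcp))


flagged = {"T1": ["CPU", "MEM1"], "T2": ["DMA", "MEM2"]}
pin_cap_pf = {"CPU": 2.0, "MEM1": 1.0, "DMA": 1.5, "MEM2": 3.0}
bus_of = {"CPU": "main1", "MEM1": "main2", "DMA": "main1", "MEM2": "main1"}
print(order_flagged_tcps(flagged, pin_cap_pf, bus_of, "main1"))  # ['T2', 'T1']
```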

Physically-aware Synthesis
• Synthesized hierarchical bus architecture
• Parameter values (buses: main 1, main 2, main 3, periph)
  ◦ bus width: 32 32
  ◦ bus speed: 133 133 66
  ◦ arbitration priority: CPU 1 > M 3 > M 2 (static)

Physically-aware Synthesis
• Experimental study [figures: constraint set; CTG]

Physically-aware Synthesis
• The quality of the FABSYN synthesis solution was compared with other synthesis approaches
  ◦ Initial: solution with just 2 buses (initial mapping)
  ◦ ABS: synthesis approach without an integrated floorplanner
  ◦ Manual: designer-driven manual synthesis approach with a floorplanner

Co-synthesis with Memory
• Memory can take up a large chunk of on-chip area, as much as 70% in some cases
  ◦ estimates indicate that this will go up to 90% in coming years
• A variety of memory types are available to satisfy storage requirements in MPSoC applications
  ◦ DRAMs, SRAMs, EPROMs, EEPROMs, etc.
• Typically
  ◦ DRAMs: larger memory requirements, slower, cheaper
  ◦ SRAMs: smaller memory requirements, faster, more expensive
  ◦ EPROMs and EEPROMs: read-only data
• Several trade-offs arise during memory architecture synthesis
  ◦ SRAM vs. DRAM
  ◦ cost vs. performance vs. area
  ◦ ports vs. number of memory blocks

Co-synthesis with Memory
• Memory architecture synthesis determines the
  ◦ number, type, and size of the memories in the system
  ◦ mapping of application data to memories
• The memory architecture contributes significantly to data traffic on communication architectures
• Design of the memory architecture has a substantial influence on communication architecture design
• Traditionally, in platform-based design, memory synthesis is performed before communication architecture synthesis
  ◦ this can lead to inferior design decisions

Co-synthesis with Memory
• Motivational study (Pasricha et al. [DATE '06]): MPSoC memory and communication architecture synthesis
  ◦ separate synthesis vs. co-synthesis

Co-synthesis with Memory
• Shalan et al. [SASIMI '03] proposed a tool to automatically generate a full crossbar and a dynamic memory management unit
• Grun et al. [DATE '02] considered the system connectivity topology early in the design flow, in conjunction with memory exploration, for simple processor–memory systems
  ◦ the most active access patterns are extracted from application data structures
  ◦ different memory architecture configurations that can match the needs of the access patterns are obtained, assuming a simple connectivity model
  ◦ next, different communication architectures are considered for these memory architecture configurations, and the most suitable interconnect and memory architecture is selected from a Pareto-optimal curve
• Srinivasan et al. [DATE '05] presented an approach to simultaneously consider bus topology splitting and memory bank partitioning during synthesis
  ◦ with the goal of reducing system energy

Co-synthesis with Memory
• Pasricha et al. [DATE '06] proposed the COSMECA methodology for memory and communication architecture synthesis
  ◦ synthesizes bus matrix topology and protocol parameters
• Goal: obtain a least-cost system with a minimal number of buses, while satisfying performance and memory area constraints
• COSMECA selects memory blocks from a library populated by several types of memories
  ◦ on-chip SRAMs, DRAMs, EPROMs, EEPROMs, …
• Each memory type can have variants in the library, with different
  ◦ capacities, areas, ports, operating frequencies and access times
• Memory synthesis in COSMECA
  ◦ selects appropriate physical memories from the library
  ◦ maps application arrays and scalars to the selected physical memories

Co-synthesis with Memory
• Application memory requirements are initially represented by abstract data blocks (DBs) in a CTG
• DBs are initially grouped together into virtual memories (VMs)

Co-synthesis with Memory
• DBs are merged at this initial step only if they have
  ◦ similar edges (i.e., edges from the same masters), and
  ◦ non-overlapping access
• Subsequently, the enhanced CTG with VMs is used as input to a branch and bound based bus matrix synthesis framework to generate a minimal cost solution
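
Here is a sketch of the initial DB-to-VM grouping rule. It assumes each DB's accesses can be summarized as a single time interval, whereas the real flow works from traces. The function name and input format are hypothetical.

```python
def group_data_blocks(blocks):
    """blocks: dict DB name -> (frozenset of accessing masters, (start, end)).
    Greedily place each DB into the first VM with the same master set whose
    members' access intervals do not overlap its own; otherwise open a new VM.
    A single access interval per DB is a simplifying assumption."""
    vms = []  # each VM: (master_set, [(db_name, interval), ...])
    for name, (masters, (start, end)) in sorted(blocks.items(), key=lambda kv: kv[1][1]):
        for vm_masters, members in vms:
            disjoint = all(end <= s or e <= start for _, (s, e) in members)
            if vm_masters == masters and disjoint:
                members.append((name, (start, end)))
                break
        else:
            vms.append((masters, [(name, (start, end))]))
    return [[db for db, _ in members] for _, members in vms]


blocks = {
    "A": (frozenset({"CPU"}), (0, 10)),
    "B": (frozenset({"CPU"}), (10, 20)),
    "C": (frozenset({"CPU", "DMA"}), (0, 5)),
    "D": (frozenset({"CPU"}), (5, 15)),
}
print(group_data_blocks(blocks))  # [['C'], ['A', 'B'], ['D']]
```

D shares A's master set but overlaps it in time, so it gets its own VM. B starts exactly when A ends and joins A.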

Co-synthesis with Memory
• A heuristic is used to map VMs to physical memories from the library
  ◦ finds N solutions that satisfy the memory area and performance constraints of the design
• After simulating the best solution, memory access traces are generated and used to determine the extent of access overlap of VMs at each slave access point (SAP)
• If the overlap is below a user-defined overlap threshold T, the VMs are merged
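
Here is a sketch of the threshold test. It assumes each VM's accesses at a SAP are recorded as a set of cycle indices. The Jaccard-style overlap measure is an assumption, since the slides do not define the metric.

```python
def access_overlap(trace_a, trace_b):
    """Traces are sets of cycle indices in which each VM is accessed at a SAP.
    Overlap = cycles where both are accessed / cycles where either is accessed
    (a Jaccard-style measure chosen for illustration)."""
    either = trace_a | trace_b
    return len(trace_a & trace_b) / len(either) if either else 0.0


def merge_if_below(trace_a, trace_b, threshold_t):
    """Return the merged trace if overlap < T, else None (keep VMs separate)."""
    if access_overlap(trace_a, trace_b) < threshold_t:
        return trace_a | trace_b
    return None


vm1, vm2 = {1, 2, 3, 4}, {4, 5, 6, 7}
print(access_overlap(vm1, vm2))        # 1 shared cycle out of 7
print(merge_if_below(vm1, vm2, 0.1))   # None
```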

Co-synthesis with Memory
• VMs are then mapped to physical memories from the library
• Initially, the best memory from the library that fits the capacity requirements and has the maximum port bandwidth is selected for a VM
• If performance constraints are not met even for the memory with the best performance, the matrix solution is discarded
  ◦ the next best matrix solution from the set of (ranked) matrix solutions is selected
• If performance constraints and memory area constraints are met, the solution is added to the final solution database
• Next, to lower memory area, VMs at SAPs are …
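
Here is a sketch of the initial memory choice. It assumes a library entry records the variant attributes listed earlier and that port bandwidth is ports × frequency × width; the field names and the formula are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """One library entry; fields mirror the variant attributes on the slides."""
    name: str
    kind: str          # e.g. "SRAM", "DRAM"
    capacity_kb: int
    area_mm2: float
    ports: int
    freq_mhz: int
    width_bits: int

    @property
    def port_bandwidth_mb_s(self) -> float:
        # Assumes one access per port per cycle.
        return self.ports * self.freq_mhz * self.width_bits / 8


def best_memory_for(required_kb, library):
    """Pick the memory that fits the capacity and has max port bandwidth
    (smaller area breaks ties); None if nothing in the library fits."""
    fitting = [m for m in library if m.capacity_kb >= required_kb]
    return max(fitting, key=lambda m: (m.port_bandwidth_mb_s, -m.area_mm2), default=None)


library = [
    Memory("S1", "SRAM", 64, 1.2, 1, 200, 32),
    Memory("S2", "SRAM", 128, 2.5, 2, 200, 32),
    Memory("D1", "DRAM", 1024, 3.0, 1, 100, 64),
]
print(best_memory_for(100, library).name)  # S2
```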

Co-synthesis with Memory
• Experiments with MPSoC applications
  ◦ example: PYTHON application synthesis

Co-synthesis with Memory
• Trade-off curve between number of buses and memory area
• Impact of threshold value

Co-synthesis with Memory
• COSMECA saves 25–40% in the number of buses in the matrix and 17–29% in memory area compared to the traditional approach

Co-synthesis with Memory
• Meyer et al. [CODES+ISSS '07] attempted to extend COSMECA by adding layout-awareness during co-synthesis
  ◦ co-synthesis is performed using an SA-based algorithm
• Results indicate a 20–27% cost reduction for a synthetic DSP software pipeline case study using the approach
  ◦ compared to an approach that separately allocates memory and synthesizes buses
• A few limitations
  ◦ only bus topology synthesis is performed; bus parameter synthesis is neglected
  ◦ memory synthesis does not consider different memory types; only SRAM memories are supported

Summary
• Designers need techniques that can efficiently explore the increasingly intractable communication architecture design space
  ◦ to satisfy and optimize constraints during communication architecture design
• Presented research on techniques for efficient bus-based communication architecture synthesis
  ◦ there is scope to extend synthesis techniques for emerging applications
• Many open problems remain to be solved, especially in low-level physical and circuit-level synthesis approaches (refer to the book chapter for more details)
  ◦ wire metal layer assignment
  ◦ wire sizing optimization
