VCC: Function-Architecture Co-Design: Modelling and Examples, EE 249


VCC: Function-Architecture Co-Design: Modelling and Examples
EE 249: November 7, 2002
Grant Martin, Fellow, Cadence Berkeley Labs
With thanks to Frank Schirrmeister, Jean-Yves Brunel and Paolo Giusto
CADENCE CONFIDENTIAL

Agenda
• System-level SoC Design – The Rise in Abstraction
• The VCC Design Flow as an example of Function-Architecture Co-Design
• Performance Modeling
• Architectural Services
• Co-Design Example: Automotive Distributed SW
• Co-Design Example: Design Space Exploration of Multimedia platform

Embedded System on Chip (SoC) Design
[Figure labels: System Environment (Zone 4: Global, Satellite; Zone 3: Suburban; Zone 2: Urban; Zone 1: In-Building; Pico-Cell, Macro-Cell, Micro-Cell); Requirements Specification (untimed, unclocked, C/C++ level); Implementation (timed, clocked, RTL level); SoC blocks: Memory, Software, Analog, Firmware, CORE, µP/C, Embedded Software; flow steps: Characterization, Testbench, Refinement, Design Export; Embedded Systems Design]

How did we use abstraction in the past? Step 1 – Layout to Transistor
Digital Abstraction, 1970s
• Switching delay of the transistor
• Interconnect delay between transistors
• The design complexity exceeds what designers can comprehend and think through at the layout level
• Transistor-level simulation makes it possible to verify the logic of digital and analog designs based on transistor switching characteristics
[Figure: layout clusters abstracted into a Transistor Model with Capacity Load, 1970s]

How did we use abstraction in the past? Step 2 – Transistors to Gates
Digital Abstraction, 1980s
• Gate delay
• Interconnect delay between gates
• The design complexity exceeds what designers can comprehend and simulate at the transistor level
• Gate-level simulation makes it possible to verify the logic of digital designs based on gate switching characteristics
[Figure: Transistor Model with Capacity Load (1970s) abstracted into a Gate Level Model with Capacity Load (1980s)]

How did we use abstraction in the past? Step 3 – Gates to RTL-HDL
Digital Abstraction, 1990s
• Not really an abstraction of performance (e.g. SDF is only used for gate to layout to gate)
• The design complexity exceeds what designers can comprehend and simulate at the gate level alone
• HDL is first used for fast verification; synthesis allows translation of text into gates
• Synthesis algorithms map text to actual registers and the logic in between, based on characterized gate and wire-load libraries
• Gate and wire-load delays are refined after layout. SDF emerges as the format
• Textual statements result in “many gates” after synthesis
[Figure: Gate Level Model with Capacity Load (1980s) abstracted into RTL clusters (1990s)]

And what is the next step? IP Block Performance
• Modeling of Performance for IP Blocks
• … by attaching performance data to timing-free functional models
[Figure: abstraction ladder from Transistor Model with Capacity Load (1970s) to Gate Level Model with SDF (1980s), RTL (1990s) and RTL Clusters (Year 2000+); example SoC blocks: MPEG Audio Decoder, MPEG Decoder, Graphics Engine, Video I/F, µC, I-Cache, D-Cache, Bus/Cache Control, DRAM Ctrl, On-Chip RAM, Register File, Timers, Ports, DMAC]

And what is the next step? Inter IP Communication Performance
• Modeling of Performance for Communication between IP Blocks
[Figure: abstraction ladder from Transistor Model with Capacity Load (1970s) to Gate Level Model with SDF (1980s), RTL (1990s) and RTL Clusters (Year 2000+)]

And what is the next step? IP Block Performance and Inter IP Communication Performance
• Apply this to Hardware and Software
• Discontinuity: Embedded Software
[Figure: abstraction ladder from Transistor Model (1970s) to Gate Level Model with SDF (1980s), RTL (1990s) and RTL Clusters (Year 2000+), extended with SW Models (RTOS, Driver, Tasks); example SoC blocks: MPEG Audio Decoder, MPEG Decoder, Graphics Engine, Video I/F, µC, I-Cache, D-Cache, Bus/Cache Control, DRAM Ctrl, On-Chip RAM, Register File, Timers, Ports, DMAC]

The Platform-Based Design Concept: Taking Design Block Reuse to the Next Level
• Pre-Qualified/Verified Foundation-IP*
• Foundation Block + Reference Design
• Scaleable bus, test, power, IO, clock, timing architectures
• Processor(s), RTOS(es) and SW architecture
• Programmable
• Methodology / Flows:
  – System-level performance evaluation environment
  – Foundry-Specific Pre-Qualification
  – Rapid Prototype for End-Customer Evaluation
  – SoC Derivative Design Methodologies
  – Foundry Targeting Flow
*IP can be hardware (digital or analogue) or software. IP can be hard, soft or ‘firm’ (HW), source or object (SW)
[Figure labels: Application Space; MEM, CPU, FPGA, Hardware IP, SW IP]

The Platform-Based Design Concept: Platform Type Examples
• “Full Application HW/SW Platform” examples:
  – TI OMAP
  – Philips Nexperia
  – Infineon MGold
• “Processor Centric” examples:
  – ARM Micropack
  – ST100 Platform
  – Improv Jazz
• “Communication Centric” examples:
  – Palmchip
  – Sonics
[Figure: Improv JAZZ Platform; SONICs Architecture with SiliconBackplane™ (patented) connecting DMA, CPU, MEM, DSP, MPEG and I/O blocks]

System House Requirements … exploring and developing on top of SoC Platforms
Platform Based Design Objectives
• Define the application instance to be implemented to satisfy product requirements defined by the consumer
• Specify the system platform together with suppliers accordingly
• Evaluate top down different instances of SOC platforms
[Figure labels: Application Space; Platform Specification; System Platform; Platform Design Space Exploration; Architectural Space]

SOC Provider Requirements … designing SoC Platforms and Sub-systems
Platform Based Design Objectives
• Define the SOC platform instance so that multiple instances of applications can be mapped to the same system platform
• Present this to system customers as an SOC Design-Kit and optimally leverage economy of scale for the SOC platform instance
• Provide bottom up instances of the SOC platform for evaluation without disclosing the details of the IP
[Figure labels: Application Space; Platform Design Space Exploration; System Platform; Platform Specification; Architectural Space]

The VCC Design Flow: An example of Function-Architecture Co-Design

VCC Front End
… at the un-clocked, timing-aware system level
• Enabling communication within the SOC Design Chain
• Design Space Exploration with abstracted Performance Models
• Untimed Functional and Performance Verification
• Integration Platform Design, Optimization and Configuration
[Figure: VCC front-end flow: Embedded System Requirements; Functional IP (C/C++, SDL, SPW, Simulink) feeding Platform Function; Architecture IP (CPU/DSP, RTOS, Bus, Memory, HW, SW) feeding Platform Architecture; System Integration; Performance Analysis and Platform Configuration]

VCC Front End: Functional Integration and Analysis
… at the un-clocked, timing-aware system level
[Figure: VCC front-end flow: Embedded System Requirements; Functional IP (C/C++, SDL, SPW, Simulink) feeding Platform Function; Architecture IP (CPU/DSP, RTOS, Bus, Memory, HW, SW) feeding Platform Architecture; System Integration; Performance Analysis and Platform Configuration]

VCC Front End: Define Architectural Options and Configuration
… at the un-clocked, timing-aware system level
[Figure: VCC front-end flow: Embedded System Requirements; Functional IP (C/C++, SDL, SPW, Simulink) feeding Platform Function; Architecture IP (CPU/DSP, RTOS, Bus, Memory, HW, SW) feeding Platform Architecture; System Integration; Performance Analysis and Platform Configuration]

VCC Front End: Define Function Architecture Mapping
… at the un-clocked, timing-aware system level
[Figure: VCC front-end flow: Embedded System Requirements; Functional IP (C/C++, SDL, SPW, Simulink) feeding Platform Function; Architecture IP (CPU/DSP, RTOS, Bus, Memory, HW, SW) feeding Platform Architecture; System Integration; Performance Analysis and Platform Configuration]

VCC Front End: Run Performance Analysis for Platform Configuration
… at the un-clocked, timing-aware system level
• Analysis views: Cache Results, Processor Load, Process Gantt Chart Analysis
[Figure: VCC front-end flow: Embedded System Requirements; Functional IP (C/C++, SDL, SPW, Simulink) feeding Platform Function; Architecture IP (CPU/DSP, RTOS, Bus, Memory, HW, SW) feeding Platform Architecture; System Integration; Performance Analysis and Platform Configuration]

VCC Backend
… after initial platform configuration through design refinement and communication synthesis
• Linking System Level Design to Implementation
  – Fast track to prototyping
  – Fast track to software development
  – Design consistency through the design flow
[Figure: VCC back-end flow: Communication Refinement, Integration & Synthesis; Software Assembly; Hardware Assembly; Implementation Level Verification; Synthesis / Place & Route etc.; Design Export]

VCC Backend: Communication Refinement and Synthesis
… after initial platform configuration through design refinement and communication synthesis
• Communication Refinement: an Abstract Token between VCC Models is refined into a chain of protocol components
  – VCC Model to RTOS Protocol Component
  – RTOS to CPU Protocol Component
  – CPU to Bus Protocol Component
  – Bus to Bus Slave Component
  – Bus Slave to VCC Model Component
• Communication Synthesis onto RTOS, CPU, Bus Model and Bus Slave
[Figure: VCC back-end flow: Communication Refinement, Integration & Synthesis; Software Assembly; Hardware Assembly; Implementation Level Verification; Synthesis / Place & Route etc.; Design Export]

VCC Backend: Export to Implementation (Design and Test Bench)
… after initial platform configuration through design refinement and communication synthesis
• VCC System Exploration, then Communication Refinement, then Flow To Implementation
• Exported: Hardware Top-level, System Test Bench, Software on RTOS
[Figure: VCC back-end flow: Communication Refinement, Integration & Synthesis; Software Assembly; Hardware Assembly; Implementation Level Verification; Synthesis / Place & Route etc.; Design Export]

VCC Flow Summary
• Front end, at the un-clocked, timing-aware system level: Embedded System Requirements; Functional IP (C/C++, SDL, SPW, Simulink) feeding Platform Function; Architecture IP (CPU/DSP, RTOS, Bus, Memory, HW, SW) feeding Platform Architecture; System Integration; Performance Analysis and Platform Configuration
• Back end, after initial platform configuration through design refinement and communication synthesis: Communication Refinement, Integration & Synthesis; Software Assembly; Hardware Assembly; Implementation Level Verification; Synthesis / Place & Route etc.; Design Export

Performance Modeling … using Abstraction

Functional Simulation: Gate Level Functional Simulation
• Gate switching defines functionality
• Combination of gate functionality defines “functionality” of the design
• Simulation is slow in complex systems because huge numbers of events have to be processed

Functional Simulation: Using VCC at the System Level
• Function of system blocks executed
  – General descriptions: C, C++, State Charts, OMI
  – Application specific: SPW, Telelogic SDL, Matlab Simulink, ETAS Ascet
• Functional execution defined as “fire and return” with an OMI 4.0 compliant discrete event simulation infrastructure
• Simulation is as fast as the abstract, un-timed models simulate
[Figure: Function abstraction built from SPW, StateCharts, SDL, Simulink, C++ and C models]

Performance Simulation: Gate Level
• Functional Simulation: gate switching functionality
• Performance Simulation (SDF and Gate Level Library):
  – functionality annotated with intrinsic gate delay (Dt)
  – interconnect delay modeled from capacity
• Refinement: SDF data is refined after layout is carried out
[Figure labels: Function; Performance; Interconnect Performance; Capacity]

VCC Performance Simulation: System-Level Block Performance Modeling
• Performance Simulation: functionality annotated with intrinsic delay models
• Delay Script and Inline Models, refined after implementation

IP Functional Model (Forward Error Correction):
    FEC() {
      f = x.read();
      // FEC function here
      y.write(r);
    }

Scripted Delay Model (Delay Script), one script per implementation choice:
    // FEC_ip_implem: FEC on CPU
    delay() {
      input(x);
      run();
      delay(200*cps);
      output(y);
    }
    // FEC_ip_implem: FEC in slow HW
    delay() {
      input(x);
      run();
      delay(128*cps);
      output(y);
    }
    // FEC_ip_implem: FEC in fast HW
    delay() {
      input(x);
      run();
      delay(64*cps);
      output(y);
    }

Inline Delay Model (Annotated IP Functional Model):
    FEC() {
      f = x.read();
      // FEC function part A here
      __DelayCycles(60*cps);
      // FEC function part B here
      __DelayCycles(78*cps);
      // FEC function part C here
      __DelayCycles(23*cps);
      y.write(r);
    }
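The difference between the two delay-model styles can be sketched outside VCC. The Python fragment below is only an illustration: the cycle counts are the ones shown on the slide, while `CPS` (taken here to be a clock period at a hypothetical 100 MHz) and all function names are assumptions, not VCC APIs.

```python
# Illustrative sketch only; not VCC code.
CPS = 1.0 / 100e6  # assumed: seconds per cycle at a hypothetical 100 MHz clock

# Scripted delay models: one lump delay (in cycles) per implementation choice.
SCRIPTED_DELAY_CYCLES = {
    "FEC on CPU": 200,
    "FEC in slow HW": 128,
    "FEC in fast HW": 64,
}

# Inline delay model: delays annotated after parts A, B and C of the function.
INLINE_DELAY_CYCLES = [60, 78, 23]


def scripted_delay(implementation):
    """Delay in seconds of one FEC firing under a scripted delay model."""
    return SCRIPTED_DELAY_CYCLES[implementation] * CPS


def inline_delay():
    """Delay in seconds of one FEC firing under the inline annotations."""
    return sum(INLINE_DELAY_CYCLES) * CPS


def fastest_implementation():
    """Name of the scripted implementation with the smallest delay."""
    return min(SCRIPTED_DELAY_CYCLES, key=SCRIPTED_DELAY_CYCLES.get)
```

The functional model is the same in every case; only the attached delay changes, which is what lets one function be evaluated against several implementation choices.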

VCC Performance Simulation: System-Level Block Interconnect Performance Modeling
[Figure: Post() from Behavior 1 and Value()/Enable() from Behavior 2 over a Shared Memory Communication Pattern (Sender, Receiver); Pattern Services; Architecture Services: RTOS, Standard C Library, CPU (Memory Access, CPU Port, Bus Adapter), Bus Arbiter, RAM (Memory, RAM Port, Slave Adapter), ASIC (ASIC Port, Bus Adapter); abstraction of Function, Performance, Interconnect and Capacity]

VCC Performance Simulation: Enabled through Architecture Services in VCC
[Figure: User Visible layer: behavior A calls Post(5), behavior B calls Value(), communication is Semaphore Protected; Pattern Services: SemProt_Send (mutex_lock; memcpy; signal; setEnabled), SemProt_Recv (wait; memcpy; signal); Architecture Services: RTOS SwMutexes, MemoryAccess (write, read), BusMaster (busRequest, busIndication), BusArbiter (arbiterRequest/Release), SlaveAdapter; components: CPU, Mem]

VCC Performance Modeling … the System Level extension of SDF
• Classical Gate Level Technology: Function; Performance (Dt) from SDF and Gate Level Library; Interconnect Performance from Capacity
• VCC System Level Technology: Function in C, C++, SPW, SDL, Simulink, Statecharts; IP Block Performance (Dt) from the Performance System Level Library; IP Block Interconnect Performance

How to get the performance numbers… IP Block Performance Modeling
Top Down Flow
• In a pure top down design flow the performance models are “Design Requirements” for functional models
• They are refined using bottom up techniques in due course throughout the project
Bottom Up Flow
• SOC Provider characterizes its IP portfolio, e.g. of an Integration platform
  – using HDL model simulation
  – using software simulation on ISS
  – using benchmarking on SOC
[Figure: FEC IP Functional Model with Scripted Delay Models (FEC on CPU: delay(200*cps); FEC in slow HW: delay(128*cps); FEC in fast HW: delay(64*cps)) and an Inline Delay Model annotated with __DelayCycles(60*cps), __DelayCycles(78*cps) and __DelayCycles(23*cps)]

How to get the performance numbers… IP Block Interconnect Performance Modeling
Top Down Flow
• Datasheets for architectural IP information are entered in parameters for architectural services
• Can be done fast by System Integrator without SOC Provider
• Refinement with SOC Provider models
Bottom Up Flows
• Architectural IP is profiled using HDL simulation, ISS or silicon and data is entered in VCC architectural services
[Figure: Post() from Behavior 1 and Value()/Enable() from Behavior 2 over a Shared Memory Communication Pattern (Sender, Receiver); Pattern Services; Architecture Services: RTOS, Standard C Library, CPU (Memory Access, CPU Port, Bus Adapter), Bus Arbiter, RAM (Memory, RAM Port, Slave Adapter), ASIC (ASIC Port, Bus Adapter)]

How to get the performance numbers… Software Estimation for ANSI C code (“Whitebox C”)
• Estimation of software performance prior to implementation
• CPU characterized as Virtual Processor Model
  – Using a Virtual Machine Instruction Set
  – Used for dynamic control SW estimation during performance simulation, taking into account bus loading, memory fetching, and register allocation
• Value
  – True co-design: SW estimation using annotation into C code (as opposed to simulation in instruction simulators used in co-verification)
  – Good for early system scheduling, processor load estimation
  – Two orders of magnitude faster than ISS
  – Greater than 80 percent accuracy
  – Enables pre-implementation decisions but is not a verification model

How to get the performance numbers… Virtual Processor Model Characterization Methods
• Data Book Approach
  – CPU data book information to count cycles and estimate VIM
• Calibration Suite using “Best Fit”
  – Run Calibration Suite on VIM and ISS
  – Solve a set of linear equations to minimize the difference
• Application Specific Calibration Suite
  – Uses the “Best Fit” method but with application specific routines for automotive, wireless telecom, multimedia etc.
• Exact Count on ISS
  – Cycle counts exactly derived from ISS run
  – Filter specific commands out (e.g. OP.i etc.)
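The “Best Fit” step can be read as an ordinary least-squares problem: each calibration routine contributes one equation relating its virtual-instruction counts to the cycle count observed on the ISS. The sketch below is one possible reading, not the VCC calibration tool; the function names and data layout are assumptions.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def best_fit_cycles(counts, measured):
    """Least-squares cycle cost per virtual instruction.

    counts[k][j]: how often virtual instruction j executed in routine k
    measured[k]:  cycle count of routine k observed on the ISS
    """
    n = len(counts[0])
    # Normal equations: (A^T A) w = A^T t
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(n)]
           for i in range(n)]
    atb = [sum(row[i] * m for row, m in zip(counts, measured))
           for i in range(n)]
    return solve_linear(ata, atb)
```

With more calibration routines than virtual instructions the system is overdetermined, so the fit minimizes the squared difference rather than matching every routine exactly.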

How to get the performance numbers… Software Estimation for ANSI C code (“Whitebox C”)
Virtual Machine Instruction Set Model (cycles per virtual instruction)
LD      3.0     Load from Data Memory
LI      1.0     Load from Instr. Mem.
ST      3.0     Store to Data Memory
OP.c    3.0     Simple ALU Operation
OP.s    3.0
OP.i    4.0
OP.l    4.0
OP.f    4.0
OP.d    6.0
MUL.c   9.0     Complex ALU Operation
MUL.s   10.0
MUL.i   18.0
MUL.l   22.0
MUL.f   45.0
MUL.d   55.0
DIV.c   19.0
DIV.s   110.0
DIV.i   118.0
DIV.l   122.0
DIV.f   145.0
DIV.d   155.0
IF      5.0     Test and Branch
GOTO    2.0     Unconditional Branch
SUB     19.0    Branch to Subroutine
RET     21.0    Return from Subroutine

How to get the performance numbers… Software Estimation for ANSI C code (“Whitebox C”)
ANSI C Input:
    char *event;
    int proc;
    if (*(event+proc) & 0x1: 0x0). . .
Assembler:
    ld   #event, R1
    ld   #proc, R2
    add  R1, R2, R3
    ld   (R3), R4
    ldi  #0x1, R5
    and  R4, R5, R6
    cmp  R0, R6, R7
    br   LTRUE
    ba   LFALSE
Virtual Processor Model instructions: ld ld op ld li op ts -br
Flow:
• Whitebox C: declare ports
• Analyse basic blocks
• Compute delays (Architecture Characterization, Virtual Processor Model)
• Generate new C with delay counts
• Compile generated C and run natively
• Performance Estimation
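The “compute delays” step can be illustrated by summing virtual-instruction costs over one basic block. In the sketch below the cycle values come from the Virtual Machine Instruction Set Model slide; the mapping of the example block onto those instructions (integer `OP.i` for the ALU operations, one `IF` for the compare-and-branch) and the generated annotation text are assumptions made for illustration.

```python
# Cycle costs taken from the Virtual Machine Instruction Set Model slide.
VIRTUAL_CYCLES = {"LD": 3.0, "LI": 1.0, "ST": 3.0, "OP.i": 4.0,
                  "IF": 5.0, "GOTO": 2.0}

# Assumed virtual-instruction view of the example block
# (ld ld op ld li op, then test-and-branch).
EXAMPLE_BLOCK = ["LD", "LD", "OP.i", "LD", "LI", "OP.i", "IF"]


def block_delay(instructions, cycles=VIRTUAL_CYCLES):
    """Estimated cycle count of one basic block."""
    return sum(cycles[name] for name in instructions)


def annotate(block_name, instructions):
    """Hypothetical delay annotation line for the generated C code."""
    return f"__DelayCycles({block_delay(instructions):g}*cps);  /* {block_name} */"
```

Because the delays are attached per basic block, the generated C can run natively and still accumulate data-dependent timing as control flow selects different blocks.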

Architectural Services Example

Architecture Service
• The service is the element that defines the functionality of an architecture
• A service is coded in C++ and performs a specific role to model architecture, for example:
  – bus arbitration
  – memory access
  – interrupt propagation
  – etc.

Example of Services
[Figure: Behavior calls Post on the Pattern Sender; ASIC block with BusMaster; Bus block with BusArbiter; Mem block with BusSlave and Memory]

Example of Services
• Behavior calls Post, i.e., sends a communication
• Pattern hears Post and directs the ASIC block’s BusMaster to send a communication
• BusMaster asks the Bus block’s BusArbiter for use of the bus
• BusArbiter grants the bus, so the communication can go to the Memory block
• Memory block’s BusSlave receives the communication and forwards it to memory
• Memory stores the communication.
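The chain of calls above can be mimicked with a few small classes. This is a sketch of the control flow only: VCC services are coded in C++, and every class and method name here is hypothetical rather than the VCC service API.

```python
class Memory:
    def __init__(self):
        self.cells = {}

    def store(self, address, data, trace):
        trace.append("Memory.store")
        self.cells[address] = data


class BusSlave:
    def __init__(self, memory):
        self.memory = memory

    def receive(self, address, data, trace):
        trace.append("BusSlave.receive")
        self.memory.store(address, data, trace)


class BusArbiter:
    def __init__(self):
        self.owner = None

    def request(self, master, trace):
        trace.append("BusArbiter.request")
        if self.owner is None:
            self.owner = master
        return self.owner is master

    def release(self, master, trace):
        trace.append("BusArbiter.release")
        if self.owner is master:
            self.owner = None


class BusMaster:
    def __init__(self, arbiter, slave):
        self.arbiter = arbiter
        self.slave = slave

    def send(self, address, data, trace):
        trace.append("BusMaster.send")
        if not self.arbiter.request(self, trace):
            raise RuntimeError("bus busy")  # a timed model would wait here
        self.slave.receive(address, data, trace)
        self.arbiter.release(self, trace)


class PostPattern:
    """Pattern sender: turns a Post into a bus write."""

    def __init__(self, bus_master):
        self.bus_master = bus_master

    def post(self, address, data):
        trace = ["Pattern.post"]
        self.bus_master.send(address, data, trace)
        return trace
```

Calling `post` returns the ordered list of services that handled the communication, which mirrors the bullet sequence on the slide.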

Categories of Services
• Pattern Service – services that coordinate the communication of architecture services
• Architecture Service – services that define the functionality of architecture
• Internal Service – generic, default service used during functional simulation

Pattern Service
• A pattern coordinates architectural services that collectively model a communication path from sender to receiver
• Patterns are composed of a sender service and a receiver service
  – Sender service defines Post
  – Receiver service defines Enabled/Value
• Both the sender and receiver service direct the actions of architecture services to send/receive communication
[Figure: Post enters the Pattern Sender; Enabled/Value are served by the Pattern Receiver]
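A pattern's user-visible interface can be sketched as a sender side that implements Post and a receiver side that implements Enabled and Value. The class below is a minimal stand-in with a single shared slot; the assumption that Value consumes the posted item is ours and is not stated on the slides.

```python
class SharedSlotPattern:
    """Hypothetical pattern: sender defines post, receiver defines enabled/value."""

    def __init__(self):
        self._slot = None
        self._enabled = False

    # Sender service
    def post(self, value):
        self._slot = value
        self._enabled = True

    # Receiver service
    def enabled(self):
        return self._enabled

    def value(self):
        if not self._enabled:
            raise RuntimeError("no communication pending")
        self._enabled = False  # assumed: reading consumes the item
        return self._slot
```

In a performance model, `post` and `value` would additionally drive architecture services (bus, memory, RTOS) so that each call costs simulated time.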

Basic Example
• Let’s assume two behaviors.
• b1 and b2 talk to each other:
  – b1 says Post; b2 says Value
  – and vice versa

Basic Example (cont)
• What does it mean for b1 to talk to b2?
• What does it mean for b1 to say Post?
• What does it mean for b2 to say Value?
• We should consider an architecture to give meaning to b1 and b2.
• We should consider how the behavior blocks map to the architecture.

Basic Example (cont)
• Let’s assume the following architecture:

Basic Example (cont)
• Here we map the behavior to the architecture:

Basic Example (cont)
• What do we see in the mapping diagram?
  – b1 is mapped to software.
  – b2 is mapped to hardware.
  – b1 to b2 communication is set to Shared Memory.
  – b2 to b1 communication is set to Interrupt Register Mapped.
• For simplicity’s sake, we’re focusing on b1-to-b2 communication.
  – b2 to b1 will be ignored for now.
• If b1 talks to b2, how does that look when mapped to an architecture?
  – What happens when b1 says Post?
  – What happens when b2 says Value?
  – Note b1 to b2 is shared memory communication.

Basic Example (cont)
• Using Shared Memory, we have the following sequence of communication:
  1. b1 writes to memory: b1 → RTOS → CPU → Bus → Mem
  2. b2 reads from memory: b2 → ASIC → Bus → Mem
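Once the two paths are known, an end-to-end communication delay can be estimated by summing a delay per hop. The paths below are the ones on the slide; every cycle count is a made-up placeholder, chosen only to show the arithmetic.

```python
WRITE_PATH = ["b1", "RTOS", "CPU", "Bus", "Mem"]
READ_PATH = ["b2", "ASIC", "Bus", "Mem"]

# Hypothetical per-hop delays in cycles; real values would come from
# the parameters of the architecture services.
HOP_DELAY_CYCLES = {
    ("b1", "RTOS"): 20,
    ("RTOS", "CPU"): 5,
    ("CPU", "Bus"): 2,
    ("Bus", "Mem"): 4,
    ("b2", "ASIC"): 1,
    ("ASIC", "Bus"): 2,
}


def path_delay(path):
    """Sum the delay of each consecutive hop along a path."""
    return sum(HOP_DELAY_CYCLES[hop] for hop in zip(path, path[1:]))
```

With these placeholder values the write costs 31 cycles and the read 7, so the software side dominates; changing the mapping changes the paths and therefore the totals.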

Basic Example (cont)
• So b1 talks to b2 through the various architecture components:
  – b1 says Post and that becomes a write to memory.
  – b2 says Value and that becomes a read from memory.
• What is the underlying mechanism that propagates Post/Value through the architecture?
  – It’s something called the “service”.

Commercial Example
ST Microelectronics IP models support codesign efforts
By Benoit Clement, System-Level Design Engineer, and Doha Benjelloun, System-Level Design Engineer
Co-Design Methodology for Systems & Architecture (CMSA), STMicroelectronics, Grenoble, France
http://www.eetimes.com/story/OEG20010913S0069

Example of Co-Design – Distributed Automotive SW

Distributed Automotive Applications over networks – “Software-Software Codesign”
• Electronic Control Units (ECUs)
• Standard buses (TTP, CAN, FlexRay)
• Standard Platforms

Current Design Practices
Development process:
• Requirements
• Analysis: “functional network”
• Specification: “zero time assumption”
• System design: “real world assumption”
• Implementation: “automatic target code gen.”
• Integration & calibration: “step into a real car”
• Production & after sales: “handling at the garage”
Issues:
• Integration is done too late, in the car
• Tools are per-ECU: conservative, costly, no tradeoffs
[Figure: functions f1 to f4 (Engine Control, Gear-Box Control) modelled in ASCET and Matlab; architecture of ECU-1, ECU-2 and ECU-3 on a CAN/TTP bus; generated .c files]

Virtual Integration Platform for Distributed Automotive Applications
• Software Components: C-Code, Matlab, C++, ASCET
• IP’s, Architectural Models: Buses, CPUs, Operating Systems
• Development Process: Analysis, Specification, Implementation, Calibration, After Sales Service
• VCC: System Behavior (f1, f2, f3), System Architecture, Mapping, Performance Simulation, Evaluation of Architectural and Partitioning Alternatives, Refinement

Scenarios for SW-driven co-development
[Figure: scenarios combining known functions (f) with open items (?)]

ASCET-SD imported project in VCC
• Message 1 to Message 4: these are behavioral memories
• ProjectA: this is the ASCET imported project
• TestBench: the test-bench can include Matlab imported models as well as VCC authored models
[Figure labels: TestBench with HW_Intrpt1 and HW_Intrpt2; ProjectA with HWIntrpt1 and HWIntrpt2; Process 8, Process 9, Process 10; SW_Interrupt, HW_Interrupt]

Universal Communications Model of Bus
[Figure: Behavioral Diagram (Module A, Module B, Behavioral Memory 1) above an architecture in which each ECU contains RTOS, PPC, internal bus, Mem and Bus Controller; labels: ECU 1, Architectural Bus Memory, Broadcast Bus, Peak Load]

Example Design Flow (1): Power Window
• Definition of a behavioral diagram: import of functional components (software projects and modules)

Design Flow (2)
• Generation of an ideal communication between the functional components
  – No delay or error handling considered.
  – Functional co-verification

Design Flow (3)
• Creation of an architectural diagram in VCC

Design Flow (4)
• Mapping the software modules onto the ECU
  – Either retaining the original per-ECU mapping from ASCET-SD or creating a new one

Design Flow (5)
• Generation of the CPU scheduling
  – Either manually, or automatically in case the original scheduling is preserved
[Figure labels: Hierarchical Scheduler; Single Task Scheduler; Parent Scheduler]

Design Flow (6)
• Computation Performance Simulation
  – No communication performance estimation
  – Co-verification of Computational Resource ‘fit’
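One simple way to think about computational resource ‘fit’ is processor load: the sum of execution time over period for the tasks mapped to one CPU. The sketch below uses that textbook utilization measure as a stand-in; it is not how VCC computes load, and the task data in the usage note are hypothetical.

```python
def processor_load(tasks):
    """Utilization of one CPU: sum of execution time / period over its tasks."""
    return sum(task["exec_ms"] / task["period_ms"] for task in tasks)


def fits(tasks, limit=1.0):
    """True if the mapped tasks stay within the chosen load limit."""
    return processor_load(tasks) <= limit
```

For example, tasks with (execution, period) of (2, 10), (5, 20) and (10, 100) ms give a load of 0.55. A load below 1.0 is necessary but not sufficient for meeting deadlines, which is why the flow simulates the schedule rather than relying on this figure alone.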

Design Flow (7)
• Design iterations
  – Re-distribution of the functionality and tuning of the scheduling
[Figure: several ECUs, each with RTOS, PPC and TTPController, attached through PPC_TTP to Channel 1 and Channel 2 of a TTP bus]

Design Flow (8)
• Initialization of the UCM performance model.
  – Automated generation of an initial communication matrix that carries the dependency of the functional system mapping.
• Definition of a specific bus protocol implementation
  – UCM parameterization. Definition of the communication cycle layout. Data frame definition.
[Figure label: Bus Type Pattern]
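The dependency of the communication matrix on the mapping can be shown in a few lines: only signals whose sender and receiver sit on different ECUs need a slot on the bus. The data layout and names below are assumptions for illustration, not the UCM format.

```python
def communication_matrix(signals, mapping):
    """Derive bus traffic from a functional mapping.

    signals: list of (sender_module, receiver_module, size_bytes)
    mapping: dict module -> ECU name
    Returns {(source_ecu, destination_ecu): total_bytes} for every
    signal that crosses an ECU boundary.
    """
    matrix = {}
    for sender, receiver, size in signals:
        src, dst = mapping[sender], mapping[receiver]
        if src != dst:  # same-ECU signals never reach the bus
            matrix[(src, dst)] = matrix.get((src, dst), 0) + size
    return matrix
```

Re-running this after moving a module to another ECU yields a different matrix, which is why the initial matrix is regenerated on each design iteration.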

Design Flow (9)
• Performance simulation including the bus latencies
• Full system co-verification: both communications and computation
• Design iterations
[Figure label: Bus Type Pattern]

Example of Co-Design: Design Space Exploration of Multimedia Platform

Multimedia Applications – Design Space Exploration
• vccMap Database: design data exported from VCC diagrams using “VCCAPI”
• vccSim Database: performance data collected by VCC probes under control of system events
• vccDse Database: Design Space Navigator (SQL queries)
• Data Analysis Workbooks for specific roles (Excel+StatBox): Classification, Application Analyst, Process Analyst, Communication Analyst
[Figure: mapping alternatives labelled YSH1, YHS1, YSS1, YHH1, VHH1, YHHP, YHSP]
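A Design Space Navigator query of the kind mentioned above might filter simulated configurations by a constraint and rank the survivors. The sketch uses an in-memory SQLite database; the table `sim_results`, its columns, and the function name are invented for illustration and are not the actual vccSim or vccDse schema.

```python
import sqlite3


def candidate_configs(rows, max_load):
    """Return configurations whose worst-case CPU load is within max_load,
    ordered by mean delay (best first).

    rows: iterable of (config, frame_id, cpu_load, delay_s)
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sim_results "
        "(config TEXT, frame_id INTEGER, cpu_load REAL, delay_s REAL)"
    )
    con.executemany("INSERT INTO sim_results VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT config, AVG(delay_s) AS mean_delay
        FROM sim_results
        GROUP BY config
        HAVING MAX(cpu_load) <= ?
        ORDER BY mean_delay
    """
    result = [row[0] for row in con.execute(query, (max_load,))]
    con.close()
    return result
```

Keeping mapping data and simulation results in a database is what allows such questions to be asked after the runs, without re-simulating.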

Export Mapping Data to DataBase
(Figure: VCC diagram with labels YSH1, YHS1, VHH1, YHH1, YSS1, YHHP, YHSP)

System-Observation-Windows
For, e.g., each MPEG frame, measure…

e.g. process activity (columns: Frame ID, Port Name, Nb Transaction, Nb Item, Actual Delay, Intra Delay)
Frame ID  Port Name              Nb Transaction  Nb Item   Delay
1         Tvld_bits_In           324             20,736
1         Tvld_cmd_In            208
1         Tvld_prop_pic_In       1               1
1         Tvld_prop_slice_In     36              36
1         mb_QFS_Out             1,622,704
1         Thdr_status_Out        208
1         Tisiq_prop_mb_Out      1,622
1         TdecMV_prop_mv_Out     1,622                     0.04
…
2         Tvld_bits_In           525             33,600
2         Tvld_cmd_In            206
2         Tvld_prop_pic_In       1               1
2         Tvld_prop_slice_In     36              36
2         mb_QFS_Out             1,619           621,696
2         Thdr_status_Out        206
2         Tisiq_prop_mb_Out      1,619
2         TdecMV_prop_mv_Out     1,619                     0.04
2         TdecMV_prop_pred_Out   1,619                     0.09
…
Remaining delay values (Actual and Intra columns, row alignment lost):
Frame 1: 0.14, 0.13, 0.00, 6.66, 0.01, 0.03; 0.02, 0.00, 0.65, 0.01, 0.03, 0.04
Frame 2: 0.23, 0.12, 0.00, 6.53, 0.01, 0.03; 0.03, 0.00, 0.65, 0.01, 0.03, 0.04

e.g. MEMORY usage (columns: Frame ID, Requestor, Delay Mean, Delay St. Dev)
Requestor                                            Delay Mean
Behavior/in_es_out_sender                            5.64E-06
Behavior/decode/t_vld_Tvld_bits_In_receiver          2.91E-06
Behavior/decode/t_hdr_Tvld_cmd_Out_sender            1.17E-14
Behavior/decode/t_vld_Tvld_cmd_In_receiver           2.40E-07
Behavior/decode/t_hdr_Tvld_prop_pic_Out_sender       0.00E+00
Behavior/decode/t_vld_Tvld_prop_pic_In_receiver      0.00E+00
Behavior/decode/t_hdr_Tvld_prop_slice_Out_sender     1.22E-14
Behavior/decode/t_vld_Tvld_prop_slice_In_receiver    1.22E-14
…
(The rows repeat with identical values in a second block.)
Delay St. Dev values (row alignment lost): 3.00E-06, 1.20E-06, 2.40E-07, 1.17E-14, 2.16E-06, 7.20E-07, …

“Probe-Synch” & Observer Probes
• Probe-Synch is triggered on conditions in a behavioral block (i.e. MPEG frame decoded)
• Control up to 200 distributed observer probes of different types: i.e. Memory probes, Bus probes, CPU “Delay” probes etc…
• Observer Probes record summary data at the granularity defined by the peeker
(Figure labels: MEMORY OBSERVER PROBE, CPU OBSERVER PROBE, BUS OBSERVER PROBE)
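The relationship between a synchronising trigger and the probes it controls can be pictured with a small sketch. This is only an analogy in plain Python, not the VCC probe API: the class names `ObserverProbe` and `ProbeSynch`, their methods, and the sample values are all hypothetical.

```python
class ObserverProbe:
    """Accumulates raw samples and keeps one summary per observation window."""

    def __init__(self, name):
        self.name = name
        self._samples = []
        self.summaries = []  # list of (window_id, sample_count, mean_value)

    def record(self, value):
        self._samples.append(value)

    def close_window(self, window_id):
        count = len(self._samples)
        mean = sum(self._samples) / count if count else 0.0
        self.summaries.append((window_id, count, mean))
        self._samples = []  # start the next window empty


class ProbeSynch:
    """Closes the current window on every controlled probe when triggered."""

    def __init__(self, probes):
        self.probes = list(probes)
        self.window_id = 0

    def trigger(self):
        self.window_id += 1
        for probe in self.probes:
            probe.close_window(self.window_id)


bus = ObserverProbe("bus")
cpu = ObserverProbe("cpu")
synch = ProbeSynch([bus, cpu])

# Window 1: two bus delays and one CPU delay (made-up values).
bus.record(2.0)
bus.record(4.0)
cpu.record(10.0)
synch.trigger()  # stands in for the "MPEG frame decoded" condition

# Window 2: one bus delay, no CPU activity.
bus.record(6.0)
synch.trigger()

print(bus.summaries)
print(cpu.summaries)
```

Because every probe is closed by the same trigger, the summaries from different probes share window identifiers and can be compared frame by frame.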

Queries 1: link Map & Simulation data
Calculate Average Actual & Intrinsic Communication Rates of all “YAPI” application level Channels
(Figure labels: Simulation “Frame” Context; Simulation results; Key to retrieve Design “Mapping” Decision)

Queries 2: calculate basic Statistics Calculate Average Actual & Intrinsic Communication Rates of all “YAPI” application level Channels
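A query of this kind might look like the following sketch, which joins mapping data to per-frame simulation results and averages two rates per channel. The schema is invented for illustration: the table names `channel_map` and `sim_results`, their columns, and the definition of a rate as items divided by delay are assumptions, not the actual vccMap/vccSim layout. The mapping codes are reused from the slides purely as labels, and the numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE channel_map (channel_id INTEGER PRIMARY KEY, mapping_code TEXT);
CREATE TABLE sim_results (
    frame_id INTEGER,
    channel_id INTEGER,
    nb_item INTEGER,
    actual_delay REAL,
    intrinsic_delay REAL
);
""")
conn.executemany(
    "INSERT INTO channel_map VALUES (?, ?)",
    [(1, "YSH1"), (2, "VHH1")],
)
conn.executemany(
    "INSERT INTO sim_results VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 100, 0.5, 0.25),  # frame 1, channel 1
        (2, 1, 300, 1.0, 0.5),   # frame 2, channel 1
        (1, 2, 64, 0.5, 0.5),    # frame 1, channel 2
    ],
)

# channel_id is the key linking each simulation row to its mapping decision.
rows = conn.execute("""
    SELECT m.channel_id,
           m.mapping_code,
           AVG(s.nb_item / s.actual_delay)    AS avg_actual_rate,
           AVG(s.nb_item / s.intrinsic_delay) AS avg_intrinsic_rate
    FROM sim_results AS s
    JOIN channel_map AS m ON m.channel_id = s.channel_id
    GROUP BY m.channel_id, m.mapping_code
    ORDER BY m.channel_id
""").fetchall()

for row in rows:
    print(row)
```

A channel whose average actual rate falls well below its intrinsic rate is the kind of case the Communication Analyst view flags for closer inspection.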

Application Analyst
(Figure labels: 15: t_hdr; 14: t_memMan (Probe-Synch); 8: t_output; 28 Frames in … 2 sec)

Process Analyst
Process “t-predict”
• IO (Read Trans.) Exec Delay on CPU “2”
• Body-Function Execution Delay on CPU “2”
• Body-Function Exec Delay on Memory “32”
• IO (Write Trans.) Execution Delay on CPU “2”

Communication Analyst

Communication Analyst Very bad “performance” compared to “intrinsic rate”; Arbitration issue or… always busy waiting for input?

Principal Component Analysis Characteristics Linked characteristics Opposed characteristics

Principal Component Analysis Results on 108 Communication Channels
map_FAKIR_Diagrams.MPEG_VIPER_SH2 - Principal Component Analysis - YAPI Application Level Communication
(Figure: scatter plot on axes F1 (34 %) and F2 (23 %). Characteristics plotted: AvgOfnbItem, fifoDepth, AvgOfactualDelay, AvgOfnbTransaction, itemSize, AvgOfPerf. Points are channel read/write ports labelled with channel number and mapping code, e.g. 11 Rd VHH1, 42 Wr YHHP, 54 Rd YHH1)
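For readers unfamiliar with the technique, the sketch below computes the first principal axis of a small table of channel characteristics using only the standard library. It is not the Excel + StatBox workflow used for the slides. The data are made up, the column names are hypothetical, and power iteration is used here only because it is short. Loadings with the same sign correspond to linked characteristics, and loadings with opposite signs to opposed characteristics.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit (population) variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)]
            for i in range(d)]


def first_component(corr, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of corr.
    Assumes the start vector is not orthogonal to that eigenvector."""
    d = len(corr)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


# Hypothetical columns: avg_nb_item, avg_actual_delay, avg_perf (one row per channel).
channels = [
    [10.0, 1.0, 9.0],
    [20.0, 2.0, 8.0],
    [30.0, 3.0, 7.0],
    [40.0, 4.0, 6.0],
]

corr = correlation_matrix(standardize(channels))
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)  # share of total variance on this axis

print("loadings:", [round(x, 3) for x in loadings])
print("explained variance share:", round(explained, 3))
```

In this toy data the first two columns rise together and the third falls as they rise, so a single axis carries all the variance. On the slide, the percentages on axes F1 and F2 presumably play the role of `explained` for the first two axes.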

Clustering into Communication Port Classes
map_FAKIR_Diagrams.MPEG_VIPER_SH2 - Principal Component Analysis - YAPI Application Level Communication
(Figure: the same channel read/write ports plotted on axes F1 (34 %) and F2 (23 %), grouped into Class 1, Class 2, Class 3, Class 4 and Class 5)

Summary
• Talked about System-level SoC Design and changes in abstractions
• Discussed the VCC Design Flow as an example of Function-Architecture Co-Design, including the key concepts of:
– Performance Modeling
– Architectural Services
• Described two usage examples of function-architecture co-design, illustrating the pragmatic use of these concepts by real design teams:
– Automotive Distributed SW
– Design Space Exploration of Multimedia platform
• As a result, I hope you are convinced of both the need for system level design for SoC, and the real possibility of creating practical tools to support it
• Next important step for such tools: a common standardised model integration infrastructure based on SystemC