Distributed Processors Allow Revolutionary Hardware & Software Partitioning
8th Workshop on Electronics for LHC Experiments, 9–13 September 2002, Colmar (France)
Authors: Jean-Reynald Mace & Jean-Louis Brelet / Xilinx
Version 1.1, March 2002, APD / J-L Brelet & P Hardy. All rights reserved. © XILINX 2002
Agenda
• System Partitioning
  – Traditional techniques
  – Innovative approaches
• Example 1: DES Encryption Algorithm
  – HW solution compared to SW solution
• Example 2: Wireless LAN
  – HW / SW trade-off
• Enabling Technology: Virtex-II Pro
System Partitioning
• Definition:
  – "The mapping of a system level architecture into specific HW and SW components based upon application requirements"
• Today, implementation in:
  – Fixed HW components:
    • FPGA, ASIC, ASSP, …
  – SW components:
    • Code running on CPU, DSP processors, microcontrollers, …
[Figure: Application, Management, and Control mapped onto Embedded Software and Hardware Components]
Example System Functions
• Hardware:
  – Physical Layer
  – Memory Interfaces
  – Protocol Bridges
  – Finite State Machine
  – Signal Processing
• Software:
  – Protocol Stack
  – User Interface
  – Diagnostics
  – Control
  – Encryption
Optimal Solutions Enabled by On-Demand Architectural Synthesis
• Hardware:
  – Physical Layer
  – Memory Interfaces
  – Protocol Bridges
  – FSM
  – Signal Processing
• Software:
  – Protocol Stack
  – User Interface
  – Diagnostics
  – Control
  – Encryption
[Figure: Flexible Mapping between the Hardware and Software columns]
Traditional System Design
• Fixed HW / SW partitioning
• Early and final architecture mapping
• Critical commitment made at concept level
[Figure: SW manager and SW developers (Embedded Software) separated by a Fixed Interface from the HW manager, HW engineers, and PCB engineer (Hardware Components)]
New System Partitioning
• Flexible HW / SW partitioning
  – Enables tradeoffs throughout the process
• Architecture redefinition possible
  – Tune for optimal performance and cost
[Figure: SW Team (Embedded Software) and HW Team (Hardware Components) joined by a Flexible Interface]
Innovative Partitioning
• New System Approach:
  – Enables non-traditional system architecture
    • SW modules can be implemented in HW
    • HW modules can be moved to SW
  – Requires a scalable and flexible platform that enables optimal HW / SW integration
• Co-Design Methodology
  – Design attributes optimized during development (performance, resource usage, …)
  – SW developers and HW engineers create solutions at module level for optimal systems
DES Overview
• DES Algorithm:
  – Message is split into fixed-length blocks
  – Each block is encoded with a fixed "key"
  – Block length = 64 bits (advanced 128-b), key length = 56 bits
• 3DES Is An Enhanced Version of DES
  – If Key 1 = Key 2 = Key 3, then 3DES is fully compatible with DES
[Figure: 3DES encryption / decryption data path: Data passes through Encrypt, Decrypt, and Encrypt stages using Key 1, Key 2, and Key 3]
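The compatibility property on this slide follows from the Encrypt, Decrypt, Encrypt chaining: with three equal keys, the first two stages cancel out and only one encryption remains. The sketch below illustrates the chaining only. The block transform `toy_encrypt` / `toy_decrypt` is a hypothetical stand-in chosen to be invertible and short; it is not DES and offers no security.

```python
MASK64 = (1 << 64) - 1  # DES works on 64-bit blocks


def rotl64(x, r):
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotr64(x, r):
    """Rotate a 64-bit value right by r bits."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def toy_encrypt(block, key):
    """Hypothetical invertible block transform. NOT DES, stand-in only."""
    return rotl64((block ^ key) & MASK64, 13)


def toy_decrypt(block, key):
    """Exact inverse of toy_encrypt for the same key."""
    return rotr64(block, 13) ^ (key & MASK64)


def ede_encrypt(block, k1, k2, k3):
    """Triple encryption: Encrypt(k1), then Decrypt(k2), then Encrypt(k3)."""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)


def ede_decrypt(block, k1, k2, k3):
    """Inverse chain: Decrypt(k3), then Encrypt(k2), then Decrypt(k1)."""
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)


block = 0x0123456789ABCDEF
key = 0x00FEDCBA98765432
# With Key 1 = Key 2 = Key 3 the chain collapses to a single encryption.
same_as_single = ede_encrypt(block, key, key, key) == toy_encrypt(block, key)
```

Replacing the stand-in with a real DES block function would keep `ede_encrypt` and `ede_decrypt` unchanged, which is why a single DES core can be reused for all three stages.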
System Integrator's Dilemma
• DES Is A Simple Algorithm
• System Engineer Has To Evaluate:
  – SW coding compared to HW implementation
  – Need for a specific processor and its performance
  – Need for a dedicated solution
  – Cost-effectiveness of an ASSP solution
  – Level of customization required
  – Fixed or flexible implementation
Architectural Options
• Popular DES Algorithm Is Available As SW Code:
  – Public domain C or C++ code
  – Example encryption data rates for 128-b DES:
    • TMS320C62xx at 200 MHz delivers ~100 Mbps (*)
    • MIPS 64-b RISC at 250 MHz delivers ~400 Mbps (*)
    • Pentium III at 1 GHz delivers ~460 Mbps (*)
• HW Implementation Available At:
  – www.opencores.org
  – Over 1.5 Gbps data rate in Virtex-II at 130 MHz (*)
• 3DES 56-b Algorithm Achieves 10.7 Gbps Throughput
  – Xilinx record-breaking announcement in April 2002
(*) Source: Helion Technology Limited, Xilinx Design Consultant (Xilinx Xcell Journal, Issue 43, Summer 2002)
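One way to read the figures on this slide is to normalize each data rate by its clock frequency, which gives bits processed per clock cycle. The sketch below does only that arithmetic on the numbers quoted above. The slide rates are approximate ("~" and "over"), so the results are rough indications, not benchmarks.

```python
# (platform, clock in MHz, throughput in Mbps), as quoted on the slide
RATES = [
    ("TMS320C62xx", 200, 100),
    ("MIPS 64-b RISC", 250, 400),
    ("Pentium III", 1000, 460),
    ("Virtex-II HW core", 130, 1500),  # slide says "over 1.5 Gbps"
]


def bits_per_cycle(clock_mhz, throughput_mbps):
    """Throughput normalized by clock: Mbps / MHz = bits per clock cycle."""
    return throughput_mbps / clock_mhz


for name, mhz, mbps in RATES:
    print(f"{name:18s} {bits_per_cycle(mhz, mbps):6.2f} bits/cycle")
```

On these numbers the hardware core moves roughly an order of magnitude more bits per clock than any of the processors, even though it runs at the lowest clock frequency of the four.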
Mixed HW / SW Solution
• Encryption / Decryption Data Path:
  – DES encryption module is called twice
  – Decryption requires more compute power
[Figure: two data flows, each combining a processor running a DES encryption or decryption algorithm with HW Encrypt and Decrypt blocks]
Full HW Implementation
• Full HW Processor Implementation:
  – Shared Encryptor
• Full HW Pipelined Solution
  – Easy to add parallelism
  – Easy to couple to distributed processors
[Figure: two data flows through HW Encrypt and Decrypt blocks, one with a processor handling other tasks, one labelled "Or No Processor?"]
Choices of HW / SW Partition
• Various Solutions To Fit Each Performance / Cost Requirement:
  – SW vs HW vs mixed HW / SW
• New Approach:
  – On-Demand Architecture Synthesis to modify the HW / SW trade-off dynamically
• Distributed Processors Offer Another Level Of Flexibility Through Parallel Implementations
Networking Application: Wireless LAN QoS
[Figure: intra forwarding technique carrying MPEG-2 video transmission and FTP file transfer]
Wireless LAN: Access Point Architecture
[Figure: protocol stack with Application Layer, Presentation Layer, Session Layer, Transport Layer, HOST I/F Bus, Network Layer, Data Link Layer (Medium Access Control, Channel Access Control), and Physical Layer]
Wireless LAN: QoS
• Wireless LAN example:
  – Intra forwarding technique
  – Complex network access algorithms with a few levels of prioritization in order to guarantee the QoS
• Select Most Urgent Frame
  – Choice is based on a few parameters:
    – Priority (P0 to Pn)
    – Lifetime (normalized)
[Figure: table of 256 pointers; 64-bit pointer with fields CA, P, UP, DIS, NR, L, RL, DB; pointer of the received frame, priority levels P0 to Pn, pointer of the selected frame]
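The selection rule on this slide can be sketched as a scan over a table of frame pointers. The sketch below assumes that P0 is the most urgent priority level and that, within a level, the frame with the shortest remaining lifetime wins. Neither detail is stated on the slide, and the `FramePtr` record is a hypothetical simplification of the 64-bit pointer.

```python
from dataclasses import dataclass


@dataclass
class FramePtr:
    frame_id: int
    priority: int  # assumption: 0 (P0) is the most urgent level
    lifetime: int  # assumption: remaining lifetime in normalized ticks


def select_most_urgent(table):
    """Return the pointer with the best priority, shortest lifetime first.

    Returns None when the table holds no pointers.
    """
    if not table:
        return None
    return min(table, key=lambda p: (p.priority, p.lifetime))


table = [FramePtr(1, 2, 5), FramePtr(2, 0, 9), FramePtr(3, 0, 4)]
chosen = select_most_urgent(table)  # frame 3: level P0, lifetime 4
```

A linear scan like this mirrors a scroll-and-compare approach over one table; a sorted structure trades cheaper election for more work at insertion time.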
QoS: Full Hardware
• Design in FPGA:
  – FSM-like design with adder/subtractor (~1000 LUTs / 50 MHz)
  – One table of pointers implemented in FPGA Block RAM
    • 2 BRAMs used for 4 priorities
  – Pipelining used
  – Easy to manage the lifetime (update every 10 µs)
• Complex Function in HW:
  – Electing two frames from one table of pointers by scrolling and comparison techniques
[Figure: table of pointers of frames to be transmitted (F1, F3, F0, F11), permutation, and elected pointer of the frame to transmit]
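The 50 MHz clock and the 10 µs lifetime update period together fix how many clock cycles the state machine has for each update pass. The sketch below works out that budget. The table size of 256 pointers is taken from the figure label on the earlier QoS slide, and the cost of one clock cycle per pointer is purely an assumption for illustration.

```python
CLOCK_HZ = 50_000_000     # 50 MHz FSM clock, from the slide
UPDATE_PERIOD_S = 10e-6   # lifetime update every 10 us, from the slide
TABLE_ENTRIES = 256       # assumption: table size from the QoS figure label
CYCLES_PER_ENTRY = 1      # assumption: one pointer visited per clock cycle


def cycle_budget(clock_hz, period_s):
    """Number of clock cycles available in one update period."""
    return round(clock_hz * period_s)


budget = cycle_budget(CLOCK_HZ, UPDATE_PERIOD_S)  # 500 cycles
needed = TABLE_ENTRIES * CYCLES_PER_ENTRY         # 256 cycles
fits = needed <= budget
```

Under these assumptions a full pass over the table uses about half of the available cycles, which is consistent with the slide's claim that lifetime management is easy in hardware.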
QoS: Full Software
• Design in Firmware:
  – Simple: ~250 lines of C code
  – Microprocessor used: PPC405
  – One table of pointers per priority in external memory (SDRAM)
  – Sort algorithm is very well known and easy to implement
• Complex Function in SW:
  – System real-time requirement
  – Frame lifetime controlled by a set of timers
    • At the same time as a new frame arrives, an existing frame should move from the upper priority table
[Figure: highest priority table and further tables of frame pointers (F11, F41, F31, F52, F10, F7, F21, F22, F0, …), with F11 as the elected pointer of the frame to transmit]
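The software organization described on this slide, one table per priority kept in order by a sort, can be sketched as follows. The sketch assumes four priority levels (the number quoted for the hardware design), index 0 as the highest priority, and tables ordered by remaining lifetime. The function names are hypothetical, and the timer-driven lifetime handling is left out.

```python
import bisect

NUM_PRIORITIES = 4  # assumption: same count as the 4 priorities of the HW design


def make_tables():
    """One pointer table per priority level; index 0 is assumed highest."""
    return [[] for _ in range(NUM_PRIORITIES)]


def insert_frame(tables, priority, lifetime, frame_id):
    """Insert (lifetime, frame_id) so the table stays sorted by lifetime."""
    bisect.insort(tables[priority], (lifetime, frame_id))


def elect_frame(tables):
    """Remove and return the shortest-lifetime frame of the highest
    non-empty priority table, or None when every table is empty."""
    for table in tables:
        if table:
            return table.pop(0)[1]
    return None


tables = make_tables()
insert_frame(tables, 1, 30, "F7")
insert_frame(tables, 0, 20, "F41")
insert_frame(tables, 0, 10, "F11")
first = elect_frame(tables)  # "F11": highest priority, shortest lifetime
```

Keeping each table sorted makes election a constant-time look at the head of a table, which leaves insertion and lifetime updates as the parts that compete with the real-time constraint.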
QoS: Mixed HW / SW
• Hardware Module:
  – Lifetime update and moving pointers between tables
  – Design:
    • FSM-like design with adder/subtractor (~200 LUTs / 50 MHz)
    • 4 tables of pointers (one per priority) in FPGA Block RAM
    • Lifetime updated by scrolling
    • Semaphore
• Software / Hardware Interface:
  – Semaphore-based communication
• Software Module:
  – Insertion and sort of the tables
  – Design:
    • Easy to write (~200 lines of C code)
    • Sort algorithm
    • Semaphore lib
[Figure: pointer tables (F41, F52, F7, F22, …) shared between the hardware and software modules]
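The division of labour on this slide can be modelled in software as two routines that share the pointer tables and take a semaphore before touching them. In the sketch below, `hw_age` stands in for the hardware module and `sw_insert` for the software module. The promotion rule (move a pointer up one table when its lifetime drops below a threshold) and all names are assumptions for illustration; the slide only states that pointers move between tables.

```python
import threading


class SharedTables:
    """Pointer tables shared by the modelled HW and SW modules."""

    def __init__(self, num_priorities=4):
        self.sem = threading.Semaphore(1)  # binary semaphore guarding the tables
        self.tables = [[] for _ in range(num_priorities)]


def sw_insert(shared, priority, lifetime, frame_id):
    """Software side: insert a pointer, then sort the table by lifetime."""
    with shared.sem:
        table = shared.tables[priority]
        table.append([lifetime, frame_id])
        table.sort()


def hw_age(shared, ticks=1, promote_below=5):
    """Model of the hardware side: decrement every lifetime and move
    pointers whose lifetime falls below promote_below up one table.
    The threshold rule is an assumption, not taken from the slides."""
    with shared.sem:
        for level in range(len(shared.tables)):
            stay = []
            for entry in shared.tables[level]:
                entry[0] -= ticks
                if level > 0 and entry[0] < promote_below:
                    # Promoted entries land unsorted; the software side
                    # restores order on its next insertion.
                    shared.tables[level - 1].append(entry)
                else:
                    stay.append(entry)
            shared.tables[level] = stay


shared = SharedTables()
sw_insert(shared, 1, 6, "F7")
sw_insert(shared, 0, 9, "F41")
hw_age(shared, ticks=2)  # F7 drops to lifetime 4 and moves to table 0
```

Because both sides acquire the same semaphore, a lifetime pass never sees a half-sorted table and an insertion never races with a pointer move, which is the role the slide assigns to the semaphore-based interface.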
Design Solutions Comparison
• Full HW Solution
  – Full control of event timing and easy parallel design
  – Complex HDL coding of the FSM
    • State machine architecture requires advanced expertise
    • Significant validation time in the design cycle
• Full SW Solution
  – Easy coding in C (sort algorithm) and flexibility
  – Difficult to handle real-time constraints
    • Performance limited by the Von Neumann architecture (processor)
• Mixed HW / SW Solution: The Best Of Both Worlds
  – Offers the advantages of the HW and SW solutions with the right partitioning
Platform FPGA Architecture
• A Solution that provides:
  – IP Immersion
    • The ability to integrate a wide variety of Hard & Soft IP
  – A single platform for multiple applications
  – Total customization
  – Full hardware and firmware upgradability
[Figure: Hard-IP, Soft-IP, HW functions, System Connectivity]
Virtex-II Pro Platform FPGA
• 3.125 Gbps Multi-Gigabit Transceivers (MGTs)
  – Supports 10 Gbps standards
  – Up to 24 per device
• PowerPC 405 Core
  – 300+ MHz / 450+ DMIPS performance
  – Up to 4 per device
• IP-Immersion™ Fabric
  – ActiveInterconnect™
  – 18 Kb Dual-Port RAM
  – Xtreme™ Multipliers
  – 16 Global Clock Domains
[Figure: device layout showing MGT blocks and the fabric]
High-Bandwidth Communications
• Code (SW) and data are stored in BRAM, without any external resources
• On-Chip Memory (OCM) offers a unique data bandwidth between FPGA fabric (HW) and embedded PowerPC core (SW)
• High-Bandwidth Communications between distributed
[Figure: PowerPC 405 block diagram (Fetch & Decode, Timers and Debug Logic, 16 KB I-Cache, 16 KB D-Cache, MMU, Execution Unit with 32 x 32b GPR, ALU, MAC) connected at 6.4 Gb/sec through OCM™ Technology to BlockRAMs and Acceleration Logic]
Flexibility of Programmable Systems
• Nearly all systems are composed of:
  – Logic + Memory + Processor
• Virtex-II Pro enables optimum "system partitioning" between Hardware and Software
  – Performing SW tasks in HW is inefficient
  – Performing HW tasks in SW is slow
  – Virtex-II Pro provides the best of both worlds
Conclusion
• Distributed Processors Allow Flexible HW / SW Partitioning:
  – Optimal mapping at the module level
  – Designs can use the best solution of both worlds
• Virtex-II Pro: The First Programmable System To Enable True Architectural Synthesis:
  – Unique bandwidth between embedded processors and HW
  – Unique on-chip solution provides an application-specific mix of logic, memory, integrated processors, and high-bandwidth I/O