Application-Specific Customization of Soft Processor Microarchitecture
Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose
University of Toronto, Edward S. Rogers Sr. Department of Electrical and Computer Engineering
Processors and FPGA Systems
- Processors lie at the “heart” of FPGA systems
  [Figure: a soft processor connected to a UART, custom logic, a memory interface, and Ethernet]
- The processor performs coordination and even computation
  - Better processors => less hardware to design
- We seek improvement through customization
Motivating Application-Specific Customizations of Soft Processors
1. FPGA configurability
   - Can consider unlimited processor variants
2. A soft processor might be used to run either:
   a) A single application
   b) A single class of applications
   c) Many applications, but can be reconfigured
3. Applications differ in architectural requirements
   - Can specialize the architecture for each application
- We want to evaluate the effectiveness of specialization
Research Goals
- To investigate:
  1. The potential for “application tuning”
     - Tune the processor microarchitecture to favour an application
     - Preserve general-purpose functionality
  2. “Instruction-set subsetting”
     - Eliminate hardware not required by the application
     - Sacrifice general-purpose functionality
  3. A combination of both methods
- Measure the efficiency gained through real implementations
SPREE System (Soft Processor Rapid Exploration Environment)
[Figure: processor description (ISA description + datapath) → SPREE → RTL]
- Input: processor description
  - Made of hand-coded components
- SPREE system:
  1. Verify the ISA against the datapath
  2. Datapath instantiation
  3. Control generation
     - Multi-cycle/variable-cycle functional units (FUs)
     - Multiplexer select signals
     - Interlocking
     - Branch handling
- Output: synthesizable Verilog
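Step 1 of the SPREE flow (verifying the ISA against the datapath) boils down to checking that some datapath component provides every operation an instruction needs. The following is a minimal sketch that assumes a made-up dictionary format. The component names, operation names, and `verify_isa` are hypothetical; this is not SPREE's actual description language.

```python
# Hypothetical, simplified processor description: each datapath component
# lists the operations it can perform (not SPREE's real input format).
DATAPATH = {
    "alu": {"add", "sub", "and", "or", "slt"},
    "shifter": {"sll", "srl", "sra"},
    "mul": {"mult"},
    "data_mem": {"load", "store"},
}

# Hypothetical ISA description: the operations each instruction needs.
ISA = {
    "ADDU": ["add"],
    "SUBU": ["sub"],
    "SLL": ["sll"],
    "MULT": ["mult"],
    "LW": ["add", "load"],   # address = base + offset, then load
    "SW": ["add", "store"],
    "DIV": ["div"],          # no divider in this datapath
}


def verify_isa(isa, datapath):
    """Return {instruction: [missing operations]} for instructions the datapath cannot execute."""
    supported = set().union(*datapath.values())
    problems = {}
    for instr, ops in isa.items():
        missing = [op for op in ops if op not in supported]
        if missing:
            problems[instr] = missing
    return problems


print(verify_isa(ISA, DATAPATH))  # {'DIV': ['div']}
```

An empty result means every instruction can be mapped onto the datapath, and generation can proceed to instantiation and control generation.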
Back-End Infrastructure
[Figure: the generated RTL and the benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc) feed the ModelSim RTL simulator and the Quartus II 4.2 CAD software, targeting a Stratix 1S40C5]
- Measurements:
  1. Cycle count
  2. Resource usage
  3. Clock frequency
  4. Power
- We can measure area/performance/energy accurately
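The measurements combine into performance and energy figures in a standard way: runtime is cycles divided by clock frequency, and energy is average power times runtime. The following is a minimal sketch. The helper names are hypothetical, and the dynamic instruction count is assumed to be available from simulation as well (the slide lists only the cycle count).

```python
# Hypothetical helpers: cycle count from RTL simulation, clock frequency and
# power from the CAD tool; the instruction count is an assumed extra input.

def runtime_seconds(cycle_count, fmax_mhz):
    """Wall-clock time of a benchmark run: cycles divided by clock frequency."""
    return cycle_count / (fmax_mhz * 1e6)


def mips(instruction_count, cycle_count, fmax_mhz):
    """Millions of instructions executed per second of wall-clock time."""
    return instruction_count / runtime_seconds(cycle_count, fmax_mhz) / 1e6


def energy_joules(power_mw, cycle_count, fmax_mhz):
    """Energy = average power (given in milliwatts) x runtime."""
    return (power_mw / 1000) * runtime_seconds(cycle_count, fmax_mhz)


# Illustrative numbers only, not measurements from the talk.
print(runtime_seconds(1_250_000, 80))          # 0.015625
print(mips(1_000_000, 1_250_000, 80))          # 64.0
print(energy_joules(500, 1_250_000, 80))       # 0.0078125
```

Because cycle count and clock frequency come from different tools, a deeper pipeline can lose on cycles yet still win on runtime if it raises the clock frequency enough.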
Comparison to Altera’s Nios II
- Has three variations:
  - Nios II/e – unpipelined, no HW multiplier
  - Nios II/s – 5-stage, with HW multiplier
  - Nios II/f – 6-stage, dynamic branch prediction
Architectural Parameters Used in SPREE
- Multiplication support
  - Hardware FU or software routine
- Shifter implementation
  - Flip-flops, multiplier, or LUTs
- Pipelining
  - Depth (2-7 stages)
  - Organization
  - Forwarding
- We focus on the core microarchitecture
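Together, these parameters span a design space of processor variants. The following minimal sketch enumerates part of it. The option values come from the list above, but the labels are hypothetical. Pipeline organization and forwarding are left out for brevity. The rule that a multiplier-based shifter requires the hardware multiplier is an assumption made for illustration.

```python
from itertools import product

# Hypothetical labels for three of the parameters listed above.
MULTIPLY_SUPPORT = ["hardware_fu", "software_routine"]
SHIFTER_IMPL = ["flipflops", "multiplier", "luts"]
PIPELINE_DEPTHS = range(2, 8)  # 2-7 stages


def design_space():
    """Yield every parameter combination considered valid in this sketch."""
    for mult, shifter, depth in product(MULTIPLY_SUPPORT, SHIFTER_IMPL, PIPELINE_DEPTHS):
        # Assumption: a multiplier-based shifter needs the hardware multiplier,
        # so that combination is skipped when multiplication is in software.
        if shifter == "multiplier" and mult == "software_routine":
            continue
        yield {"multiply": mult, "shifter": shifter, "stages": depth}


configs = list(design_space())
print(len(configs))  # 30 (2 x 3 x 6 = 36, minus 6 invalid combinations)
```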
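SPREE vs Nios II
[Figure: SPREE processors compared with Nios II; plot directions labeled “faster” and “smaller”; highlighted SPREE design: 3-stage pipe, HW multiply, multiply-based shifter]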
Exploration of Soft Processor Architectural Customizations
1. Architectural tuning
2. Instruction-set subsetting
3. Combination (architectural tuning + subsetting)
1. Architectural Tuning Experiment
- Vary the same parameters:
  - Multiplication support
  - Shifter implementation
  - Pipelining
- Determine:
  1. The best overall (general-purpose) processor
  2. The best processor per application (application-tuned)
- Metric: performance per area (MIPS/LE)
  - Basically the inverse of the area-delay product
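The experiment compares two choices. One is the single general-purpose processor with the best MIPS/LE over all benchmarks. The other is a separate best processor for each benchmark. The following is a minimal sketch of that comparison. All processor names and numbers are made up. Averaging with an arithmetic mean is an assumption; the talk does not say how results are aggregated.

```python
from statistics import mean

# Hypothetical performance-per-area numbers (MIPS/LE) for made-up processor
# variants and benchmarks; these are not data from the talk.
PERF_PER_AREA = {
    "p3_hwmul": {"bench_a": 0.040, "bench_b": 0.030, "bench_c": 0.020},
    "p5_hwmul": {"bench_a": 0.035, "bench_b": 0.036, "bench_c": 0.018},
    "p2_swmul": {"bench_a": 0.030, "bench_b": 0.025, "bench_c": 0.026},
}


def best_overall(table):
    """General-purpose pick: highest mean MIPS/LE across all benchmarks."""
    return max(table, key=lambda proc: mean(table[proc].values()))


def best_per_benchmark(table):
    """Application-tuned pick: highest MIPS/LE separately for each benchmark."""
    benchmarks = next(iter(table.values())).keys()
    return {b: max(table, key=lambda proc: table[proc][b]) for b in benchmarks}


def average_tuning_gain(table):
    """Mean relative MIPS/LE gain of the tuned pick over the general-purpose pick."""
    general = best_overall(table)
    tuned = best_per_benchmark(table)
    return mean(table[tuned[b]][b] / table[general][b] - 1 for b in tuned)


print(best_overall(PERF_PER_AREA))         # p3_hwmul
print(best_per_benchmark(PERF_PER_AREA))
print(average_tuning_gain(PERF_PER_AREA))  # about 0.167
```

The gain is zero for any benchmark where the general-purpose processor is already the best choice, so the average gain reflects how much applications actually differ.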
Performance per Area of All Processors
[Figure: performance per area of all processors, annotated with 32% and 14.1%]
2. Instruction-set Subsetting
- SPREE automatically removes
  - Unused connections
  - Unused components
- Reduce the processor by reducing the ISA
  - Can create an application-specific processor
- Eliminate unused parts of the ISA
Instruction-set Usage of Benchmarks
- Applications do not use the complete ISA
- Strong potential for hardware reduction
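Subsetting works because each instruction uses only some datapath components. Any component that no used instruction needs can be removed. The following is a minimal sketch. The instruction-to-component map and the `subset` helper are hypothetical. The slide also mentions removing unused connections, which this sketch does not model.

```python
# Hypothetical map from each instruction to the datapath components it uses.
USES = {
    "addu": {"regfile", "alu"},
    "subu": {"regfile", "alu"},
    "sll": {"regfile", "shifter"},
    "mult": {"regfile", "multiplier"},
    "div": {"regfile", "divider"},
    "lw": {"regfile", "alu", "data_mem"},
    "sw": {"regfile", "alu", "data_mem"},
    "beq": {"regfile", "alu", "branch_unit"},
}


def subset(used_instructions, uses=USES):
    """Return (kept components, removable components, fraction of the ISA used)."""
    used = set(used_instructions)
    all_components = set().union(*uses.values())
    kept = set().union(*(uses[i] for i in used))
    return kept, all_components - kept, len(used) / len(uses)


# Instructions found in a (made-up) application; duplicates are ignored.
kept, removed, coverage = subset(["addu", "lw", "sw", "beq", "addu"])
print(sorted(removed))  # ['divider', 'multiplier', 'shifter']
print(coverage)         # 0.5
```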
Area Reduction from Subsetting
[Figure: fraction of area per benchmark after subsetting; average reduction annotated as 23%]
- Area reduced by up to 60% for some benchmarks, 23% on average
- Similar reductions for energy; small impact on performance
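The chart plots the fraction of area that remains, while the text quotes reductions. The two are related by reduction = 1 − remaining fraction. “Up to” and “on average” then correspond to the maximum and the mean over benchmarks. The following is a minimal sketch using hypothetical logic-element (LE) counts, not the talk's data.

```python
from statistics import mean

# Hypothetical LE counts per benchmark: (full processor, subsetted processor).
AREA_LES = {"bench_a": (1000, 400), "bench_b": (1000, 900), "bench_c": (1000, 800)}


def area_reduction(full_les, subset_les):
    """Reduction = 1 - (fraction of the original area that remains)."""
    return 1 - subset_les / full_les


reductions = {b: area_reduction(*les) for b, les in AREA_LES.items()}
print(max(reductions.values()))   # best case, about 0.6
print(mean(reductions.values()))  # average, about 0.3
```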
3. Combining Application Tuning and Instruction-set Subsetting
- Subsetting is effective on its own
  - Can apply subsetting on top of tuning
- Compare different customization methods:
  1. Tuning
  2. Subsetting
  3. Tuning + subsetting
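Combining Application Tuning and Instruction-set Subsetting
[Figure: average efficiency gain per customization method: 14% (tuning), 16% (subsetting), 25% (tuning + subsetting)]
- Tuning reduces the waste that subsetting eliminates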
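A rough consistency check on these numbers follows. Assume, for this comparison only, that the 14% tuning gain and the 16% subsetting gain compounded independently. The combination would then be expected to reach about 32%:

$$(1 + 0.14)\,(1 + 0.16) - 1 = 1.3224 - 1 \approx 0.32 > 0.25$$

The measured 25% is lower than this. That fits the point that tuning already removes some of the waste that subsetting would otherwise eliminate.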
Summary of Presented Architectural Conclusions
- Application tuning
  - 14% average efficiency gain
  - Will increase with more architectural axes
- Instruction-set subsetting
  - Up to 60% area & energy savings
  - 16% average efficiency gain
- Combined tuning & subsetting
  - 25% average efficiency gain
Future Work
- Consider other promising architectural axes
  - Branch prediction, aggressive forwarding
  - ISA changes
  - Datapaths (e.g., VLIW)
  - Caches and memory hierarchy
- Compiler assistance
  - Can improve tuning & subsetting