Application-Specific Customization of Soft Processor Microarchitecture, Peter Yiannacouras

Application-Specific Customization of Soft Processor Microarchitecture
Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose
University of Toronto
Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Processors and FPGA Systems
• Processors lie at the “heart” of FPGA systems
[Figure: FPGA system block diagram with blocks UART, Custom Logic, Soft Processor, Memory Interface, Ethernet]
• Performs coordination and even computation
  - Better processors => less hardware to design
We seek improvement through customization

Motivating Application-Specific Customizations of Soft Processors
1. FPGA configurability
   - Can consider unlimited processor variants
2. A soft processor might be used to run either:
   a) A single application
   b) A single class of applications
   c) Many applications, but can be reconfigured
3. Applications differ in architectural requirements
   - Can specialize architecture for each application
We want to evaluate effectiveness of specialization

Research Goals
• To investigate the potential for:
  1. “Application-tuning”
     - Tune processor microarchitecture to favour an application
     - Preserve general purpose functionality
  2. “Instruction-set Subsetting”
     - Sacrifice general purpose functionality
     - Eliminate hardware not required by application
  3. Combination of both methods
Measure efficiency gained through real implementations

SPREE System (Soft Processor Rapid Exploration Environment)
[Figure: flow from Processor Description (ISA, Datapath) through SPREE to RTL]
• Input: Processor description
  - Made of hand-coded components
• SPREE System
  1. Verify ISA against datapath
  2. Datapath instantiation
  3. Control generation
     - Multi-cycle/variable-cycle FUs
     - Multiplexer select signals
     - Interlocking
     - Branch handling
• Output: Synthesizable Verilog

Back-End Infrastructure
• Inputs: RTL and benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc)
• RTL simulator: Modelsim
• CAD software: Quartus II 4.2
• Device: Stratix 1S40C5
• Measurements:
  1. Cycle count
  2. Resource usage
  3. Clock frequency
  4. Power
We can measure area/performance/energy accurately
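
As a rough sketch of how the raw measurements above could be combined into performance and energy figures, the Python helper below derives wall-clock time, MIPS, and energy. The function name, its arguments (including the instruction count, which the slide does not list), and all sample numbers are illustrative assumptions, not values from the talk.

```python
def derive_metrics(cycles, instructions, clock_mhz, power_watts):
    """Combine raw back-end measurements into derived metrics.

    All names here are hypothetical; the slide only lists the raw
    measurements (cycle count, resource usage, clock frequency, power).
    """
    seconds = cycles / (clock_mhz * 1e6)      # wall-clock execution time
    mips = instructions / seconds / 1e6       # millions of instructions per second
    energy_joules = power_watts * seconds     # energy = average power x time
    return {"seconds": seconds, "mips": mips, "energy_joules": energy_joules}


# Made-up sample: 2M cycles, 1M instructions, 80 MHz clock, 0.5 W
metrics = derive_metrics(cycles=2_000_000, instructions=1_000_000,
                         clock_mhz=80.0, power_watts=0.5)
```

Cycle count alone is not a performance measure for soft processors, because each processor variant reaches a different clock frequency; multiplying the two out gives comparable wall-clock figures.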

Comparison to Altera’s Nios II
• Has three variations:
  - Nios II/e: unpipelined, no HW multiplier
  - Nios II/s: 5-stage, with HW multiplier
  - Nios II/f: 6-stage, dynamic branch prediction

Architectural Parameters Used in SPREE
• Multiplication support
  - Hardware FU or software routine
• Shifter implementation
  - Flipflops, multiplier, or LUTs
• Pipelining
  - Depth (2-7 stages)
  - Organization
  - Forwarding
We focus on core microarchitecture

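SPREE vs Nios II
[Figure: SPREE processors compared with Nios II; axis directions labelled "faster" and "smaller"]
Annotated processor:
  - 3-stage pipe
  - HW multiply
  - Multiply-based shifter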

Exploration of Soft Processor Architectural Customizations
1. Architectural-tuning
2. Instruction-set subsetting
3. Combination (Arch-tuning + Subsetting)

1. Architectural Tuning Experiment
• Vary the same parameters
  - Multiplication support
  - Shifter implementation
  - Pipelining
• Determine
  1. Best overall (general purpose) processor
  2. Best per application (application-tuned)
• Metric: Performance per Area (MIPS/LE)
  - Basically inverse of Area-Delay product
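
The selection step in this experiment can be sketched in a few lines of Python. This is only an illustration: the processor names, benchmark names, and MIPS/LE values are made up, and averaging across benchmarks with a geometric mean is an assumption, since the slide does not say how "best overall" is computed.

```python
from math import prod

# Hypothetical MIPS/LE figures: processor -> {benchmark: efficiency}.
EFFICIENCY = {
    "pipe3_hwmul": {"bench_a": 0.050, "bench_b": 0.040},
    "pipe5_hwmul": {"bench_a": 0.045, "bench_b": 0.048},
    "pipe2_swmul": {"bench_a": 0.055, "bench_b": 0.020},
}


def geomean(values):
    """Geometric mean (an assumed way to average across benchmarks)."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def best_overall(table):
    """Processor with the highest mean efficiency across all benchmarks."""
    return max(table, key=lambda p: geomean(table[p].values()))


def best_per_application(table):
    """For each benchmark, the processor with the highest efficiency on it."""
    benchmarks = next(iter(table.values())).keys()
    return {b: max(table, key=lambda p: table[p][b]) for b in benchmarks}


general = best_overall(EFFICIENCY)
tuned = best_per_application(EFFICIENCY)
# Per-benchmark gain of the application-tuned choice over the general-purpose one
gain = {b: EFFICIENCY[p][b] / EFFICIENCY[general][b] - 1.0
        for b, p in tuned.items()}
```

With this made-up data the general-purpose winner is also the tuned winner for one benchmark, so the gain from tuning comes entirely from the other.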

Performance per Area of All Processors
[Figure: performance per area of all processors; annotations: 32% and 14.1%]

2. Instruction-set Subsetting
• SPREE automatically removes
  - Unused connections
  - Unused components
• Reduce processor by reducing the ISA
  - Can create application-specific processor
• Eliminate unused parts of the ISA
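
The idea of removing unused components can be illustrated with a minimal Python sketch. It is not SPREE's actual algorithm: the instruction-to-component mapping, the component names, and the `subset_processor` helper are all hypothetical, chosen only to show how a reduced ISA leads to a reduced datapath.

```python
# Hypothetical mapping from instructions to the datapath components they need.
COMPONENTS_NEEDED = {
    "add": {"alu", "regfile"},
    "sll": {"shifter", "regfile"},
    "mult": {"multiplier", "regfile", "hi_lo"},
    "lw": {"alu", "regfile", "data_mem"},
    "beq": {"alu", "regfile", "branch_unit"},
}


def subset_processor(used_instructions, components_needed=COMPONENTS_NEEDED):
    """Return (kept, removed) component sets for one application.

    A component is kept only if at least one instruction that the
    application uses requires it.
    """
    all_components = set().union(*components_needed.values())
    kept = set()
    for instr in used_instructions:
        kept |= components_needed[instr]
    return kept, all_components - kept


# An application that never multiplies or shifts
kept, removed = subset_processor({"add", "lw", "beq"})
```

In this toy example the shifter, the multiplier, and its result registers drop out, which sacrifices general purpose functionality in exchange for area.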

Instruction-set Usage of Benchmarks
• Applications do not use complete ISA
Strong potential for hardware reduction

Area Reduction from Subsetting
[Figure: area after subsetting; axis: Fraction of Area; annotation: 23%]
• Area reduced by 60% in some, 23% on average
• Similar reductions for energy, small impact on performance

3. Combining Application Tuning and Instruction-set Subsetting
• Subsetting is effective on its own
  - Can apply subsetting on top of tuning
• Compare different customization methods
  1. Tuning
  2. Subsetting
  3. Tuning + Subsetting

Combining Application Tuning and Instruction-set Subsetting
[Figure: efficiency gain by customization method; annotations: 14%, 16%, 25%]
Tuning reduces the waste that subsetting eliminates
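
As a rough check on why the gains do not simply stack, suppose (purely as an assumption, since averages over benchmarks need not compose this way) that the average gains reported in the summary, 14% for tuning and 16% for subsetting, were independent and multiplied:

```latex
(1 + 0.14)(1 + 0.16) - 1 = 0.3224 \approx 32\%
```

The reported combined gain of 25% falls short of this figure, which is consistent with the slide's point: a tuned processor already carries less unused hardware, so subsetting has less left to remove.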

Summary of Presented Architectural Conclusions
• Application tuning
  - 14% average efficiency gain
  - Will increase with more architectural axes
• Instruction-set subsetting
  - Up to 60% area & energy savings
  - 16% average efficiency gain
• Combined tuning & subsetting
  - 25% average efficiency gain

Future Work
• Consider other promising architectural axes
  - Branch prediction, aggressive forwarding
  - ISA changes
  - Datapaths (e.g. VLIW)
  - Caches and memory hierarchy
• Compiler assistance
  - Can improve tuning & subsetting