Application of Instruction Analysis/Synthesis Tools to x86's Functional Unit Allocation
- Slides: 18
Application of Instruction Analysis/Synthesis Tools to x86's Functional Unit Allocation
Ing-Jer Huang and Ping-Huei Xie
Institute of Computer & Information Engineering, National Sun Yat-sen University
Kaohsiung, Taiwan 80441, R.O.C.
ijhuang@cie.nsysu.edu.tw
Superscalar Model under Investigation
• Decoupled superscalar architecture
  – register renaming
  – branch prediction
• Assumptions
  – no cache misses
  – fast instruction fetcher and decoder
  – 100% correct branch prediction
  – load/store unit: 2 cycles; others: 1 cycle
  – large reservation stations (RS) and reorder buffer (ROB)
The Problem
Q: How many functional units are needed in an x86-compatible superscalar core?
A: The distribution of functional unit usage in typical x86 programs.
[Table: FU Usage (4A, 2M, 1B; 3A, 0M, 0B; 2A, 2M, 1B; 2A, 1M, 0B; 1A, 1M, 1B) against Frequency]
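The distribution named in the answer can be tabulated from any cycle-by-cycle schedule. The following is a minimal sketch, not the authors' tool: the function name `fu_usage_distribution`, the input layout (one list of unit-class letters per cycle), and the sample schedule are all assumptions made for illustration.

```python
from collections import Counter


def fu_usage_distribution(schedule):
    """Return {(A, M, B): fraction of cycles} for a schedule.

    `schedule` is a list of cycles; each cycle is a list of unit-class
    letters ('A' integer, 'M' memory, 'B' branch), one per micro-operation
    issued in that cycle. This layout is a hypothetical one for the sketch.
    """
    counts = Counter()
    for cycle in schedule:
        per_cycle = Counter(cycle)
        key = (per_cycle.get("A", 0), per_cycle.get("M", 0), per_cycle.get("B", 0))
        counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


# Made-up four-cycle schedule, for illustration only.
schedule = [
    ["A", "A", "M"],
    ["A", "M", "B"],
    ["A", "A", "M"],
    ["A"],
]
dist = fu_usage_distribution(schedule)
for (a, m, b), freq in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{a}A, {m}M, {b}B: {freq:.0%}")
```

Each key corresponds to one row of the FU Usage column, and the value is its frequency.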
How to Obtain FU Distribution?
• Simulation-based approaches [Shinatani, 1995], [Davidson, 1995], [Hara et al., 1996], etc.
  – Running on different CPU platforms
  – Slow, but can explore many configurations
• Monitoring-based approaches [Adams et al., 1989], [Bhandarkar et al., 1997], [Huang, 1997], etc.
  – Directly running on the same CPU platform
  – Fast, but work only for the configuration of the underlying CPU platform
A Fast Performance/Cost Approximation Environment
ASIA: Automatic Synthesis of Instruction Set Architecture
• GOAL: analyze and synthesize application-specific instruction sets for pipelined uniprocessors.
• APPROACH: a micro-operation scheduling engine based on a simulated annealing algorithm.
The superscalar core is an application-specific RISC core for x86 emulation.
ASIA-II: Extensions for Superscalar Architecture
• Register renaming
  – Temporary registers are used on the fly to resolve anti and output dependencies.
• Execution window
  – Instructions are dispatched sequentially.
• Branch prediction
  – Effective sizes of basic blocks are enlarged.
Register Renaming
• In ASIA-II: output and anti dependencies are ignored during scheduling.
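To make the effect concrete, the sketch below builds a dependency edge set that keeps only true (read-after-write) dependencies and drops output and anti dependencies, as the slide describes. The micro-operation encoding `(dest, sources)` and the function name `true_dependencies` are assumptions for illustration; the slides do not show ASIA-II's internal representation.

```python
def true_dependencies(mops):
    """Return the set of (producer, consumer) index pairs for RAW edges.

    `mops` is a list of (dest, sources) tuples in program order, where
    `dest` is a register name or None and `sources` is a list of register
    names. Output (WAW) and anti (WAR) dependencies are deliberately not
    recorded, modelling ideal register renaming.
    """
    edges = set()
    last_writer = {}  # register name -> index of its most recent writer
    for i, (dest, sources) in enumerate(mops):
        for src in sources:
            if src in last_writer:
                edges.add((last_writer[src], i))
        if dest is not None:
            last_writer[dest] = i
    return edges


# Hypothetical four-operation sequence.
mops = [
    ("r1", ["r2", "r3"]),  # 0: r1 <- r2 op r3
    ("r2", ["r4"]),        # 1: writes r2 (anti dependency on 0, ignored)
    ("r1", ["r2"]),        # 2: writes r1 (output dependency on 0, ignored)
    ("r5", ["r1"]),        # 3: reads r1 produced by 2
]
edges = true_dependencies(mops)
print(sorted(edges))
```

Only the edges 1 to 2 and 2 to 3 remain, so operation 0 is free to be scheduled in parallel with the others.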
Realistic Patterns in the Execution Window
• Balanced distribution: the objective function includes both time steps and H/W counts.
• Window effect: MOPs are displaced by a limited distance; a long-distance move is possible through many iterations of displacement, as long as performance is improved.
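The two points above can be illustrated with a small simulated annealing loop. This is a sketch under stated assumptions, not the ASIA-II engine: the cost weights, cooling schedule, window size, the `(unit, predecessors)` encoding, and all function names are hypothetical. Only the latencies (memory 2 cycles, others 1) and the idea of an objective combining time steps with hardware counts come from the slides.

```python
import math
import random

LATENCY = {"A": 1, "M": 2, "B": 1}  # load/store 2 cycles, others 1 cycle


def cost(steps, mops, w_time=1.0, w_hw=0.5):
    """Objective: schedule length plus weighted peak unit count per class."""
    finish = max(s + LATENCY[u] for s, (u, _) in zip(steps, mops))
    usage = {}
    for s, (u, _) in zip(steps, mops):
        usage[(s, u)] = usage.get((s, u), 0) + 1
    peak = {}
    for (_, u), n in usage.items():
        peak[u] = max(peak.get(u, 0), n)
    return w_time * finish + w_hw * sum(peak.values())


def legal(steps, mops, succs, i, new):
    """True if moving MOP i to step `new` keeps all true dependencies."""
    if new < 0:
        return False
    unit, preds = mops[i]
    if any(steps[p] + LATENCY[mops[p][0]] > new for p in preds):
        return False
    return all(new + LATENCY[unit] <= steps[s] for s in succs[i])


def anneal(mops, window=2, iters=5000, t0=2.0, alpha=0.999, seed=0):
    """mops: list of (unit, preds); preds refer to earlier indices only."""
    rng = random.Random(seed)
    steps, t = [], 0
    for unit, _ in mops:  # initial fully serial schedule
        steps.append(t)
        t += LATENCY[unit]
    succs = [[] for _ in mops]
    for i, (_, preds) in enumerate(mops):
        for p in preds:
            succs[p].append(i)
    cur = cost(steps, mops)
    best, best_cost, temp = steps[:], cur, t0
    moves = [d for d in range(-window, window + 1) if d != 0]
    for _ in range(iters):
        i = rng.randrange(len(mops))
        new = steps[i] + rng.choice(moves)  # displacement limited by window
        if legal(steps, mops, succs, i, new):
            old, steps[i] = steps[i], new
            c = cost(steps, mops)
            if c <= cur or rng.random() < math.exp((cur - c) / temp):
                cur = c
                if c < best_cost:
                    best, best_cost = steps[:], c
            else:
                steps[i] = old  # reject the move
        temp = max(temp * alpha, 1e-3)
    return best, best_cost


# Hypothetical block: two loads feeding adds, then a branch.
mops = [("M", []), ("A", [0]), ("M", []), ("A", [2]), ("A", [1, 3]), ("B", [4])]
best, best_cost = anneal(mops)
print(best, best_cost)
```

Because each move shifts one MOP by at most `window` steps, a MOP reaches a distant time step only through a chain of accepted moves, which is the window effect described above. Including the peak unit counts in `cost` discourages schedules that pile many MOPs of one class into a single step.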
Basic Block Expansion (Eblocks) Due to Branch Prediction
A Small Example from Word 97
Extended Basic Blocks
Scheduled Eblocks
Description of Benchmark
Micro-operation Level Parallelism (MLP)
Functional Unit Usage
• Notation:
  A - Integer unit
  M - Memory unit
  B - Branch unit
  F - Floating-point unit
• "Others" is the sum of the usage patterns whose frequency is less than 1.0%.
Accumulated Coverage of Functional Unit Allocation
[Figure labels: NSC 98, IA-64, AMD K6, Pentium Pro, Base Machine]
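The slide text does not define accumulated coverage. One plausible reading, assumed here, is the fraction of cycles whose functional unit demand fits within a given allocation in every unit class. The sketch below computes that quantity; the distribution and the allocations are invented for illustration and do not describe any of the processors labeled in the figure.

```python
def coverage(distribution, allocation):
    """Fraction of cycles whose (A, M, B) demand fits within `allocation`.

    `distribution` maps (A, M, B) usage tuples to frequencies summing to 1.
    `allocation` is the (A, M, B) unit count of a candidate machine.
    """
    return sum(
        freq
        for usage, freq in distribution.items()
        if all(u <= a for u, a in zip(usage, allocation))
    )


# Hypothetical usage distribution (not measured data).
distribution = {
    (1, 1, 0): 0.4,
    (2, 1, 0): 0.3,
    (2, 2, 1): 0.2,
    (4, 2, 1): 0.1,
}

# Hypothetical allocations, from small to large.
for allocation in [(2, 1, 1), (2, 2, 1), (4, 2, 1)]:
    print(allocation, round(coverage(distribution, allocation), 3))
```

Under this reading, coverage never decreases as units are added, so the smallest allocation reaching a target coverage can be read off directly.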
Conclusions
• Synthesis/analysis tools have been used to observe the functional unit usage and MLP in a superscalar core.
• Speedup over simulation is more than 600 times.
• FUTURE WORK: investigate various microarchitecture features
  – register renaming vs. branch prediction
  – functional unit optimization