Run-Jump-Run: Bouquet of Instruction Pointer Jumpers for High Performance Instruction Prefetching
Vishal Gupta, Neelu Shivprakash Kalani and Biswabandan Panda
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur
Outline • Motivation • JIP Framework • Prefetching in JIP Framework • Results 2
Scope of Instruction Prefetching • Gap of 25.25% between the next-2-line prefetcher and a perfect L1-I • Our prefetcher achieves a 27.75% performance improvement 3
Program Control Flow • Sequential • Branch • Our Proposal: Sequential RUN-JUMP-RUN 4
Bouquet of Instruction Pointer Jumpers (JIP) • Non-Branch Instruction: Runner • Branch, Single Target (Direct Jump, Direct Call): Jumper-I, using the Single Target Jump Table • Branch, Multiple Target (Conditional, Indirect Jump, Indirect Call, Return): Jumper-II, using the Multiple Target Jump Table 5
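The taxonomy on this slide can be read as a dispatch from instruction kind to JIP component. The following is a minimal sketch under that reading; the grouping (direct jumps and calls as single-target, the remaining branch kinds as multiple-target) is inferred from the slide layout, and the names `Kind` and `select_component` are hypothetical, not taken from the talk.

```python
from enum import Enum


class Kind(Enum):
    NON_BRANCH = "non-branch"
    DIRECT_JUMP = "direct jump"
    DIRECT_CALL = "direct call"
    CONDITIONAL = "conditional"
    INDIRECT_JUMP = "indirect jump"
    INDIRECT_CALL = "indirect call"
    RETURN = "return"


# Assumed grouping, read off the slide: one fixed target vs. several possible targets.
SINGLE_TARGET = {Kind.DIRECT_JUMP, Kind.DIRECT_CALL}
MULTIPLE_TARGET = {Kind.CONDITIONAL, Kind.INDIRECT_JUMP, Kind.INDIRECT_CALL, Kind.RETURN}


def select_component(kind: Kind) -> str:
    """Return the name of the JIP component assumed to handle this instruction kind."""
    if kind is Kind.NON_BRANCH:
        return "Runner"      # sequential flow: the RUN part
    if kind in SINGLE_TARGET:
        return "Jumper-I"    # looked up in the Single Target Jump Table
    if kind in MULTIPLE_TARGET:
        return "Jumper-II"   # looked up in a Multiple Target Jump Table
    raise ValueError(f"unknown instruction kind: {kind}")
```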
Outline • Motivation • JIP Framework • Prefetching in JIP Framework • Results 6
JIP Framework • Example code: 0x00 jmp 0x04; 0x04 mov r1, 0x1C; 0x08 mov r2, 2; 0x0C add r1, #4; 0x10 call [r1]; 0x14 sub r3, 1; 0x18 jeqz 0x08 • Single Target Jump Table (Trigger IP, Target IP, NRU): (0x00, 0x04, 0); (0x10, 0x20, 0); (0x18, 0x08, 0) • Multiple Target Jump Table: Trigger IP 0x10; Target IPs 0x20, 0x24, 0x28; Array of Targets 0, 1, 2; Target Confidence 1 per target 7
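To illustrate the Single Target Jump Table in this example, here is a minimal sketch of a trigger-IP to target-IP table with a not-recently-used (NRU) bit per entry. The slide only names the NRU field; the exact policy below (bit 0 means recently used, evict the first entry whose bit is 1, age all entries when none qualifies) is an assumption, as is the class name `SingleTargetJumpTable`.

```python
class SingleTargetJumpTable:
    """Fully associative trigger IP -> target IP table with NRU replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # trigger_ip -> [target_ip, nru_bit]; nru_bit 0 = recently used

    def lookup(self, trigger_ip):
        entry = self.entries.get(trigger_ip)
        if entry is None:
            return None
        entry[1] = 0  # mark as recently used
        return entry[0]

    def insert(self, trigger_ip, target_ip):
        if trigger_ip not in self.entries and len(self.entries) >= self.capacity:
            self._evict()
        self.entries[trigger_ip] = [target_ip, 0]

    def _evict(self):
        # Victim: first entry not recently used (bit == 1).
        victim = next((ip for ip, e in self.entries.items() if e[1] == 1), None)
        if victim is None:
            # Every entry was recently used: age them all, then take the first.
            for e in self.entries.values():
                e[1] = 1
            victim = next(iter(self.entries))
        del self.entries[victim]


# Entries from the slide: jmp at 0x00, call at 0x10, jeqz at 0x18.
table = SingleTargetJumpTable(capacity=7800)
table.insert(0x00, 0x04)
table.insert(0x10, 0x20)
table.insert(0x18, 0x08)
```

A lookup such as `table.lookup(0x18)` then yields the recorded target `0x08`, which is the IP a prefetcher would jump to instead of continuing sequentially.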
JIP Framework • Single Target Jump Table (7800 entry, Fully Associative): Trigger IP, Target IP, NRU • Multiple Target Jump Table – I (1024 entry, Direct Mapped): Tag, Target IPs (3 targets), Array of Targets (8 indices), Target Confidence (3 targets) • Multiple Target Jump Table – II (512 entry, Direct Mapped): Tag, Target IPs (8 targets), Array of Targets (16 indices), Target Confidence (8 targets) • Storing 64-bit IPs with this configuration, overhead: 300 KB 8
IP Compression 9
JIP Framework • Mapper Table (512 entry, Fully Associative): Uncompressed IP, Compressed IP • Single Target Jump Table (7800 entry, Fully Associative): Trigger IP, Target IP, NRU • Multiple Target Jump Table – I (1024 entry, Direct Mapped): Tag, Target IPs (3 targets), Array of Targets (8 indices), Target Confidence (3 targets) • Multiple Target Jump Table – II (512 entry, Direct Mapped): Tag, Target IPs (8 targets), Array of Targets (16 indices), Target Confidence (8 targets) 10
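One way a mapper table can shrink stored IPs is to replace the upper bits of each IP with a short index into the table while keeping the lower bits verbatim. The sketch below assumes exactly that split (upper bits replaced by an index into 512 entries, i.e. 9 bits, and the low 16 bits kept); the slides give the table size and mention 48-, 16- and 9-bit widths but do not spell out the encoding, so the split, the class name `IPMapper`, and the omitted replacement policy are all assumptions.

```python
class IPMapper:
    """Sketch of IP compression: upper bits -> small table index, low bits kept (assumed scheme)."""

    def __init__(self, entries=512, low_bits=16):
        self.entries = entries
        self.low_bits = low_bits
        self.low_mask = (1 << low_bits) - 1
        self.index_of = {}  # upper bits -> index
        self.upper_of = {}  # index -> upper bits (the reverse mapping)

    def compress(self, ip):
        upper, low = ip >> self.low_bits, ip & self.low_mask
        idx = self.index_of.get(upper)
        if idx is None:
            if len(self.index_of) >= self.entries:
                return None  # table full; a replacement policy is omitted in this sketch
            idx = len(self.index_of)
            self.index_of[upper] = idx
            self.upper_of[idx] = upper
        return (idx << self.low_bits) | low

    def decompress(self, compressed):
        idx, low = compressed >> self.low_bits, compressed & self.low_mask
        return (self.upper_of[idx] << self.low_bits) | low


mapper = IPMapper()
ip = 0x7F1234567890            # a 48-bit instruction pointer
compressed = mapper.compress(ip)
restored = mapper.decompress(compressed)
```

Under these assumptions a compressed IP fits in 9 + 16 = 25 bits, and IPs that share their upper bits reuse the same mapper entry, which is where the storage saving would come from.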
Accurate but Late Prefetch • On average, 16% of the prefetch requests are late 11
IPs recurring in same sequence • IP: A, IP: B, IP: C, IP: D, IP: E 12
IPs recurring in same sequence • On average, 37% of the IPs recur in the same sequence 13
Temporal Table • Recent Access Queue (25 entry, FIFO) • Temporal Table (7150 entry, Fully Associative): Leader IP, Follower IP • Trigger prefetching for the Follower IP when the Leader IP recurs. 14
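The leader-follower idea can be sketched as follows. The pairing rule here is an assumption: when the Recent Access Queue is full, the oldest IP in it becomes the leader of the IP being accessed now, so a follower trails its leader by the queue length. The slide states only the queue size, the table fields, and the trigger rule; the class name `TemporalTable` is hypothetical and the 7150-entry capacity limit is left out for brevity.

```python
from collections import deque


class TemporalTable:
    """Leader IP -> follower IP pairs trained from a FIFO of recent accesses (assumed rule)."""

    def __init__(self, distance=25):
        self.recent = deque(maxlen=distance)  # Recent Access Queue (FIFO)
        self.follower_of = {}                 # leader IP -> follower IP

    def access(self, ip):
        """Record an access; return the follower IP to prefetch, or None."""
        prefetch = self.follower_of.get(ip)   # leader recurs -> prefetch its follower
        if len(self.recent) == self.recent.maxlen:
            leader = self.recent[0]           # oldest IP still in the queue
            self.follower_of[leader] = ip     # the current IP follows it
        self.recent.append(ip)                # oldest entry drops out automatically
        return prefetch


# Small distance so the behaviour is visible on the slide's A..E sequence.
tt = TemporalTable(distance=3)
results = [tt.access(ip) for ip in ["A", "B", "C", "D", "E", "A"]]
```

With a distance of 3, the second access to A returns D, because D was the access that arrived three accesses after the first A. A larger distance starts the follower's prefetch further ahead of its use.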
JIP with Temporal Table • On average, 3% of the prefetch requests are late 15
JIP Framework • Single Target Jump Table (SJT) (7800 entry, Fully Associative): Trigger IP, Target IP, NRU • Multiple Target Jump Table – I (MJT-I) (1024 entry, Direct Mapped): Tag, Target IPs, Array of Targets, Target Confidence • Multiple Target Jump Table – II (MJT-II) (512 entry, Direct Mapped): Tag, Target IPs, Array of Targets, Target Confidence • Mapper Table (512 entry, Fully Associative): Uncompressed IP, Compressed IP • Temporal Table (7150 entry, Fully Associative): Leader IP, Follower IP • Hardware Overhead: 127.8 KB 16
Outline • Motivation • JIP Framework • Prefetching in JIP Framework • Results 17
Prefetching in JIP Framework • [Figure: prefetch pipeline in four numbered steps. L1-I access IP (48 bits) → IP Mapper → compressed IP → lookahead based on depth and degree over the JIP tables and the Temporal Table (on a hit: next/target IP) → Recent Prefetch Queue → prefetch IP → Reverse IP Mapper → uncompressed IP (48 bits) → L1-I Prefetch Queue. Other labels in the figure: 16 bits, 9 bits, Hit, Miss.] 18
Prefetch Degree & Depth • Prefetch Degree: 4 • Lookahead Depth: 14 • [Figure: lookahead depth of 14 over the IP sequence A, A+1, A+2, B, B+1, B+2, C, C+1, C+2, D, D+1, D+2, E, E+1, E+2, which maps to cache blocks A, B, C, D, E.] 19
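A lookahead bounded by depth and degree can be sketched as a walk over predicted IPs. In this sketch each step either follows a recorded jump target or falls through to the next instruction, depth caps the number of steps, and degree caps the number of distinct cache blocks returned for prefetching. That interpretation of the two parameters, the fixed 4-byte instruction size, the 64-byte block size, and the function name `lookahead` are assumptions; only the values 4 and 14 come from the slide.

```python
def lookahead(start_ip, jump_table, depth=14, degree=4, instr_bytes=4, block_bytes=64):
    """Walk up to `depth` predicted IPs; return up to `degree` new cache-block addresses."""
    start_block = start_ip // block_bytes
    blocks = []
    ip = start_ip
    for _ in range(depth):
        # JUMP if the table knows a target for this IP, otherwise RUN sequentially.
        ip = jump_table.get(ip, ip + instr_bytes)
        block = ip // block_bytes
        if block != start_block and block not in blocks:
            blocks.append(block)
            if len(blocks) == degree:
                break
    return [b * block_bytes for b in blocks]


# Hypothetical jump table: two taken branches leaving the starting block.
jumps = {0x1008: 0x2000, 0x2004: 0x3000}
prefetches = lookahead(0x1000, jumps)
```

Here the walk runs 0x1004, 0x1008, jumps to 0x2000, runs to 0x2004, jumps to 0x3000, and then stays inside that block until the depth is exhausted, so the two blocks at 0x2000 and 0x3000 are the prefetch candidates.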
Extended Lookahead • JIP waiting for L1-I accesses • JIP after two cycles 20
Extended Lookahead • But which path to take? • Begin with the Last Prefetch IP • Begin with the Follower IP from the Temporal Table 21
Extended Lookahead • Confidence counter to select between paths • Continue extended lookahead for the subsequent three cycles • Prefetch degree of one • Terminate extended lookahead if a new L1-I access occurs 22
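The control flow of extended lookahead can be sketched as below. The three-cycle limit, the degree of one, and termination on a new L1-I access come from the slide. How the confidence counter is trained and compared is not given, so the threshold test, the parameter names, and the `next_ip` callback (standing in for a lookup in the JIP tables) are assumptions made for illustration.

```python
def extended_lookahead(last_prefetch_ip, follower_ip, confidence, next_ip,
                       access_arrives_at=None, cycles=3, threshold=2):
    """Issue one prefetch per idle cycle, for at most `cycles` cycles.

    access_arrives_at: cycle index at which a new L1-I access shows up (None = never).
    """
    # Path selection (assumed rule): trust the Temporal Table follower only when
    # the confidence counter has reached the threshold.
    use_follower = follower_ip is not None and confidence >= threshold
    ip = follower_ip if use_follower else last_prefetch_ip

    issued = []
    for cycle in range(cycles):
        if access_arrives_at is not None and cycle >= access_arrives_at:
            break                 # a new L1-I access terminates extended lookahead
        ip = next_ip(ip)          # one step along the chosen path
        issued.append(ip)         # prefetch degree of one per cycle
    return issued


def next_block(ip):
    """Hypothetical stand-in for the JIP tables: always the next 64-byte block."""
    return ip + 64


low_conf = extended_lookahead(0x1000, 0x8000, confidence=0, next_ip=next_block)
high_conf = extended_lookahead(0x1000, 0x8000, confidence=3, next_ip=next_block)
interrupted = extended_lookahead(0x1000, 0x8000, confidence=0, next_ip=next_block,
                                 access_arrives_at=1)
```

With low confidence the walk continues from the last prefetch IP, with high confidence it restarts from the follower IP, and an access arriving after the first cycle cuts the walk to a single prefetch.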
Outline • Motivation • JIP Framework • Prefetching in JIP Framework • Results 23
Performance Improvement: 27.75% 24
Utility of different components of JIP 25
Summary • We propose a bouquet of IP jumpers • IP Compression to reduce storage overhead • Leader-Follower IP pairs to improve timeliness • With JIP, average performance improvement: 27.75% • Storage overhead: 127.8 KB 26
Thank You 27
- Slides: 27