Thread-Fair Memory Request Reordering

Thread-Fair Memory Request Reordering
Kun Fang, Nick Iliev, Ehsan Noohi, Suyu Zhang, and Zhichun Zhu
Dept. of ECE, University of Illinois at Chicago

Outline
• Background
• Thread-Fair Memory Request Reordering
• Result
• Conclusion

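SDRAM Organization (Device)
[Figure: an SDRAM device with Banks 0 through 7; each bank has a row decoder, a column decoder and a row buffer. The device takes address (addr) and command (cmd: ACT, COL, PRE) inputs and has an 8-bit data interface.]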
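To make the ACT, COL and PRE commands concrete, here is a minimal Python sketch of one device with eight independent banks, each holding at most one open row in its row buffer. It assumes an open-page policy (a row stays open until another row in the same bank is needed) and ignores all timing; the class and method names are illustrative, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bank:
    open_row: Optional[int] = None  # row currently held in the row buffer


@dataclass
class SDRAMDevice:
    banks: List[Bank] = field(default_factory=lambda: [Bank() for _ in range(8)])

    def access(self, bank: int, row: int) -> List[str]:
        """Return the commands needed to read or write (bank, row), open-page policy."""
        b = self.banks[bank]
        if b.open_row == row:
            return ["COL"]  # row-buffer hit: column access only
        # Row-buffer conflict needs a precharge first; an idle bank does not.
        cmds = ["PRE"] if b.open_row is not None else []
        b.open_row = row
        return cmds + ["ACT", "COL"]


dev = SDRAMDevice()
print(dev.access(0, 0))  # ['ACT', 'COL']
print(dev.access(0, 0))  # ['COL']
print(dev.access(0, 1))  # ['PRE', 'ACT', 'COL']
print(dev.access(7, 1))  # ['ACT', 'COL']
```

Because each bank keeps its own row buffer, the conflict in Bank 0 does not affect Bank 7.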

Request Reordering
[Figure: a Bank 0 request queue (Row 0 read, Row 0 write, Row 1 request, Row 0 read) scheduled in arrival order and reordered. The command timelines (ACT, column read/write, PRE) show the row activations and precharges each order needs, plus read/write bus turnarounds and the write-to-read delay T_WTR.]
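The effect of reordering on the queue in the figure can be approximated by counting commands. This Python sketch assumes a single bank, an open-page policy, and that the Row 1 request is a write (the slide text does not say). It counts ACT, PRE and column commands plus read/write bus turnarounds and ignores all timing, including T_WTR. The function name and the reordered sequence are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Request = Tuple[int, str]  # (row, op) with op "Rd" or "Wr"; all in one bank


def command_stats(order: List[Request]) -> Dict[str, int]:
    """Count commands and bus turnarounds for one bank under an open-page policy."""
    open_row: Optional[int] = None
    last_op: Optional[str] = None
    stats = {"ACT": 0, "PRE": 0, "COL": 0, "turnarounds": 0}
    for row, op in order:
        if open_row != row:
            if open_row is not None:
                stats["PRE"] += 1      # close the conflicting row
            stats["ACT"] += 1          # open the requested row
            open_row = row
        stats["COL"] += 1              # column read or write
        if last_op is not None and last_op != op:
            stats["turnarounds"] += 1  # data bus changes direction
        last_op = op
    return stats


# Hypothetical queue; the Row 1 operation is assumed to be a write.
queue = [(0, "Rd"), (0, "Wr"), (1, "Wr"), (0, "Rd")]
in_order = command_stats(queue)
# Reordered: serve every Row 0 request before opening Row 1, reads grouped first.
reordered = command_stats([(0, "Rd"), (0, "Rd"), (0, "Wr"), (1, "Wr")])
print(in_order)   # {'ACT': 3, 'PRE': 2, 'COL': 4, 'turnarounds': 2}
print(reordered)  # {'ACT': 2, 'PRE': 1, 'COL': 4, 'turnarounds': 1}
```

Under these assumptions the reordered schedule saves one activate/precharge pair and one bus turnaround while doing the same four column accesses.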

Outline
• Background
• Thread-Fair Memory Request Reordering
  – Observation
• Result
• Conclusion

Blocked By ROB Head
[Figure: a core's ROB and the memory controller queue, both holding requests to Bank 0 (Row 0 and Row 1), with the service order compared under no reordering and under reordering.]
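A small sketch of the problem: under a pure row-hit-first policy (FR-FCFS style, a standard baseline rather than the scheme proposed here), a request that conflicts with the open row waits behind every younger row hit. The queue contents and the name `head` (the request blocking the ROB head) are hypothetical; a single bank and no timing are assumed.

```python
from typing import List, Tuple

Queue = List[Tuple[str, int]]  # (request name, row); all requests target Bank 0


def fr_fcfs_order(queue: Queue, open_row: int) -> List[str]:
    """Row-hit-first: serve the oldest request to the open row, else the oldest request."""
    order: List[str] = []
    pending = list(queue)
    while pending:
        hit = next((r for r in pending if r[1] == open_row), None)
        req = hit if hit is not None else pending[0]
        pending.remove(req)
        open_row = req[1]  # serving a request leaves its row open
        order.append(req[0])
    return order


# The oldest request (blocking the ROB head) needs Row 1 while Row 0 is open.
print(fr_fcfs_order([("head", 1), ("a", 0), ("b", 0), ("c", 0)], open_row=0))
# ['a', 'b', 'c', 'head']
```

Plain FCFS would serve `head` first. While it waits, its core cannot retire instructions, which is why the conclusion gives the ROB-head request priority when a row has to be opened.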

Outline
• Background
• Thread-Fair Memory Request Reordering
  – Algorithm
• Related Work
• Conclusion

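Thread-Fair Memory Request Reordering
[Flowchart: the scheduler alternates between a Read First mode and a Write First mode, switching on write-queue (WQ) occupancy against a high watermark (WQ > High WM) and a low watermark (WQ < Low WM). Decision nodes include Rd Hit?, Wr Hit?, ROB Head? and FCFS?; actions include Issue Rd, Issue Wr, Issue Rd Hit and Issue Wr Hit.]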
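The following Python sketch is one plausible reading of the flowchart, combined with the conclusion slide: row hits are served first, the ROB-head request is preferred when a new row must be opened, and reads bypass writes until the write queue exceeds the high watermark. The class names, the fallback to the other queue, and the rule that issuing a request opens its row are assumptions; timing constraints and bank-level parallelism are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity equality, so list.remove() drops this exact request
class Req:
    name: str
    bank: int
    row: int
    is_write: bool = False
    rob_head: bool = False  # read that blocks the head of its core's ROB


class ThreadFairScheduler:
    """Hypothetical sketch of the Read First / Write First policy."""

    def __init__(self, high_wm: int, low_wm: int) -> None:
        self.reads: List[Req] = []          # arrival (FCFS) order
        self.writes: List[Req] = []
        self.open_row: Dict[int, int] = {}  # bank -> row held in its row buffer
        self.high_wm, self.low_wm = high_wm, low_wm
        self.write_first = False            # start in Read First mode

    def enqueue(self, req: Req) -> None:
        (self.writes if req.is_write else self.reads).append(req)

    def _row_hit(self, queue: List[Req]) -> Optional[Req]:
        """Oldest request in `queue` that targets its bank's open row."""
        return next((r for r in queue if self.open_row.get(r.bank) == r.row), None)

    def pick(self) -> Optional[Req]:
        # Watermark hysteresis: drain writes once WQ > high_wm, until WQ < low_wm.
        if not self.write_first and len(self.writes) > self.high_wm:
            self.write_first = True
        elif self.write_first and len(self.writes) < self.low_wm:
            self.write_first = False

        if self.write_first:
            queue = self.writes
            choice = self._row_hit(queue) or (queue[0] if queue else None)
        else:
            queue = self.reads
            # Row hits first; when a row must be opened, prefer the ROB head.
            choice = (self._row_hit(queue)
                      or next((r for r in queue if r.rob_head), None)
                      or (queue[0] if queue else None))

        if choice is None:
            # Preferred queue is empty: serve the other queue, row hits first.
            queue = self.reads if self.write_first else self.writes
            if not queue:
                return None
            choice = self._row_hit(queue) or queue[0]

        queue.remove(choice)
        self.open_row[choice.bank] = choice.row  # issuing opens (or keeps) the row
        return choice


sched = ThreadFairScheduler(high_wm=2, low_wm=1)
for r in [Req("A", 0, 0), Req("B", 0, 1, rob_head=True), Req("C", 0, 0)]:
    sched.enqueue(r)
print([sched.pick().name for _ in range(3)])  # ['B', 'A', 'C']
```

In the usage example `B` goes first because no row is open yet and it blocks its ROB head; `A` then opens Row 0 and `C` follows as a row hit.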

Outline
• Background
• Thread-Fair Memory Request Reordering
  – Implementation
• Result
• Conclusion

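Scheduler Design
[Figure: per-bank (Bank n) structures. A Read Pending Queue and a Write Pending Queue hold (Index, Row Addr) entries, where the index points into the Read Queue or Write Queue; a Read Row Hit Queue and a Write Row Hit Queue hold the indices of requests that hit the open row.]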
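The per-bank structures can be sketched in Python as follows. It is assumed (hypothetically) that a request moves from the pending queue to the row hit queue as soon as its row is open, and that the hit queue is drained before the bank switches rows. Only indices and row addresses are stored, matching the fields listed on the Hardware Overhead slide; class and method names are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class BankQueues:
    """Per-bank structures; indices point into the shared Read or Write Queue."""
    pending: List[Tuple[int, int]] = field(default_factory=list)  # (index, row addr)
    row_hits: Deque[int] = field(default_factory=deque)           # indices only
    open_row: Optional[int] = None

    def add(self, index: int, row: int) -> None:
        """Queue a new request: straight to the hit queue if its row is already open."""
        if row == self.open_row:
            self.row_hits.append(index)
        else:
            self.pending.append((index, row))

    def open(self, row: int) -> None:
        """Open `row` and move every pending request to that row into the hit queue."""
        if self.row_hits:
            raise RuntimeError("drain row hits before switching rows")
        self.open_row = row
        keep: List[Tuple[int, int]] = []
        for index, r in self.pending:
            if r == row:
                self.row_hits.append(index)
            else:
                keep.append((index, r))
        self.pending = keep

    def next_hit(self) -> Optional[int]:
        """Index of the oldest row-hit request, or None if there is none."""
        return self.row_hits.popleft() if self.row_hits else None


bank = BankQueues()
for index, row in [(1, 4), (6, 5), (5, 4)]:  # hypothetical (index, row) pairs
    bank.add(index, row)
bank.open(4)
print(list(bank.row_hits), bank.pending)  # [1, 5] [(6, 5)]
```

Keeping hits in a separate FIFO lets the scheduler find the next row hit without searching the whole pending queue.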

Hardware Overhead
• 4 channels, 2 ranks/channel, 8 banks/rank
  – 64 banks
• Read/Write Pending Queue (32-entry, 11 KB)
  – Index: 6-bit (64-entry Read/Write Queue)
  – Row address: 16-bit
• Read/Write Request Hit Queue (32-entry, 3 KB)
  – Index: 6-bit
• Total: 14 KB
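The totals on this slide are consistent with one 32-entry read copy and one 32-entry write copy of each queue per bank, with KB taken as 1024 bytes. That per-bank, per-direction reading is our inference from the numbers, not something the slide states; the arithmetic below checks it.

```python
BANKS = 4 * 2 * 8   # channels x ranks/channel x banks/rank = 64
DIRECTIONS = 2      # assumed: one read copy and one write copy per bank
ENTRIES = 32        # entries per queue
INDEX_BITS = 6      # log2 of the 64-entry Read/Write Queue
ROW_BITS = 16       # row address width


def kib(bits: int) -> float:
    """Convert bits to KB (1 KB = 1024 bytes)."""
    return bits / 8 / 1024


pending_bits = BANKS * DIRECTIONS * ENTRIES * (INDEX_BITS + ROW_BITS)
hit_bits = BANKS * DIRECTIONS * ENTRIES * INDEX_BITS
print(kib(pending_bits), kib(hit_bits), kib(pending_bits + hit_bits))  # 11.0 3.0 14.0
```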

Outline
• Background
• Thread-Fair Memory Request Reordering
• Result
• Conclusion

Simulation Environment
• USIMM 1.3
• Workloads
  – Single-process, multi-process and multi-threaded
• Configuration
  – 1-, 2-, 4-, 8- and 16-core configurations
  – 1-channel and 4-channel memory configurations

Result
• Fairness
  – Improves by 3.1% to 13.6% (9.1% on average).
  – The squared deviation of all threads' slowdowns is less than 2% (except for the 16-thread workload).
• Performance
  – Overall execution time improves by 9.7%.
• Power
  – EDP improves by 5.2% to 24.6% (17.3% on average).
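The slides do not define the metrics exactly, so the helpers below are a hedged sketch under common definitions: a thread's slowdown is its runtime when sharing memory divided by its runtime alone, the "squared deviation" is taken to be the population variance of the slowdowns, and EDP is energy times execution time. The function names and all numbers are hypothetical, not results from the slides.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def slowdowns(t_shared: Sequence[float], t_alone: Sequence[float]) -> List[float]:
    """Per-thread slowdown: runtime when sharing / runtime when running alone."""
    return [s / a for s, a in zip(t_shared, t_alone)]


def slowdown_spread(sd: Sequence[float]) -> float:
    """Assumed fairness measure: population variance (mean squared deviation)."""
    return pvariance(sd)


def edp(energy: float, delay: float) -> float:
    """Energy-delay product."""
    return energy * delay


def improvement(baseline: float, new: float) -> float:
    """Fractional reduction relative to the baseline (0.1 means 10% better)."""
    return (baseline - new) / baseline


# Hypothetical inputs for illustration only.
sd = slowdowns([1.2, 1.5, 2.0], [1.0, 1.0, 1.0])
print(sd, fmean(sd), slowdown_spread(sd))
print(improvement(edp(10.0, 2.0), edp(9.0, 1.8)))
```

A perfectly fair schedule slows every thread by the same factor, so the spread is zero; lower EDP means less energy spent per unit of performance.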

Outline
• Background
• Thread-Fair Memory Request Reordering
• Result
• Conclusion

Conclusion
• Monitor the ROB head and give its memory request higher priority when opening rows.
• Group row-hit requests to reduce latency.
• Let reads bypass writes.
• Can improve thread fairness, performance and EDP.

THANK YOU
Questions?
