Thread-Fair Memory Request Reordering: Kun Fang, Nick Iliev
- Slides: 18
Thread-Fair Memory Request Reordering Kun Fang, Nick Iliev, Ehsan Noohi, Suyu Zhang, and Zhichun Zhu Dept. of ECE, University of Illinois at Chicago 1
Outline • Background • Thread-Fair Memory Request Reordering • Result • Conclusion 2
SDRAM Organization (Device) [Figure: SDRAM device with banks 0 through 7; each bank has a row decoder, a column decoder, and a row buffer. Inputs: address and commands (ACT, COL, PRE). Output: 8-bit data.] 3
Request Reordering [Figure: command timelines for a mix of read and write requests to rows 0 and 1 of bank 0, shown in arrival order and in reordered form. Labeled commands and delays: Act, Col Rd, Col Wr, Pre, bus turnaround, T_WTR.] 4
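The benefit of reordering on this slide can be illustrated with a minimal sketch that counts row activations for one request stream served in arrival order and with same-row requests grouped. The `(bank, row)` encoding and both function names are assumptions made for illustration; the sketch ignores read/write bus turnaround and T_WTR, which the slide's timelines also show.

```python
from typing import Dict, List, Tuple

Request = Tuple[int, int]  # (bank, row); hypothetical encoding


def count_activations(requests: List[Request]) -> int:
    """Count row activations when requests are served in the given order."""
    open_row: Dict[int, int] = {}  # bank -> currently open row
    activations = 0
    for bank, row in requests:
        if open_row.get(bank) != row:
            activations += 1  # row miss: precharge (if a row is open) + activate
            open_row[bank] = row
    return activations


def group_row_hits(requests: List[Request]) -> List[Request]:
    """Serve requests to the same (bank, row) back to back,
    keeping first-arrival order between groups."""
    groups: Dict[Request, List[Request]] = {}
    for req in requests:
        groups.setdefault(req, []).append(req)
    return [req for group in groups.values() for req in group]


in_order = [(0, 0), (0, 1), (0, 0)]   # row 0, row 1, row 0 on bank 0
reordered = group_row_hits(in_order)
print(count_activations(in_order), count_activations(reordered))  # 3 2
```

Grouping the two row-0 requests saves one precharge/activate pair in this example; a real controller must also respect timing constraints that this sketch leaves out.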
Outline • Background • Thread-Fair Memory Request Reordering – Observation • Result • Conclusion 5
Blocked By ROB Head [Figure: a ROB and the memory controller queue holding bank 0 requests to rows 0 and 1, compared under "No Reordering" and "Reorder".] 6
Outline • Background • Thread-Fair Memory Request Reordering – Algorithm • Result • Conclusion 7
Thread-Fair Memory Request Reordering [Figure: scheduling flowchart with two modes, Read First and Write First. Mode switches: WQ > High WM, WQ < Low WM. Decision nodes: Issue Rd Hit?, Issue Wr Hit?, Issue ROB Head?, Issue Rd FCFS?. Actions: Issue Rd Hit, Issue Wr Hit.] 8
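The flowchart's exact edges are not recoverable from the slide text, so the following is only a sketch of one plausible reading. It assumes this priority order in read-first mode: read row hits, then a read blocking a ROB head, then the oldest read, then writes; and in write-first mode: write row hits, then the oldest write. The `Request` class, the `at_rob_head` flag, `pick_next`, and the watermark values are all hypothetical names and numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    bank: int
    row: int
    at_rob_head: bool = False  # hypothetical flag: this read blocks a ROB head


def pick_next(read_q: List[Request], write_q: List[Request],
              open_rows: Dict[int, int], write_mode: bool,
              high_wm: int, low_wm: int) -> Tuple[Optional[Request], bool]:
    """Return (request to issue or None, updated write_mode)."""
    # Watermark hysteresis: drain writes once the write queue is too full,
    # return to read-first once it has drained far enough.
    if len(write_q) > high_wm:
        write_mode = True
    elif len(write_q) < low_wm:
        write_mode = False

    def is_hit(r: Request) -> bool:
        return open_rows.get(r.bank) == r.row

    def is_rob_head(r: Request) -> bool:
        return r.at_rob_head

    def any_req(r: Request) -> bool:
        return True

    if write_mode:
        order = [(write_q, is_hit), (write_q, any_req)]
    else:
        # Assumed priority order; queues are scanned oldest first (FCFS).
        order = [(read_q, is_hit), (read_q, is_rob_head), (read_q, any_req),
                 (write_q, is_hit), (write_q, any_req)]
    for queue, pred in order:
        for req in queue:
            if pred(req):
                return req, write_mode
    return None, write_mode


reads = [Request(0, 1), Request(0, 0), Request(0, 2, at_rob_head=True)]
req, mode = pick_next(reads, [], {0: 0}, False, high_wm=40, low_wm=20)
print(req.row, mode)  # 0 False: the row hit wins while row 0 is open
```

With no row open on bank 0, the same call would return the ROB-head request (row 2) ahead of the older row-1 read, which is the thread-fairness aspect the deck describes.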
Outline • Background • Thread-Fair Memory Request Reordering – Implementation • Result • Conclusion 9
Scheduler Design [Figure: scheduler structures for bank n. The Read Pending Queue and Write Pending Queue hold (index, row address) entries that point into the Read Queue and Write Queue; the Read Row Hit Queue and Write Row Hit Queue hold indices.] 10
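The per-bank structures on this slide could be modeled roughly as follows. This is a sketch under assumptions: the `BankQueues` class, its method names, and the rule that activating a row moves matching pending entries into the hit queue are illustrative guesses, not the authors' design. Only one direction (read or write) is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BankQueues:
    """Per-bank bookkeeping; indices point into a shared request queue."""
    pending: List[Tuple[int, int]] = field(default_factory=list)  # (index, row)
    row_hits: List[int] = field(default_factory=list)             # indices only
    open_row: Optional[int] = None

    def enqueue(self, index: int, row: int) -> None:
        # A request to the currently open row goes straight to the hit queue.
        if row == self.open_row:
            self.row_hits.append(index)
        else:
            self.pending.append((index, row))

    def activate(self, row: int) -> None:
        # Assumption: a row is only closed after its hit queue has drained.
        if self.row_hits:
            raise ValueError("hit queue must be empty before opening a new row")
        self.open_row = row
        # Move every pending request for the new row into the hit queue.
        self.row_hits = [i for i, r in self.pending if r == row]
        self.pending = [(i, r) for i, r in self.pending if r != row]


bank = BankQueues()
bank.enqueue(1, row=4)
bank.enqueue(6, row=5)
bank.enqueue(3, row=4)
bank.activate(4)
print(bank.row_hits, bank.pending)  # [1, 3] [(6, 5)]
```

Storing only indices in the hit queue matches the slide's note that hit-queue entries need fewer bits than pending-queue entries, since the row address is implied by the open row.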
Hardware Overhead • 4 channels, 2 ranks/channel, 8 banks/rank – 64 banks • Read/Write Pending Queue (32-entry, 11 KB) – Index: 6-bit (64-entry Read/Write Queue) – Row address: 16-bit • Read/Write Request Hit Queue (32-entry, 3 KB) – Index: 6-bit • Total: 14 KB 11
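The storage totals on this slide can be reproduced with a short calculation. It assumes one read and one write instance of each queue per bank, which the slide implies but does not state outright, and it takes 1 KB as 1024 bytes; the constant and function names are illustrative.

```python
BANKS = 4 * 2 * 8        # channels * ranks/channel * banks/rank = 64
ENTRIES = 32             # entries per pending queue and per hit queue
INDEX_BITS = 6           # indexes a 64-entry read/write queue
ROW_ADDR_BITS = 16
QUEUES_PER_BANK = 2      # assumption: one read queue + one write queue per bank


def kib(bits: int) -> float:
    """Convert a bit count to KB (1 KB = 1024 bytes)."""
    return bits / 8 / 1024


pending_kb = kib(BANKS * QUEUES_PER_BANK * ENTRIES * (INDEX_BITS + ROW_ADDR_BITS))
hit_kb = kib(BANKS * QUEUES_PER_BANK * ENTRIES * INDEX_BITS)
print(pending_kb, hit_kb, pending_kb + hit_kb)  # 11.0 3.0 14.0
```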
Outline • Background • Thread-Fair Memory Request Reordering • Result • Conclusion 12
Simulation Environment • USIMM 1.3 • Workloads – Single-process, multi-process, and multi-threaded • Configuration – 1-, 2-, 4-, 8-, and 16-core configurations – 1-channel and 4-channel memory configurations 13
Result 14
Result • Fairness – Improves by 3.1% to 13.6% (9.1% on average). – Squared deviation of all thread slowdowns is less than 2% (except for the 16-thread workload). • Performance – Overall execution time improves by 9.7%. • Power – EDP improves by 5.2% to 24.6% (17.3% on average). 15
Outline • Background • Thread-Fair Memory Request Reordering • Result • Conclusion 16
Conclusion • Monitor the ROB head and give its request higher priority when opening rows. • Group row-hit requests to reduce latency. • Let reads bypass writes. • Together these can improve thread fairness, performance, and EDP. 17
THANK YOU Questions? 18