CPU microarchitectures (2000-2018), NPRG054 High Performance Software Development

![Intel NetBurst Microarchitecture [2000]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-2.jpg)
![Intel NetBurst Microarchitecture [2000]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-3.jpg)
![Intel Core Microarchitecture Pipeline [2006]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-4.jpg)


![Intel Nehalem Pipeline [2008]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-7.jpg)
![Intel Sandy Bridge Pipeline [2011]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-8.jpg)




![Intel Skylake (2015) [wikichip.org]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-13.jpg)
![Intel Skylake (2015) [wikichip.org]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-14.jpg)
![AMD Zen+ (2018) [wikichip.org]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-15.jpg)
![AMD Zen+ (2018) [wikichip.org]](https://slidetodoc.com/presentation_image_h2/a162a58f1b2f337f9910f84c97445895/image-16.jpg)
- Slides: 16
CPU microarchitectures (2000-2018). NPRG054 High Performance Software Development, 2016/2017, David Bednárek
Intel NetBurst Microarchitecture [2000]
Intel NetBurst Microarchitecture [2000]
Intel Core Microarchitecture Pipeline [2006]

Intel Core Microarchitecture

In a cycle, the CPU can (in theory) simultaneously perform:

- Fetch: 16 B (approx. 4 instructions) from the L1 instruction cache
- Decode: 1 to 5 instructions
- ALU: 3 simple operations (add/mul)
- Memory load: 1 read (up to 128 bits) from the L1 data cache
- Memory store: 1 write (up to 128 bits) to the L1 data cache

Latency (in cycles): the time between consuming operands and producing results

- integer add: 1, mul: 3-5
- FP add: 3, FP mul: 4-5
- div: data dependent
- integer load: 3, FP load: 4 (L1 cache)
- store address: 3
- store data: 2 (retirement, in-order)

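
The gap between per-cycle throughput and per-operation latency matters most in loops with a dependency chain. The following is a rough sketch, not a cycle-accurate model: it estimates a lower bound on the cycles needed to sum an array of FP values, using only the FP add latency (3 cycles) and the one-load-per-cycle limit listed above. The function name `estimated_cycles`, the scalar (one value per load) assumption, and the decision to ignore every other pipeline limit are assumptions made for illustration.

```python
from math import ceil

FP_ADD_LATENCY = 3   # cycles, taken from the latency list above
LOADS_PER_CYCLE = 1  # one L1 data-cache read per cycle, from the list above


def estimated_cycles(n_elements, accumulators):
    """Rough lower bound on cycles for summing n_elements FP values.

    Assumes scalar code (one value per load) and ignores every limit
    other than FP add latency and load throughput.
    """
    # Each accumulator forms its own serial chain of dependent FP adds.
    chain_length = ceil(n_elements / accumulators)
    latency_bound = chain_length * FP_ADD_LATENCY
    # Every element has to be loaded once, however many chains there are.
    load_bound = ceil(n_elements / LOADS_PER_CYCLE)
    return max(latency_bound, load_bound)


for k in (1, 2, 3, 4):
    print(k, estimated_cycles(1200, k))
# prints:
# 1 3600
# 2 1800
# 3 1200
# 4 1200
```

Under these assumptions, a single accumulator is latency-bound at 3 cycles per element, while three or more independent accumulators hide the add latency and leave the loop bound by the load port.
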

Intel Core Microarchitecture

- Branch prediction
  - conditions, indirect branches, call/return pairs
  - speculative execution
- Instruction decoder
  - loop cache (simple loops up to 18 instructions)
  - conversion to micro-ops (1:1, 1:N, 2:1)
  - stack-pointer simulator
- Renamer
  - 16 architectural integer registers mapped to 144 physical
    - similarly for FP registers
- Out-of-order execution
  - 32 micro-ops in simultaneous execution (RS) from a window of 96 (ROB)
  - retirement: memory/register stores in-order in background
  - store forwarding: loads retrieve values from waiting stores
  - speculative loads: no waiting for waiting stores (to unknown addresses)

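
The renamer's role can be illustrated with a toy model. This is a textbook-style sketch, not a description of Intel's actual hardware: the class `Renamer`, its method `rename`, and the simple free list are hypothetical, and returning physical registers to the free list at retirement is deliberately omitted. Only the register counts (16 architectural, 144 physical) come from the slide.

```python
NUM_ARCH = 16    # architectural integer registers (from the slide)
NUM_PHYS = 144   # physical registers (from the slide)


class Renamer:
    """Toy register renamer: maps architectural to physical registers."""

    def __init__(self):
        # Initially architectural register i lives in physical register i.
        self.table = list(range(NUM_ARCH))
        # All remaining physical registers are free.
        self.free = list(range(NUM_ARCH, NUM_PHYS))

    def rename(self, dst, srcs):
        """Rename one instruction 'dst = op(srcs)'.

        Returns (physical destination, list of physical sources).
        """
        # Sources are read through the current mapping first.
        phys_srcs = [self.table[s] for s in srcs]
        # The result always gets a fresh physical register.
        phys_dst = self.free.pop(0)
        self.table[dst] = phys_dst
        return phys_dst, phys_srcs


r = Renamer()
print(r.rename(0, [1, 2]))  # r0 = r1 op r2  -> (16, [1, 2])
print(r.rename(3, [0]))     # r3 = op r0     -> (17, [16])
print(r.rename(0, [4, 5]))  # r0 = r4 op r5  -> (18, [4, 5])
```

Because the second write to `r0` lands in a different physical register (18 rather than 16), it no longer has to wait for the earlier instruction that still reads the old value, which is what lets the out-of-order core run the two independently.
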
Intel Nehalem Pipeline [2008]
Intel Sandy Bridge Pipeline [2011]
Intel vs. AMD architectures (realworldtech.com)
Intel Haswell Microarchitecture (2013)
Haswell (2013) vs. Sandy Bridge (2011)
Haswell (2013) vs. Sandy Bridge (2011)
Intel Skylake (2015) [wikichip.org]
Intel Skylake (2015) [wikichip.org]
AMD Zen+ (2018) [wikichip.org]
AMD Zen+ (2018) [wikichip.org]