Model-Specific Registers
A look at Intel’s scheme for introducing new CPU features

Microprocessor evolution…
• 8080 (1973): 64K memory, 8-bit registers (no mul/div, no FPU)
• 8086 (1978): 1M memory, 16-bit registers, I/O-ports (8087 option)
• 80186 (1981): ins/outs, shift/rotate-immediate, integrated DMA+PIC+Timers
• 80286 (1982): 16M memory, protected-mode multitasking (80287 option)
• 80386 (1985): 4GB memory, 32-bit registers, paging (287/387 options); added TR6, TR7
• 80486 (1989): integrated FPU, RISC, caching, xadd (APIC option); added TR3, TR4, TR5
• 80586 “Pentium” (1993): MMX-instructions, integrated local-APIC, MSRs, dual-pipelines, branch-prediction; removed TR3, TR4, TR5, TR6, TR7

The ‘Model-Specific’ concept
• Beginning with the Pentium processor, Intel has been including ‘experimental’ features in its processors, warning that they may disappear from future designs, but providing a standard and permanent way for all such features to be accessed
• This access is via a pair of ‘privileged’ instructions (rdmsr and wrmsr) that can only be executed by ‘ring 0’ code

Quite a few MSRs now!
• At first there were only about a dozen of these MSRs (Model-Specific Registers), but lately their number is well over 200
• Some MSRs have proven useful enough that they are now treated as permanent fixtures of the defined i386 architecture

The Time-Stamp Counter
• This 64-bit Model-Specific Register was introduced in the Pentium processor and has been present in each CPU thereafter
• It increments once every CPU clock-cycle, starting from 0 when power is turned on
• It won’t overflow for at least ten years
• Unprivileged programs (ring 3) can normally access it via the rdtsc instruction

Using the TSC
The 64-bit counter value is returned in the register-pair EDX:EAX (EDX = bits 63..32, EAX = bits 31..0)

    time0:  .quad   0               # saves starting value from the TSC
    time1:  .quad   0               # saves concluding value from TSC

            # how you can measure CPU clock-cycles in a code-fragment
            rdtsc                   # read the Time-Stamp Counter
            movl    %eax, time0+0   # save least-significant longword
            movl    %edx, time0+4   # save most-significant longword

            # <Your code-fragment to be measured goes here>

            rdtsc                   # read the Time-Stamp Counter
            movl    %eax, time1+0   # save least-significant longword
            movl    %edx, time1+4   # save most-significant longword

            # now subtract starting-value ‘time0’ from ending value ‘time1’
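
The final subtraction step can be sketched in C. This is only an illustration of the arithmetic on two saved readings, not part of the original demo; the function names tsc_join and tsc_elapsed are hypothetical.

```c
#include <stdint.h>

/* Join the two 32-bit halves that rdtsc leaves in EDX (high) and EAX (low). */
uint64_t tsc_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Elapsed clock-cycles between two saved readings (time0 = start, time1 = end).
   Unsigned subtraction borrows across the 32-bit halves automatically. */
uint64_t tsc_elapsed(uint64_t time0, uint64_t time1)
{
    return time1 - time0;
}
```

In assembly the same subtraction needs a sub on the low longwords followed by an sbb on the high longwords, so that the borrow is carried across.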

The TSC as an MSR
• Each Model-Specific Register has its own identifying register-number, and it can be accessed (from ring 0) using the special pair of instructions: rdmsr and wrmsr
• The Time-Stamp Counter is MSR number 0x10
• To write a new 64-bit value into the TSC, you load the desired 64-bit value into the EDX:EAX register-pair, you put the MSR ID-number 0x10 into register ECX, then you execute wrmsr
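
As a rough sketch of the register setup described above, the following C fragment splits a 64-bit value into the three register images that wrmsr expects. It does not execute wrmsr (which requires ring 0); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Register images for wrmsr: ECX = MSR number, EDX:EAX = 64-bit value. */
struct msr_write {
    uint32_t ecx;   /* MSR register-number, e.g. 0x10 for the TSC */
    uint32_t edx;   /* most-significant 32 bits of the value      */
    uint32_t eax;   /* least-significant 32 bits of the value     */
};

/* Split a 64-bit value into the halves that go into EDX and EAX. */
struct msr_write msr_prepare(uint32_t msr_number, uint64_t value)
{
    struct msr_write w;
    w.ecx = msr_number;
    w.edx = (uint32_t)(value >> 32);
    w.eax = (uint32_t)(value & 0xFFFFFFFFu);
    return w;
}
```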

IA32_APIC_BASE
• This register has MSR number 0x1B and is private to each CPU in an SMP system
• It establishes the base-address for the Local-APIC’s memory-mapped registers (the default base-address is 0xFEE00000, but that can be changed using this MSR)
• The CPU’s Local-APIC functions can be either enabled or disabled (via bit #11)
• The BSP can be recognized (via bit #8)

Relocating the APIC registers
IA32_APIC_BASE (64-bits):
    bits 63..32   reserved
    bits 31..12   APIC base-address (4K page-number); default-value for the page = 0xFEE00
    bit 11        EN  = Local-APIC Enable bit (1=enabled, 0=disabled)
    bit 8         BSP = Boot-Strap Processor (read-only): 1=yes, 0=no

    # make the processor’s Local-APIC registers accessible in real-mode
    mov     $0x000D8800, %eax       # least-significant 32-bits (page 0xD8, EN=1)
    mov     $0x0000, %edx           # most-significant 32-bits
    mov     $0x1B, %ecx             # MSR register-number
    wrmsr                           # write to specified MSR
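
The bit layout above can be sketched in C as plain arithmetic on the low 32 bits of the MSR. The macro and function names are hypothetical, and the code only composes or decodes values; it does not touch the MSR itself.

```c
#include <stdint.h>

#define APIC_BASE_EN    (1u << 11)      /* Local-APIC Enable bit              */
#define APIC_BASE_BSP   (1u << 8)       /* Boot-Strap Processor bit (read-only) */
#define APIC_BASE_ADDR  0xFFFFF000u     /* bits 31..12: 4K page-number        */

/* Compose the low 32 bits of IA32_APIC_BASE from a 4K-aligned base address. */
uint32_t apic_base_low(uint32_t base_address, int enable)
{
    uint32_t value = base_address & APIC_BASE_ADDR;
    if (enable)
        value |= APIC_BASE_EN;
    return value;
}

/* Decode fields from a value read back with rdmsr (EAX half). */
uint32_t apic_base_address(uint32_t low) { return low & APIC_BASE_ADDR; }
int apic_is_enabled(uint32_t low)        { return (low & APIC_BASE_EN) != 0; }
int apic_is_bsp(uint32_t low)            { return (low & APIC_BASE_BSP) != 0; }
```

Because the enable bit lives in the same register as the base address, a value written without bit 11 set would relocate the registers and disable the Local-APIC at the same time.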

Extended Feature Enable Register
• This Model-Specific Register (MSR) was introduced in the AMD64 architecture and perpetuated by EM64T (for compatibility)

EFER (64-bits):
    bit 11   NXE = Non-eXecutable pages Enabled (1=yes, 0=no)
    bit 10   LMA = Long-Mode is Active (1=yes, 0=no)
    bit 8    LME = Long-Mode is Enabled (1=yes, 0=no)
    bit 0    SCE = SysCall/sysret is Enabled (1=yes, 0=no)

NOTE: The MSR address-index for EFER = 0xC0000080, and this register is accessed using RDMSR or WRMSR instructions
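
A minimal sketch of the EFER bit positions as C masks, assuming only the layout given in the legend. The macro and helper names are illustrative, and the helpers operate on the EAX half of a value already read with rdmsr.

```c
#include <stdint.h>

#define MSR_EFER   0xC0000080u     /* MSR address-index for EFER */
#define EFER_SCE   (1u << 0)       /* SysCall/sysret Enabled     */
#define EFER_LME   (1u << 8)       /* Long-Mode Enabled          */
#define EFER_LMA   (1u << 10)      /* Long-Mode Active           */
#define EFER_NXE   (1u << 11)      /* Non-eXecutable pages Enabled */

/* Return the EFER image with the LME bit turned on. */
uint32_t efer_enable_long_mode(uint32_t efer_low)
{
    return efer_low | EFER_LME;
}

/* Report whether long mode is currently active (LMA set). */
int efer_long_mode_active(uint32_t efer_low)
{
    return (efer_low & EFER_LMA) != 0;
}
```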

The x86 operating ‘modes’
[Diagram of the transitions among these modes]
• Real mode (entered at power on)
• Protected mode
• Virtual 8086 mode
• System Management mode
• IA-32e mode, with two sub-modes:
  – 64-bit mode
  – Compatibility mode

Why the CPU’s ‘mode’ matters
• Key differences among the x86 modes:
  – How memory is addressed and mapped
  – What instruction-set is available
  – Which registers are accessible
  – Which ‘exceptions’ may be generated
  – What data-structures are required
  – How task-switching can be accomplished
  – How interrupts will be processed

Mode transitions
• The processor starts up in ‘real mode’
• Mode-transitions normally happen under program control (except for transitions to the so-called ‘System Management Mode’)
• Details of programming a mode-change depend on which modes are involved
• Some mode-transfers aren’t possible
• ‘64-bit mode’ offers a lot of surprises

Registers in 64-bit mode
• 32-bit registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, EIP, EFLAGS
• Their 64-bit counterparts: RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, RIP, RFLAGS
• Additional general registers: R8, R9, R10, R11, R12, R13, R14, R15
• Control registers: CR0, CR2, CR3, CR4, CR8
• Debug registers: DR0, DR1, DR2, DR3, DR6, DR7
• Sub-register layout: RAX = bits 63..0, EAX = bits 31..0, AX = bits 15..0, AL = bits 7..0

Some missing features…
• Memory-segmentation is “turned off”
  – Base-address is zero for CS, DS, ES, SS
  – Segment-limit checking is not performed
• Certain familiar instructions no longer are defined while executing in ‘64-bit mode’
  – Cannot use ‘pusha’ and ‘popa’
  – Cannot ‘ljmp’ or ‘lcall’ with ‘direct’ addressing
  – Cannot use ‘lahf’ and ‘sahf’

“canonical” addresses
64-bit “virtual” address space:
    0xFFFF800000000000 … 0xFFFFFFFFFFFFFFFF   “canonical” addresses
    (everything in between)                   “non-canonical” (invalid) virtual addresses
    0x0000000000000000 … 0x00007FFFFFFFFFFF   “canonical” addresses

Analogy using 5-bit values (00000 through 11111): the values at the bottom and at the top of the range are “canonical”, while those in the middle are “non-canonical”
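
A minimal C check for canonical form, assuming 48-bit virtual addresses (the width used by the 4-level mapping scheme). The function name is hypothetical.

```c
#include <stdint.h>

/* A 64-bit virtual address is canonical (for 48-bit addressing) when
   bits 63..47 are all copies of bit 47, i.e. all 0 or all 1. */
int is_canonical48(uint64_t va)
{
    uint64_t top = va >> 47;            /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}
```

The two accepted patterns correspond to the lower and upper canonical ranges; any mixture of 0s and 1s in those upper bits falls in the invalid middle region.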

4-Levels of mapping
64-bit ‘canonical’ virtual address:
    bits 63..48   sign-extension
    bits 47..39   PML4
    bits 38..30   PDPT
    bits 29..21   PDIR
    bits 20..12   PTBL
    bits 11..0    offset

Lookup path: CR3 → Page Map Level-4 Table → Page Directory Pointer Table → Page Directory → Page Table → Page Frame (4 KB)

Each mapping-table contains up to 512 quadword-size entries

4-level address-translation
• The CPU examines any virtual address it encounters, subdividing it into five fields (plus the sign-extension):
    bits 63..48   sign-extension (16-bits)
    bits 47..39   index into level-4 page-map table (9-bits)
    bits 38..30   index into page-directory pointer table (9-bits)
    bits 29..21   index into page-directory (9-bits)
    bits 20..12   index into page-table (9-bits)
    bits 11..0    offset into page-frame (12-bits)

Any 48-bit virtual-address is sign-extended to a 64-bit “canonical” address
Only “canonical” 64-bit virtual-addresses are legal in 64-bit mode
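
The field boundaries listed above translate directly into shifts and masks. The following C sketch shows the subdivision only; the struct and function names are hypothetical, and no actual table walk is performed.

```c
#include <stdint.h>

/* The five fields the CPU extracts from a canonical virtual address. */
struct va_fields {
    unsigned pml4;      /* bits 47..39: index into level-4 page-map table */
    unsigned pdpt;      /* bits 38..30: index into page-directory pointer table */
    unsigned pdir;      /* bits 29..21: index into page-directory */
    unsigned ptbl;      /* bits 20..12: index into page-table */
    unsigned offset;    /* bits 11..0:  offset into page-frame */
};

struct va_fields va_split(uint64_t va)
{
    struct va_fields f;
    f.pml4   = (unsigned)((va >> 39) & 0x1FF);   /* 9 bits: 0..511 */
    f.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    f.pdir   = (unsigned)((va >> 21) & 0x1FF);
    f.ptbl   = (unsigned)((va >> 12) & 0x1FF);
    f.offset = (unsigned)(va & 0xFFF);           /* 12 bits: 0..4095 */
    return f;
}
```

Each 9-bit index selects one of the 512 quadword entries in its table, which is why every mapping-table fits exactly in one 4 KB page-frame.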

Format of 64-bit table-entries
Physical addresses on our current Core-2 CPUs are only 40 bits

    bit 63        EXB
    bits 62..52   avl
    bits 51..40   Reserved (must be 0)
    bits 39..32   Page-frame physical base-address[39..32]
    bits 31..12   Page-frame physical base-address[31..12]
    bits 11..9    avl
    bits 8..6     Meaning of these bits varies with the table
    bit 5         A
    bit 4         PCD
    bit 3         PWT
    bit 2         U
    bit 1         W
    bit 0         P

Legend:
    P   = Present (1=yes, 0=no)
    W   = Writable (1=yes, 0=no)
    U   = User-page (1=yes, 0=no)
    A   = Accessed (1=yes, 0=no)
    PCD = Page Cache Disable (1=yes, 0=no)
    PWT = Page Write-Through (1=yes, 0=no)
    avl = available for user-defined purposes
    EXB = Execution-disabled Bit (if EFER.NXE=1)
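
As an illustration of this entry format, the C sketch below builds a table-entry from a page-frame address and a set of flag bits. The macro and function names are hypothetical, and the address mask assumes the 40-bit physical addresses mentioned above.

```c
#include <stdint.h>

#define PTE_P     (1ULL << 0)              /* Present              */
#define PTE_W     (1ULL << 1)              /* Writable             */
#define PTE_U     (1ULL << 2)              /* User-page            */
#define PTE_PWT   (1ULL << 3)              /* Page Write-Through   */
#define PTE_PCD   (1ULL << 4)              /* Page Cache Disable   */
#define PTE_A     (1ULL << 5)              /* Accessed             */
#define PTE_EXB   (1ULL << 63)             /* Execution-disabled   */
#define PTE_ADDR  0x000000FFFFFFF000ULL    /* base-address bits 39..12 */

/* Combine a 4K-aligned physical address with flag bits. */
uint64_t make_entry(uint64_t physical_address, uint64_t flags)
{
    return (physical_address & PTE_ADDR) | flags;
}

/* Recover the page-frame base-address from an entry. */
uint64_t entry_address(uint64_t entry)
{
    return entry & PTE_ADDR;
}
```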

RDMSR and WRMSR
• An assembly language code-fragment to turn on the LME-bit (‘Long-Mode’ Enable):

    # Each Model-Specific Register (MSR) is 64-bits wide and has a unique
    # 32-bit address-index which is first placed into register ECX. Then the
    # least-significant 32-bits of that MSR is accessed using register EAX,
    # while the most-significant 32-bits is accessed using register EDX.

    mov     $0xC0000080, %ecx       # setup EFER address-index
    rdmsr                           # read EFER into (EDX, EAX)
    bts     $8, %eax                # set the LME-bit’s image to 1
    wrmsr                           # write (EDX, EAX) into EFER

    # NOTE: RDMSR and WRMSR must be executed at ‘Ring 0’ privilege-level.

Control Registers CR4 and CR0
Control Register CR4 (bits not listed are 0):
    bit 13 VMXE, bit 10 OSXMMEXCPT, bit 9 OSFXSR, bit 8 PCE, bit 7 PGE, bit 6 MCE,
    bit 5 PAE, bit 4 PSE, bit 3 DE, bit 2 TSD, bit 1 PVI, bit 0 VME

Control Register CR0 (bits not listed are 0):
    bit 31 PG, bit 30 CD, bit 29 NW, bit 18 AM, bit 16 WP,
    bit 5 NE, bit 4 ET, bit 3 TS, bit 2 EM, bit 1 MP, bit 0 PE

Legend (for 64-bit mode):
    PE  = Protected-mode Enabled (1=yes, 0=no)
    PG  = Paging Enabled (1=yes, 0=no)
    PAE = Physical-Address Extension (1=enabled, 0=disabled)

Segment-Descriptor Format
64-bit code-segment (‘LONG’ mode)

    bits 63..56   Base[31..24] (if L=0)
    bit 55        G
    bit 54        D
    bit 53        L
    bit 52        AVL
    bits 51..48   Limit[19..16] (if L=0)
    bit 47        P
    bits 46..45   DPL
    bit 44        S
    bit 43        X
    bit 42        C/D
    bit 41        R/W
    bit 40        A
    bits 39..32   Base[23..16] (if L=0)
    bits 31..16   Base[15..0] (if L=0)
    bits 15..0    Limit[15..0] (if L=0)

Legend:
    DPL = Descriptor Privilege Level (0..3)
    G = Granularity (0 = byte, 1 = 4KB-page)
    P = Present (0 = no, 1 = yes)
    D = Default size (0 = 16-bit, 1 = 32-bit)
    S = System (0 = yes, 1 = no)
    X = eXecutable (0 = no, 1 = yes)
    A = Accessed (0 = no, 1 = yes)
    code-segments: R = Readable (0 = no, 1 = yes), C = Conforming (0 = no, 1 = yes)
    data-segments: W = Writable (0 = no, 1 = yes), D = expands-Down (0 = no, 1 = yes)
    L = Long-mode (i.e., 64-bit addressing) (0 = no, 1 = yes)
    AVL = Available for user’s purposes

IA-32e Call-Gate descriptor

    bits 127..96  Reserved (must be 0)
    bits 95..64   offset[63..32]
    bits 63..48   offset[31..16]
    bit 47        P
    bits 46..45   DPL
    bit 44        0
    bits 43..40   Gate Type (=1100)
    bits 39..32   Reserved (must be 0)
    bits 31..16   code-segment selector
    bits 15..0    offset[15..0]

We can use a call-gate to ‘jump’ from a 16-bit code-segment to a 64-bit code-segment

Summary of steps
• Transition from real-mode to IA-32e mode:
  – Build the table of global descriptors
  – Load GDTR with pseudo-descriptor for GDT
  – Build the 4-level page-mapping tables
  – Enable IA-32e mode (set EFER.LME=1)
  – Enable Physical-Address Extension (CR4.PAE)
  – Load Level-4 page-map table address in CR3
  – Activate IA-32e mode (CR0.PE and CR0.PG)
  – Transfer via call-gate to 64-bit code-segment
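
The register updates in these steps can be modeled as plain bit arithmetic. The C sketch below operates on an in-memory image of the registers rather than the registers themselves (the real updates need ring-0 instructions), and it leaves out the descriptor-table and page-table setup; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define EFER_LME  (1u << 8)     /* EFER: Long-Mode Enable          */
#define CR4_PAE   (1u << 5)     /* CR4:  Physical-Address Extension */
#define CR0_PE    (1u << 0)     /* CR0:  Protected-mode Enable     */
#define CR0_PG    (1u << 31)    /* CR0:  Paging Enable             */

/* An in-memory image of the registers touched during the transition. */
struct cpu_image {
    uint32_t efer_low;
    uint32_t cr0;
    uint32_t cr4;
    uint64_t cr3;
};

/* Apply the register changes in the same order as the steps listed. */
struct cpu_image enter_ia32e(struct cpu_image c, uint64_t pml4_address)
{
    c.efer_low |= EFER_LME;          /* enable IA-32e mode               */
    c.cr4      |= CR4_PAE;           /* enable PAE                       */
    c.cr3       = pml4_address;      /* level-4 page-map table address   */
    c.cr0      |= CR0_PE | CR0_PG;   /* activate: protection plus paging */
    return c;
}
```

The ordering matters on real hardware: paging is switched on last, after LME, PAE and CR3 are already in place.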

Notes on the transition
• Code-segment must be “identity-mapped”
• Interrupts have to be temporarily disabled
• All memory-addressing in 64-bit mode via CS, SS, DS or ES uses 0 as base-address (and checking of segment-limits is omitted)

For a return to ‘real-mode’
• Processor must enter 16-bit code-segment in ‘compatibility-mode’ via indirect far jump
  – Load segment-registers DS, ES, and SS with ‘writable’ 16-bit segment-selectors (64K-limit)
  – Code-segment has to be “identity-mapped”
  – Deactivate IA-32e mode by clearing PG-bit
  – Leave ‘protected-mode’ by clearing PE-bit
  – Reload registers CS and SS with real-mode segment-addresses before enabling interrupts

In-class exercise #1
• Try running our ‘trymoves.s’ demo, to see the effect of changing the bottom-half of a 64-bit register
• Then modify the instructions in this demo so that you use as many of the new CPU registers as possible (i.e., use R8, …, R15 instead of RAX, RBX, etc., and R8L, R9L, …, instead of AL, BL, etc.)

Demo-program: ‘try64bit.s’
• We created a demo-program that starts in ‘real-mode’, enters 64-bit mode and draws a message, jumps to ‘compatibility mode’ and draws another message, then returns to real-mode and shows a final message
• It has to write directly to VRAM when it’s not executing in real-mode, because the ROM-BIOS routines use ‘real’-style code

How text-mode VRAM works
• The video memory resides at 0x000B8000 and in text-mode it is organized as a linear array of two-byte elements (i.e., ‘words’):
    bits 15..8   Attribute-code for the foreground and background colors
    bits 7..0    ASCII code for character
• Array-elements are arranged in “row-major” order (left-to-right, top-to-bottom)

Default color-programming

    bit 7   Blinking   0
    bit 6   Red        0    (bits 6..4 = BACKCOLOR)
    bit 5   Green      0
    bit 4   Blue       1
    bit 3   Intense    1    (bits 3..0 = FORECOLOR)
    bit 2   Red        1
    bit 1   Green      1
    bit 0   Blue       1
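
A small C sketch of how an attribute-code is assembled from these bit-fields, and how it pairs with an ASCII code to form one two-byte VRAM element. The names are hypothetical; the example colors are chosen only to exercise each field.

```c
#include <stdint.h>

/* Color components, usable for both BACKCOLOR and FORECOLOR fields. */
enum { BLUE = 1, GREEN = 2, RED = 4 };

/* Pack the attribute-byte: bit 7 blinking, bits 6..4 background color,
   bit 3 intense, bits 2..0 foreground color. */
uint8_t make_attribute(int blinking, unsigned backcolor, int intense, unsigned forecolor)
{
    return (uint8_t)(((blinking ? 1u : 0u) << 7) |
                     ((backcolor & 7u) << 4) |
                     ((intense ? 1u : 0u) << 3) |
                     (forecolor & 7u));
}

/* Build one text-mode element: attribute in bits 15..8, ASCII in bits 7..0. */
uint16_t make_cell(uint8_t attribute, char ascii)
{
    return (uint16_t)(((unsigned)attribute << 8) | (unsigned char)ascii);
}
```

For instance, make_attribute(0, BLUE, 1, RED | GREEN | BLUE) gives intense white on a blue background, and make_attribute(0, 0, 0, RED | GREEN | BLUE) gives the familiar gray-on-black.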

Character-cell screen-locations
The screen has 25 rows of 80 cells-per-row
• for (row 0, column 0) the address-offset is (0*80+0)*2
• for (row 2, column 79) the address-offset is (2*80+79)*2
• for (row 24, column 40) the address-offset is (24*80+40)*2
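
The offset formula in these examples can be written as a one-line C function. The names are hypothetical; the constants come from the 80-column layout and the two-byte elements described earlier.

```c
#include <stdint.h>

#define CELLS_PER_ROW   80      /* character-cells in each screen row */
#define BYTES_PER_CELL  2       /* one ASCII byte plus one attribute byte */

/* Byte offset of a character-cell from the start of text-mode VRAM. */
uint32_t cell_offset(unsigned row, unsigned column)
{
    return (row * CELLS_PER_ROW + column) * BYTES_PER_CELL;
}
```

Adding the result to the VRAM base-address gives the location where the cell’s word would be stored.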

In-class exercise #2
• Can you modify the message-colors used in our ‘try64bit.s’ demo-program so that:
  – the first message is bright-red against white
  – the second message is brown against cyan
  – the final message is magenta against black