Ring-transitions for EM64T

How the CPU can accomplish transitions among its differing privilege-levels in 64-bit mode

Rationale
• The usefulness of protected-mode derives from its ability to enforce restrictions upon software’s freedom to take certain actions
• Four distinct privilege-levels are supported
• The organizing concept is “concentric rings”
• The innermost ring has the greatest privileges, and privileges diminish as rings move outward

Four Privilege Rings
[Diagram: four concentric rings, with Ring 3 (least-trusted level) outermost, then Ring 2 and Ring 1, and Ring 0 (most-trusted level) innermost]

Suggested purposes
Ring 0: operating system kernel
Ring 1: operating system services
Ring 2: custom extensions
Ring 3: ordinary user applications

Unix/Linux and Windows
Ring 0: operating system
Ring 1: unused
Ring 2: unused
Ring 3: application programs

Legal Ring-Transitions
• A transition from an outer ring to an inner ring is made possible by using a special control-structure (known as a ‘call gate’)
• The ‘gate’ is defined via a data-structure located in a ‘system’ memory-segment that is normally not accessible for modification
• A transition from an inner ring to an outer ring is not nearly so strictly controlled

Data-sharing
• Function-calls typically require that two separate routines share some data-values (e.g., parameter-values get passed from the calling routine to the called routine)
• To support reentrancy and recursion, the processor’s stack-segment is frequently used as a ‘shared-access’ storage-area
• But among routines with different levels of privilege this could create a “security hole”

An example scenario
• Say a procedure that executes in ring 3 calls a procedure that executes in ring 2
• The ring 2 procedure uses a portion of its stack-area to create ‘automatic’ variables that it uses for temporary workspace
• Upon return, the ring 3 procedure would be able to examine whatever values were left behind in this ring 2 workspace

Data Isolation
• To guard against unintentional sharing of privileged information, a different stack is provided at each distinct privilege-level
• Accordingly, any transition from one ring to another must be accompanied by a mandatory ‘stack-switch’ operation
• The CPU provides for automatic switching of stacks and copying of parameter-values

Inward ring-transitions
• Transfers from a ring with lesser privileges to a ring with greater privileges (e.g., from ring 3 to ring 0) are controlled by a system data-structure known as a ‘call gate’
• Such a transfer would normally be accomplished using an ‘lcall’ instruction (i.e., a ‘long’ call), either “direct” (the target is specified by data in the instruction) or “indirect” (the target is specified by data at a memory-location)

requires ‘gate’ and ‘TSS’
[Diagram: an ‘lcall’ instruction crossing inward from ring 3, past ring 2 and ring 1, to ring 0]
• The ‘gate’ structure’s DPL determines whether the inward transition is permitted, and if it is, the gate specifies the new CS and RIP register-values
• The ‘TSS’ structure determines what the new SS and RSP register-values will be, and thus where the old values from SS, RSP, CS, and RIP will get saved for a later ‘return’ from the ‘call’
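The permission test that the gate’s DPL takes part in can be sketched in C. This is a simplified model, not the CPU’s full algorithm: it assumes a nonconforming target code-segment, and the function and parameter names are hypothetical. It encodes two architectural rules: the caller’s CPL and the selector’s RPL must both be numerically no greater than the gate’s DPL, and the target code-segment’s DPL must be no greater than the CPL (a call never moves outward).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Requested privilege level: the low two bits of a segment selector. */
static unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}

/* Simplified model (assumes a nonconforming target code-segment).
 * cpl:           current privilege level of the caller
 * gate_selector: selector named by the 'lcall' operand
 * gate_dpl:      DPL field of the call-gate descriptor
 * target_dpl:    DPL of the code-segment the gate points to */
static bool call_gate_permitted(unsigned cpl, uint16_t gate_selector,
                                unsigned gate_dpl, unsigned target_dpl)
{
    assert(cpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);
    unsigned rpl = selector_rpl(gate_selector);
    unsigned effective = cpl > rpl ? cpl : rpl;   /* the weaker of CPL and RPL */
    if (effective > gate_dpl)
        return false;   /* caller is not privileged enough to use this gate */
    if (target_dpl > cpl)
        return false;   /* a call may stay level or go inward, never outward */
    return true;
}

/* A stack-switch accompanies the call only when privilege actually changes. */
static bool call_switches_stack(unsigned cpl, unsigned target_dpl)
{
    return target_dpl < cpl;
}
```

For instance, a ring 3 caller using a selector with RPL=3 gets through a gate whose DPL is 3 to a ring 0 code-segment, but is refused if the gate’s DPL is 0.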

64-bit Call-Gate Descriptors

  bits 127..96   reserved (must be 0)
  bits  95..64   offset[63..32]
  bits  63..48   offset[31..16]
  bit   47       P
  bits  46..45   DPL
  bit   44       0
  bits  43..40   gate type
  bits  39..32   reserved (must be 0)
  bits  31..16   code-selector
  bits  15..0    offset[15..0]

Legend:
  P = present (1=yes, 0=no)
  DPL = Descriptor Privilege Level (0, 1, 2, 3)
  code-selector (specifies memory-segment containing procedure code)
  offset (specifies the procedure’s entry-point within its code-segment)
  gate-type (‘0xC’ signifies a 64-bit call-gate when EFER.LMA=1)
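As a rough illustration of the layout above, the following C sketch packs a 64-bit call-gate into its two quadwords. The type and function names are hypothetical (they do not come from the demo program), and the selector and offset values in the usage note are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The 16-byte descriptor as two quadwords: lo = bits 63..0, hi = bits 127..64. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} call_gate64;

/* Build a present (P=1) 64-bit call-gate (gate type 0xC). */
static call_gate64 make_call_gate64(uint16_t code_selector, uint64_t offset,
                                    unsigned dpl)
{
    assert(dpl <= 3);
    call_gate64 g;
    g.lo  = offset & 0xFFFFu;                    /* bits 15..0 : offset[15..0]  */
    g.lo |= (uint64_t)code_selector << 16;       /* bits 31..16: code-selector  */
    g.lo |= (uint64_t)0xC << 40;                 /* bits 43..40: gate type      */
    g.lo |= (uint64_t)dpl << 45;                 /* bits 46..45: DPL            */
    g.lo |= (uint64_t)1 << 47;                   /* bit  47    : P (present)    */
    g.lo |= ((offset >> 16) & 0xFFFFu) << 48;    /* bits 63..48: offset[31..16] */
    g.hi  = offset >> 32;                        /* bits 95..64: offset[63..32];
                                                    bits 127..96 stay zero      */
    return g;
}
```

With a made-up selector 0x0008, entry-point 0x12345678 and DPL 3, the low quadword comes out as 0x1234EC0000085678; the byte 0xEC is P=1, DPL=3, the zero bit, and gate type 0xC.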

64-bit Task-State Segment

  offset   field
   102     I/O MAP BASE (16 bits)
   100     reserved (16 bits)
    92     reserved
    84     IST7
    76     IST6
    68     IST5
    60     IST4
    52     IST3
    44     IST2
    36     IST1
    28     reserved
    20     RSP2
    12     RSP1
     4     RSP0
     0     reserved (32 bits)

The diagram is drawn 32 bits wide; each IST and RSP field is 64 bits and so spans two rows. Reserved bits must be set to zero.
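The byte offsets in this layout can be checked with a C structure. The sketch below assumes a GCC- or Clang-style compiler, since it relies on the `packed` attribute to keep the 64-bit fields at offsets that are only 4-byte aligned; the structure and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit Task-State Segment. 'packed' is a compiler extension (assumed here)
 * that prevents padding from being inserted before the 64-bit fields. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;    /* offset   0 */
    uint64_t rsp[3];       /* offsets  4, 12, 20: RSP0, RSP1, RSP2 */
    uint64_t reserved1;    /* offset  28 */
    uint64_t ist[7];       /* offsets 36..84: IST1..IST7 */
    uint64_t reserved2;    /* offset  92 */
    uint16_t reserved3;    /* offset 100 */
    uint16_t iomap_base;   /* offset 102 */
};

_Static_assert(sizeof(struct tss64) == 104, "a 64-bit TSS occupies 104 bytes");

/* Stack-pointer value that applies to a transition into ring new_cpl (0..2). */
static uint64_t tss_stack_for_ring(const struct tss64 *tss, unsigned new_cpl)
{
    assert(new_cpl <= 2);
    return tss->rsp[new_cpl];
}
```

A transition into ring 0 therefore takes its new stack-pointer from the field at offset 4.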

How CPU finds the TSS
[Diagram: register GDTR locates the Global Descriptor Table; the Task Register (TR) selects the 64-bit Task-State Segment-Descriptor within that table; the descriptor in turn locates the Task State Segment]

Outward ring-transitions
• Transfers from a ring with greater privilege to a ring having lesser privilege (e.g., from ring 0 to ring 3) are normally accomplished by using an ‘lret’ instruction (i.e., a ‘long’ return)
• The ‘lret’ refers to values on the current stack to specify the changes in contents for registers CS and RIP (for the code transfer) and registers SS and RSP (for the mandatory stack-switch)

‘returns’ are less restrictive
[Diagram: an ‘lret’ instruction crossing outward from ring 0, past ring 1 and ring 2, to ring 3; the ring 0 stack holds four 64-bit entries which, from the top of the stack upward, are RIP, CS, RSP, SS]
• The new values for the CS, RIP, SS, and RSP registers will be taken from the current stack

64-bit memory-addressing
• Recall that memory-addressing in 64-bit mode uses a ‘flat’ address-space (i.e., no segmentation: all addresses are offsets from zero, and no limit-checking is done)
• However, page-mapping is in effect, using the 4-level page-table scheme
• At least one page needs to be “identity-mapped” for the activation or deactivation of the processor’s ‘long’ mode (IA-32e)
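To make the 4-level scheme concrete, here is a small C sketch that splits a virtual address into its four 9-bit table-indices and 12-bit page-offset, assuming 4KB pages and 48-bit virtual addresses. The names are hypothetical; ‘level4’ denotes the topmost table and ‘level1’ the table whose entries point at page-frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct va_parts {
    unsigned level4, level3, level2, level1;   /* 9-bit table indices   */
    unsigned offset;                           /* 12-bit offset in page */
};

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;                   /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}

static struct va_parts split_va(uint64_t va)
{
    assert(is_canonical(va));
    struct va_parts p;
    p.level4 = (unsigned)((va >> 39) & 0x1FF);
    p.level3 = (unsigned)((va >> 30) & 0x1FF);
    p.level2 = (unsigned)((va >> 21) & 0x1FF);
    p.level1 = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

Every address below 2MB, such as the video memory at 0xB8000, uses entry 0 of the three upper tables, so a single level-1 table is enough to map the bottom megabyte.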

New ‘page-mapping’ idea
• We can simplify our program addressing in 64-bit mode with a ‘non-identity’ mapping:
[Diagram: in the physical address-space our demo sits at load-address = 0x10000 and vram at 0xB8000; in the virtual address-space vram is still at 0xB8000, while our demo code and data appears twice, once from 0x00000 to 0x10000 and again from 0x10000 to 0x20000]

How we build the map-tables

            .section    .data
            .align      0x1000
    level1: entry = 0x10000
            .rept       16
            .quad       entry + 7
            entry = entry + 0x1000
            .endr
            entry = 0x10000
            .rept       240
            .quad       entry + 7
            entry = entry + 0x1000
            .endr
            .align      0x1000
    level2: .quad       level1 + 0x10000 + 7
            .align      0x1000
    level3: .quad       level2 + 0x10000 + 7
            .align      0x1000
    level4: .quad       level3 + 0x10000 + 7
            .align      0x1000
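The effect of the level-1 table can be simulated in C. This sketch assumes each loop emits the current entry before advancing it by one page, which is what the identity-mapping of vram at 0xB8000 requires; the function names are hypothetical, and the ‘+ 7’ sets the Present, Writable and User bits of each entry.

```c
#include <assert.h>
#include <stdint.h>

#define LEVEL1_ENTRIES 256   /* 16 + 240 entries cover the bottom megabyte */

/* Mirror of the two .rept loops that fill the level-1 table. */
static void build_level1(uint64_t table[LEVEL1_ENTRIES])
{
    uint64_t entry = 0x10000;
    for (int i = 0; i < 16; i++) {               /* virtual 0x00000..0x0FFFF */
        table[i] = entry + 7;
        entry += 0x1000;
    }
    entry = 0x10000;
    for (int i = 16; i < LEVEL1_ENTRIES; i++) {  /* virtual 0x10000..0xFFFFF */
        table[i] = entry + 7;
        entry += 0x1000;
    }
}

/* Translate a virtual address below 1MB using the level-1 table alone. */
static uint64_t translate(const uint64_t table[LEVEL1_ENTRIES], uint64_t va)
{
    assert(va < 0x100000);
    uint64_t frame = table[va >> 12] & ~0xFFFULL;   /* strip the flag bits */
    return frame | (va & 0xFFF);
}
```

Under that assumption virtual 0x00000 and virtual 0x10000 both reach physical 0x10000 (the demo appears twice), while 0xB8000 maps to itself.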

Our ‘tryring3.s’ demo
[Diagram of the demo’s control-flow:]
  begin in x86 ‘real-mode’
  → (mov %cr0) → IA-32e mode: 16-bit ‘compatibility’ mode, CPL=0
  → (direct LJMP) → 64-bit mode, CPL=0
  → (LRETQ) → 64-bit mode, CPL=3
  → (indirect LCALL thru callgate) → 64-bit mode, CPL=0
  → (indirect LJMP) → 16-bit ‘compatibility’ mode, CPL=0
  → (mov %cr0) → x86 ‘real-mode’, exit

From 16-bit to 64-bit
• Once we arrive in IA-32e mode, our first transfer is from 16-bit ‘compatibility’ mode to 64-bit ‘long’ mode
• For this we can use a long direct jump with a default (i.e., 16-bit) operand-size (thanks to our special page-mapping)

        ljmp    $sel_CS0, $prog64

  $sel_CS0: selector for 64-bit code-segment at privilege-level zero (i.e., ring 0)
  $prog64:  16-bit offset to our ‘prog64’ label

From 64-bit ring 0 to 64-bit ring 3
• Once we arrive in 64-bit mode, our second transfer is from ring 0 to ring 3
• For this we use a ‘long return’ instruction after setting up our current ring 0 stack with the new values for SS, RSP, CS, and RIP, and we specify a quadword operand-size

        pushq   $sel_SS3
        pushq   $tos3
        pushq   $sel_CS3
        pushq   $showmsg
        lretq

  $sel_SS3: selector for writable data-segment at privilege-level three (i.e., ring 3)
  $sel_CS3: selector for 64-bit code-segment at privilege-level three (i.e., ring 3)
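The order of those pushes matters because ‘lretq’ pops RIP first and SS last. The C sketch below models the ring 0 stack as an array that grows downward; the function names and the selector values in the usage note are hypothetical, chosen only so that both selectors carry RPL=3.

```c
#include <assert.h>
#include <stdint.h>

/* Push one quadword onto a downward-growing stack; returns the new top. */
static uint64_t *push64(uint64_t *sp, uint64_t value)
{
    *--sp = value;
    return sp;
}

/* Lay out the four quadwords that 'lretq' consumes. They are pushed in the
 * order SS, RSP, CS, RIP so that RIP ends up at the top of the stack. */
static uint64_t *build_lretq_frame(uint64_t *sp, uint16_t ss, uint64_t rsp,
                                   uint16_t cs, uint64_t rip)
{
    /* For a non-NULL SS, its RPL must match the RPL of the new CS,
     * and that RPL is the ring being returned to. */
    assert((ss & 3) == (cs & 3));
    sp = push64(sp, ss);
    sp = push64(sp, rsp);
    sp = push64(sp, cs);
    sp = push64(sp, rip);
    return sp;
}
```

Starting from the end of an eight-quadword array, the returned pointer sits four entries lower and sees RIP, CS, RSP, SS at indices 0 through 3.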

From 64-bit ring 3 to 64-bit ring 0
• When we’ve finished our ring 3 procedure, we get back to ring 0 by using an indirect ‘long call’ through a 64-bit call-gate
• Our 64-bit TSS must be set up in advance for the accompanying stack-switch

            lcall   *supervisor             # indirect long-call

supervisor: .long   0, sel_ret              # target of the call

  0:       a “dummy” operand, required by the syntax for ‘lcall’, but not used by the CPU (since all the needed info is in the call-gate and the Task-State Segment)
  sel_ret: selector for 64-bit call-gate descriptor accessible at ring 3, specifying new register-values for CS and RIP (the new RSP-value comes from the TSS, and the new SS-value will be NULL)

From 64-bit code to 16-bit code
• Finally, for returning to ‘real-mode’, we need to transfer from the 64-bit code in ‘long mode’ to 16-bit code in ‘compatibility mode’ (both ring 0, so no privilege-change)
• For this we use an indirect long jump (as there’s no direct long jump in 64-bit mode)

            ljmp    *departure              # indirect long-jump

departure:  .long   prog16, sel_cs0         # target of this jump

  prog16:  32-bit offset to our ‘prog16’ label
  sel_cs0: selector for 16-bit code-segment at privilege-level zero (i.e., ring 0)

In-class exercise #1
• Try some alternative transfer-instructions:
  – Use LJMPL $sel_CS0, $prog64 + 0x10000 in place of LJMP $sel_CS0, $prog64
  – Use LJMP *supervisor in place of LCALL *supervisor
  – Use LRETW in place of LRETQ after setting up your stack with word-sized values instead of quadword-sized values

In-class exercises #2 and #3
• Can you adjust all the virtual addresses in your 64-bit code so that an ‘identity-map’ could be used in this ‘tryring3.s’ demo?
• Could you adjust all the privilege-levels so that ‘ring 2’ gets used instead of ‘ring 3’?