Deferred segment-loading An exercise on implementing the concept of ‘load-on-demand’

The ‘do-it-later’ philosophy • Modern operating systems often follow a policy of deferring work whenever possible • The advantage of adopting this practice is most evident in those cases where it turns out that the work was not needed after all • Example: Many programs contain lots of code and data for diagnosing errors – but it’s not needed if no errors actually occur

Avoiding wasted effort • Thus it will be more efficient if an OS does not always take time to load those portions of a program (such as its error-diagnostics and error-recovery routines) which may be unnecessary in the majority of situations • But of course the OS needs to be ready to take a ‘timeout’ for loading those routines when and if the need becomes apparent

Another example • In a multitasking environment, many tasks are taking turns at executing instructions • The CPU typically performs task-switching several times every second – and must do a ‘save’ of the outgoing task’s context, and a ‘load’ of the incoming task’s context, any time it switches from one task to the next • We ask: can any of this work be deferred?

The NPX registers • Only a few tasks typically make any use of the Pentium’s ‘floating-point’ registers, so it’s wasteful to do a ‘save-and-reload’ for these registers with every task-switch • The TS-bit (bit #3 in Control Register 0) is designed to assist an OS in implementing a policy of ‘lazy’ context-switching for the set of registers used in floating-point work

Example: effect of TS=1 • Each time the CPU performs a task-switch it automatically sets the TS-bit to 1 (only an OS can execute a ‘clts’ to reset TS=0) • When any task tries to execute any of the NPX instructions (to do some arithmetic with values in the floating-point registers), an exception 7 fault will occur if the TS-bit hasn’t been cleared since a task-switch

The fault-7 exception-handler • The work involved in saving the contents of the floating-point registers being used by a no-longer-active task, and reloading those registers with values that the active task expects to work on, can be delegated to the fault-handler for exception-7 • It can clear the TS-bit (with ‘clts’) and then ‘retry’ the instruction that caused this ‘fault’
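The lazy policy described on the last three slides can be modeled in ordinary C. This is only a simulation sketch, not kernel code: the names `task_switch`, `fault7_handler`, `fpu_store`, `fpu_load` and `swap_count`, and the idea of tracking an "owner" task, are assumptions made for illustration. It shows that a task-switch only sets TS, and that the save-and-reload happens inside the exception-7 handler, and only when a different task actually touches the floating-point registers.

```c
#include <stdbool.h>

#define NUM_TASKS 3

/* Hypothetical model: one shared register file plus per-task save areas. */
typedef struct { double st[8]; } fpu_state;

static fpu_state hw_fpu;              /* the single physical register set    */
static fpu_state saved[NUM_TASKS];    /* per-task save areas                 */
static bool ts_bit = false;           /* models the TS-bit in CR0            */
static int current_task = 0;          /* task that is running now            */
static int fpu_owner = 0;             /* task whose values are in hw_fpu     */
static int fpu_swaps = 0;             /* number of save-and-reload events    */

/* A task-switch sets TS and leaves the floating-point registers alone. */
void task_switch(int next)
{
    current_task = next;
    ts_bit = true;
}

/* Models the exception-7 handler: swap register contents only if needed. */
void fault7_handler(void)
{
    ts_bit = false;                       /* the effect of 'clts' */
    if (fpu_owner != current_task) {
        saved[fpu_owner] = hw_fpu;        /* save the previous owner's values */
        hw_fpu = saved[current_task];     /* reload the current task's values */
        fpu_owner = current_task;
        fpu_swaps++;
    }
}

/* Model NPX instructions: fault first if TS is set, then 'retry'. */
void fpu_store(int reg, double value)
{
    if (ts_bit) fault7_handler();
    hw_fpu.st[reg] = value;
}

double fpu_load(int reg)
{
    if (ts_bit) fault7_handler();
    return hw_fpu.st[reg];
}

int swap_count(void) { return fpu_swaps; }
```

In this model, switching through several tasks that never use the NPX costs no register copying at all; the count only rises when a second task issues a floating-point operation.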

The ‘fork()’ system-call • In a UNIX/Linux operating system, the way any new task gets created is by a call to the kernel’s ‘fork()’ service-function • This function is supposed to ‘duplicate’ the entire program-environment of the calling task (i.e., code, data, stack and heap, plus the kernel’s process-control data-structure) • But much of this work is often wasted!

The ‘fork-and-exec’ scenario • In practice, the most common reason for a program to ‘fork()’ a child-process is so the child-task can launch a separate program: if ( fork() == 0 ) execl( "newprog", newargs, 0 ); • In these cases the ‘duplicated’ code, data, and heap are not relevant to the new task, and so they will simply get discarded!

‘loading-on-demand’ • An OS can avoid all the wasted effort of duplicating a parent-task’s resources (its code, data, heap, etc.) by implementing “only upon demand” loading as a policy • For an OS that uses the CPU’s memory-segmentation capabilities, an ‘on demand’ policy can be implemented by using the x86’s ‘Segment-Not-Present’ exception

How it works • Segments remain ‘uninitialized’ until they are actually accessed by an application • Segment-descriptors are initially marked as ‘Not Present’ (i.e., their P-bit is zero) • When any instruction attempts to access such a memory-segment (read, write, or fetch), the CPU responds by generating exception-11: “Segment-Not-Present”

An ‘error-code’ is pushed • Besides pushing the memory-address of the faulting instruction onto the exception-handler’s stack, the CPU also pushes an ‘error-code’ to indicate which descriptor was not yet marked as being ‘Present’ • The handler can then ‘load’ that segment with the proper information and adjust its descriptor’s P-bit, then retry the instruction

Error-Code Format
    bits 31..16   reserved
    bits 15..3    table-index
    bit 2         TI
    bit 1         IDT
    bit 0         EXT
Legend:
    EXT = An external event caused the exception (1=yes, 0=no)
    IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
    TI = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D
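The bit-fields of this error-code can be pulled apart with a few shifts and masks. The sketch below is an illustration only: the struct `selector_error` and the function `decode_error_code` are hypothetical names, not part of the demo program.

```c
#include <stdint.h>

/* Hypothetical container for the fields of the pushed error-code. */
typedef struct {
    unsigned ext;     /* bit 0: EXT                 */
    unsigned idt;     /* bit 1: IDT                 */
    unsigned ti;      /* bit 2: TI                  */
    unsigned index;   /* bits 15..3: table-index    */
} selector_error;

/* Split an error-code into its fields; bits 31..16 are reserved and ignored. */
selector_error decode_error_code(uint32_t code)
{
    selector_error e;
    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;   /* 13-bit table-index */
    return e;
}
```

For example, an error-code of 0x0018 decodes to table-index 3 with all three flag bits clear.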

Our ‘simulation’ demo • We can illustrate the ‘just-in-time’ idea by writing a program that performs a ‘far’ call to an ‘uninitialized’ region of memory: lcall $sel_CS, $draw_message • The code-segment descriptor (referenced here by the selector-value ‘sel_CS’) will be initially marked ‘Not-Present’ (so this ‘lcall’ instruction will trigger an exception-11)

Our ‘fault-handler’ • Our Interrupt-Service-Routine for fault-11 will do two things (after a “sanity check”): • Initialize the memory-region with code and data • Mark the code-segment’s descriptor as ‘Present’ • It will carefully preserve the CPU registers, so that it can ‘retry’ the faulting instruction • The “sanity check” verifies that error-code bits 0, 1, 2 (EXT, IDT and TI) are not set

Where is the ‘error-code’?
Layout of our fault-handler’s stack (because we used a 286 interrupt-gate); each entry is 16 bits wide:
    FLAGS        +6
    CS           +4
    IP           +2
    error-code   +0   <-- SS:SP
The Pentium provides a special pair of instructions that procedures can use to address any parameter-values that reside on its stack: ‘enter’ and ‘leave’

Code using ‘enter’ and ‘leave’
isrNPF: # Our fault-handler for exception-0x0B
        enter   $0, $0                      # setup stackframe access
        testw   $0x0007, 2(%bp)             # unexpected error-code?
        jnz     back_to_main                # yes, we can’t handle it
        call    initialize_the_high_arena   # copies our code and data
        call    mark_segment_as_ready       # marks descriptor’s ‘P’-bit
        leave                               # discard the frame access
        add     $2, %sp                     # discard the error-code
        iret                                # ‘retry’ the faulting instruction

What does ‘enter’ do? • The effect of the single instruction
        enter   $0, $0
is equivalent to this instruction-sequence:
        push    %bp
        mov     %sp, %bp

How the stack is changed
Layout of our fault-handler’s stack BEFORE executing ‘enter’ (16-bit entries):
    FLAGS        +6
    CS           +4
    IP           +2
    error-code   +0   <-- SS:SP
Layout of our fault-handler’s stack AFTER executing ‘enter’ (16-bit entries):
    FLAGS        +8
    CS           +6
    IP           +4
    error-code   +2
    old-BP       +0   <-- SS:SP and SS:BP
NOTE: Any memory-references that use indirect addressing via register BP will use segment-register SS by default (not the usual DS segment-register), for example: testw $0x0007, 2(%bp)

What does ‘leave’ do? • The effect of the single instruction
        leave
is equivalent to this instruction-sequence:
        mov     %bp, %sp
        pop     %bp

How the stack is changed
Layout of our fault-handler’s stack BEFORE executing ‘leave’ (16-bit entries):
    FLAGS        +8
    CS           +6
    IP           +4
    error-code   +2
    old-BP       +0   <-- SS:BP
    other pushed words
    …                 <-- SS:SP
Layout of our fault-handler’s stack AFTER executing ‘leave’ (16-bit entries):
    FLAGS        +6
    CS           +4
    IP           +2
    error-code   +0   <-- SS:SP
So the effect of ‘leave’ is to undo the effect of ‘enter’
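The offsets in these stack pictures can be checked with a small C model of a 16-bit stack. It is a sketch for illustration: the `cpu16` struct and every function name here are hypothetical, and only the `enter $0, $0` form is modeled. It confirms that after 'enter' the error-code sits at 2(%bp), and that 'leave' restores SP and BP even if other words were pushed in between.

```c
#include <stdint.h>

/* Hypothetical model of the stack-related state of a 16-bit CPU. */
typedef struct {
    uint16_t mem[32];   /* 64 bytes of stack memory, held as words */
    uint16_t sp;        /* byte offset of the top of stack         */
    uint16_t bp;        /* frame-pointer                           */
} cpu16;

void push16(cpu16 *c, uint16_t v) { c->sp -= 2; c->mem[c->sp / 2] = v; }

uint16_t pop16(cpu16 *c)
{
    uint16_t v = c->mem[c->sp / 2];
    c->sp += 2;
    return v;
}

/* What the CPU pushes for a fault with error-code via a 286 interrupt-gate. */
void deliver_fault(cpu16 *c, uint16_t flags, uint16_t cs, uint16_t ip,
                   uint16_t error_code)
{
    push16(c, flags);
    push16(c, cs);
    push16(c, ip);
    push16(c, error_code);
}

/* enter $0, $0  ==  push %bp ; mov %sp, %bp */
void do_enter(cpu16 *c) { push16(c, c->bp); c->bp = c->sp; }

/* leave  ==  mov %bp, %sp ; pop %bp */
void do_leave(cpu16 *c) { c->sp = c->bp; c->bp = pop16(c); }

/* Read the word at disp(%bp), with disp given in bytes. */
uint16_t read_bp_rel(const cpu16 *c, int disp)
{
    return c->mem[(c->bp + disp) / 2];
}
```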

Our demo’s memory-layout
    0x00030000   ARENA #3 (not used by this demo)
    0x00020000   ARENA #2 (where our demo expects drawing code will reside)
    0x00010000   ARENA #1 (where the loader puts our program code and data)
    0x00007C00   BOOT_LOCN
    0x0000
Copy contents of ARENA #1 to ARENA #2

Efficient memory copying • We use the x86 CPU’s ‘rep movsw’ instruction to perform memory-to-memory copying operations • The segment-selector for the segment we copy from (it must be ‘readable’) goes into register DS, and the segment-selector for the segment we copy to (it must be ‘writable’) goes into ES • The number of words we copy should match the size of our code-segment (which is 64KB, i.e., 0x8000 words) • The Direction-Flag should be cleared (DF=0)

Example assembly code
        cld                     # use ‘forward’ string-copying
        mov     $sel_ds, %si    # selector for arena at 0x10000
        mov     %si, %ds        # goes in segment-register DS
        xor     %si, %si        # start copying from offset zero
        mov     $sel_DS, %di    # selector for arena at 0x20000
        mov     %di, %es        # goes in segment-register ES
        xor     %di, %di        # start copying to offset zero
        mov     $0x8000, %cx    # number of words to be copied
        rep     movsw           # perform the arena-copying
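For comparison, here is roughly what that copy amounts to in C. It is a sketch, not the demo's code: `copy_words` and `ARENA_WORDS` are hypothetical names, and ordinary pointers stand in for the DS:SI and ES:DI register-pairs.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_WORDS 0x8000u   /* 0x8000 words of 2 bytes each = 64KB */

/* Copy 'count' 16-bit words in the 'forward' direction, lowest address
   first, which is the order 'rep movsw' uses when DF=0. */
void copy_words(uint16_t *dst, const uint16_t *src, size_t count)
{
    while (count--)
        *dst++ = *src++;
}
```

A call such as `copy_words(arena2, arena1, ARENA_WORDS)` would then move one full 64KB arena, assuming `arena1` and `arena2` point to non-overlapping buffers of that size.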

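Segment-Descriptor Format
    bits 63..56   Base[31..24]
    bit  55       G
    bit  54       D
    bit  53       RSV
    bit  52       AVL
    bits 51..48   Limit[19..16]
    bit  47       P
    bits 46..45   DPL
    bit  44       S
    bit  43       X
    bit  42       C/D
    bit  41       R/W
    bit  40       A
    bits 39..32   Base[23..16]
    bits 31..16   Base[15..0]
    bits 15..0    Limit[15..0]
The segment-descriptor’s ‘Present’ bit is bit-number 47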
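Treating a descriptor as one 64-bit value makes the 'Present' bit easy to test and set. The helper names below (`desc_base`, `desc_limit`, `desc_is_present`, `desc_mark_present`) are hypothetical, and the sketch assumes the descriptor has been read from memory as a little-endian quadword.

```c
#include <stdint.h>

#define DESC_PRESENT (1ULL << 47)   /* the P-bit is bit-number 47 */

/* Reassemble the 32-bit base-address from its two pieces. */
uint32_t desc_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu)         /* Base[23..0]:  bits 39..16 */
         | ((uint32_t)((d >> 56) & 0xFFu) << 24);    /* Base[31..24]: bits 63..56 */
}

/* Reassemble the 20-bit segment-limit from its two pieces. */
uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFFu)                   /* Limit[15..0]:  bits 15..0  */
         | ((uint32_t)((d >> 48) & 0xFu) << 16);     /* Limit[19..16]: bits 51..48 */
}

int desc_is_present(uint64_t d) { return (d & DESC_PRESENT) != 0; }

/* Return a copy of the descriptor with its P-bit set. */
uint64_t desc_mark_present(uint64_t d) { return d | DESC_PRESENT; }
```

As an example, the value 0x00001A020000FFFF describes a not-yet-present segment with base 0x00020000 and limit 0xFFFF; marking it present changes only byte 5, giving 0x00009A020000FFFF.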

In-class exercise • To get some practical ‘hands on’ experience with implementing the demand-loading concept we suggest the following exercise: Modify our ‘notready.s’ demo so that it uses a 32-bit Interrupt-Gate for its Segment-Not-Present entry in the Interrupt Descriptor Table (this will affect the layout of the fault-handler’s stack) • You may need to abandon use of the ‘enter’ and ‘leave’ instructions unless you also use a 32-bit data-segment descriptor for your stack-segment