Interrupts and Exceptions COMS W 6998 Spring 2010

Overview
- The Hardware Part
  - Interrupts and Exceptions
  - Exception Types and Handling
  - Interrupt Request Lines (IRQs)
  - Programmable Interrupt Controllers (PIC)
  - Interrupt Descriptor Table (IDT)
  - Hardware Dispatching of Interrupts
- The Software Part
  - Nested Execution
  - Kernel Stacks
  - SoftIRQs, Tasklets
  - Work Queues
  - Threaded Interrupts

Simplified Architecture Diagram
[Figure: the Central Processing Unit, Main Memory, and an I/O device, all connected by the system bus.]

Motivation
- The utility of a general-purpose computer depends on its ability to interact with the I/O devices attached to it (e.g., keyboard, display, disk drives, network)
- Devices require a prompt response from the CPU when various events occur, even when the CPU is busy running a program
- We need a mechanism for a device to “gain the CPU’s attention”
- Interrupts provide a way of doing this

CPU’s ‘fetch-execute’ cycle
[Figure: a user program (ld, add, st, mul, ld, sub, bne, add, jmp, …) beside the CPU's cycle. The cycle is: fetch the instruction at IP, decode it, execute it, advance IP to the next instruction, then check "Interrupt?". If no, fetch the next instruction. If yes, save context, get the interrupt ID, look up the ISR, execute the ISR, and return with IRET.]
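To make the loop concrete, here is a toy C simulation of the cycle in the figure. It illustrates the idea under heavy simplifying assumptions and is not how real hardware works. The `toy_cpu` type, `toy_run`, `toy_isr`, and the fixed vector 32 are all hypothetical, and every "instruction" is just a number added to an accumulator. The sketch shows two things: the interrupt check happens between instructions, after IP has been advanced, and IRET resumes at the saved IP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CPU: every "instruction" is a number added to acc. */
typedef struct {
    size_t ip;          /* instruction pointer */
    long acc;           /* accumulator */
    bool intr_pending;  /* INTR line asserted by a device */
    int intr_id;        /* vector supplied with the interrupt */
    size_t saved_ip;    /* context saved on interrupt entry */
    int isr_runs;       /* how many times the ISR ran */
} toy_cpu;

/* Toy interrupt service routine: it only records that it ran. */
static void toy_isr(toy_cpu *cpu, int vector) {
    (void)vector;
    cpu->isr_runs++;
}

/* Run prog to completion; a device raises interrupt 32 when ip reaches raise_at. */
long toy_run(toy_cpu *cpu, const long *prog, size_t len, size_t raise_at) {
    while (cpu->ip < len) {
        long insn = prog[cpu->ip];      /* fetch the instruction at IP */
        cpu->acc += insn;               /* decode and execute it */
        cpu->ip++;                      /* advance IP to the next instruction */

        if (cpu->ip == raise_at) {      /* the device asserts INTR */
            cpu->intr_pending = true;
            cpu->intr_id = 32;
        }
        if (cpu->intr_pending) {        /* "Interrupt?" is checked between instructions */
            cpu->intr_pending = false;
            cpu->saved_ip = cpu->ip;    /* save context */
            toy_isr(cpu, cpu->intr_id); /* look up and execute the ISR */
            cpu->ip = cpu->saved_ip;    /* IRET: restore context and resume */
        }
    }
    return cpu->acc;
}
```

Consider the program {1, 2, 3, 4} with the interrupt raised after the second instruction. It still sums to 10, the ISR runs once, and the saved IP is 2.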

Interrupts
- Forcibly change the normal flow of control
- Similar to a context switch (but lighter weight)
  - Hardware saves some context on the stack; this includes the interrupted instruction if a restart is needed
  - Enters the kernel at a specific point; the kernel then figures out which interrupt handler should run
  - Execution resumes with the special “iret” instruction
- Many different types of interrupts

Types of Interrupts
- Asynchronous
  - From an external source, such as an I/O device
  - Not related to the instruction being executed
- Synchronous (also called exceptions)
  - Processor-detected exceptions:
    - Faults: correctable; the offending instruction is retried
    - Traps: often used for debugging; the instruction is not retried
    - Aborts: major error (hardware failure)
  - Programmed exceptions:
    - Requests for kernel intervention (software interrupts/syscalls)

Faults
- The instruction would be illegal to execute
- Examples:
  - Writing to a memory segment marked ‘read-only’
  - Reading from an unavailable memory segment (on disk)
  - Executing a ‘privileged’ instruction
- Detected before incrementing the IP
- The causes of ‘faults’ can often be ‘fixed’
- If a ‘problem’ can be remedied, then the CPU can just resume its execution cycle

Traps
- A CPU might have been programmed to automatically switch control to a ‘debugger’ program after it has executed an instruction
- That type of situation is known as a ‘trap’
- It is activated after incrementing the IP

Error Exceptions
- Most error exceptions (divide by zero, invalid operation, illegal memory reference, etc.) translate directly into signals
- This isn’t a coincidence...
- The kernel’s job is fairly simple: send the appropriate signal to the current process
  - force_sig(sig_number, current);
- That will probably kill the process, but that’s not the concern of the exception handler
- One important exception: page fault
- An exception can (infrequently) happen in the kernel
  - die(); // kernel oops
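The exception-to-signal path can be observed from user space. The sketch below assumes Linux/POSIX (`mmap`, `sigaction`, `sigsetjmp`). It maps a read-only page, writes to it, and reports which signal the kernel delivered. The function name `write_to_readonly_page` is hypothetical. The write causes a page fault. No handler can "fix" a write to a read-only mapping, so the kernel turns the fault into SIGSEGV for the current process. The program catches the signal and jumps out with `siglongjmp` only so it can report the result instead of dying. SIGBUS is caught too, as an assumption for platforms that report this case differently.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t caught_signal = 0;

/* Record which signal the kernel sent, then escape the faulting instruction. */
static void on_fault(int sig) {
    caught_signal = sig;
    siglongjmp(recover_point, 1);
}

/* Write to a read-only page; return the delivered signal (0 if none, -1 on setup error). */
int write_to_readonly_page(void) {
    long page = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    caught_signal = 0;
    if (sigsetjmp(recover_point, 1) == 0) {   /* 1: restore the signal mask on jump */
        volatile char *vp = p;
        vp[0] = 'x';                          /* page fault -> no write permission -> signal */
    }

    sigaction(SIGSEGV, &old_segv, NULL);      /* put the previous handlers back */
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(p, (size_t)page);
    return caught_signal;
}
```

On Linux this returns SIGSEGV. The process, not the kernel, pays for the bad access, which is exactly the division of labor the slide describes.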

Intel-Reserved ID-Numbers
- Of the 256 possible interrupt ID numbers, Intel reserves the first 32 for ‘exceptions’
- OSes such as Linux are free to use the remaining 224 interrupt ID numbers for their own purposes (e.g., for service requests from external devices, or for other purposes such as system calls)
- Examples:
  - 0: Divide-overflow fault
  - 6: Undefined opcode
  - 7: Coprocessor not available
  - 11: Segment-not-present fault
  - 12: Stack fault
  - 13: General protection exception
  - 14: Page-fault exception

Interrupt Hardware
[Figure: legacy PC design (for single-processor systems). Devices such as the Ethernet controller, SCSI disk, real-time clock, keyboard controller, and programmable interval timer drive IRQ lines into a slave and a master 8259 PIC. The master PIC drives the INTR pin of the x86 CPU.]
- I/O devices have (unique or shared) Interrupt Request Lines (IRQs)
- Special hardware maps IRQs to interrupt vectors and passes them to the CPU
- This hardware is called a Programmable Interrupt Controller (PIC)

The ‘Interrupt Controller’
- Responsible for telling the CPU when a specific external device wishes to ‘interrupt’
  - Needs to tell the CPU which one among several devices is the one needing service
- The PIC translates an IRQ to a vector
  - Raises the interrupt to the CPU
  - Makes the vector available in a register
  - Waits for an ack from the CPU
- Interrupts can have varying priorities
  - The PIC also needs to prioritize multiple requests
- Possible to “mask” (disable) interrupts at the PIC or the CPU
- Early systems cascaded two 8-input chips (8259A)
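The prioritization rule of a cascaded 8259A pair fits in a few lines of C. The sketch makes three assumptions:
- The 8259A runs in its default fully nested mode, where input 0 has the highest priority and input 7 the lowest.
- The usual PC wiring applies: the slave's output drives master input 2, the cascade pin.
- A set bit in the interrupt mask register (IMR) masks that input.

The function names are hypothetical. In-service tracking and end-of-interrupt handling are omitted.

```c
#include <stdint.h>

/* Highest-priority pending, unmasked input (0-7) of one 8259A, or -1 if none.
   Fully nested mode: input 0 has the highest priority, input 7 the lowest. */
int pic8259_resolve(uint8_t irr, uint8_t imr) {
    uint8_t candidates = irr & (uint8_t)~imr;   /* IMR bit set = input masked */
    for (int line = 0; line < 8; line++)
        if (candidates & (1u << line))
            return line;
    return -1;
}

/* Cascaded pair: the slave's output is wired to master input 2 (the cascade pin).
   Returns the legacy IRQ number 0-15, or -1 if nothing is deliverable. */
int cascaded_pic_resolve(uint8_t master_irr, uint8_t master_imr,
                         uint8_t slave_irr, uint8_t slave_imr) {
    int slave_line = pic8259_resolve(slave_irr, slave_imr);
    if (slave_line >= 0)
        master_irr |= 1u << 2;                  /* slave raises master input 2 */
    else
        master_irr &= (uint8_t)~(1u << 2);
    int master_line = pic8259_resolve(master_irr, master_imr);
    if (master_line == 2)
        return 8 + slave_line;                  /* IRQs 8-15 come from the slave */
    return master_line;
}
```

Slave requests enter through master input 2, so IRQs 8 to 15 rank between IRQ 1 and IRQ 3 in this scheme. For instance, a pending IRQ 14 wins over a pending IRQ 3, unless the cascade input itself is masked.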

Example: Interrupts on 80386
- The 80386 core has one interrupt line and one interrupt-acknowledge line
- Interrupt sequence:
  - The interrupt controller raises the INT line
  - The 80386 core pulses the INTA line low, allowing INT to go low
  - The 80386 core pulses the INTA line low again, signaling the controller to put the interrupt number on the data bus

Multiple Logical Processors
[Figure: a multi-core CPU with CPU 0 and CPU 1, each with its own local APIC, connected to an I/O APIC.]
- An Advanced Programmable Interrupt Controller is needed to ‘route’ I/O requests from peripherals to CPUs
- (The legacy PICs are masked when the APICs are enabled)

APIC, IO-APIC, LAPIC
- Advanced PIC (APIC) for SMP systems
  - Used in all modern systems
  - Interrupts “routed” to CPUs over the system bus
  - IPI: inter-processor interrupt
- Local APIC (LAPIC) versus “front-end” IO-APIC
  - Devices connect to the front-end IO-APIC
  - The IO-APIC communicates (over the bus) with the Local APICs
- Interrupt routing
  - Allows broadcast or selective routing of interrupts
  - Ability to distribute the interrupt-handling load
  - Routes to the lowest-priority processor
    - Special register: Task Priority Register (TPR)
    - Arbitrates (round-robin) if priorities are equal

Hardware to Software
[Figure: IRQ lines enter the PIC, which applies its mask and raises INTR on CPU 0. The CPU's idtr register points to the IDT in memory (vectors 0 to 255), and the IDT entry for vector N points to the handler.]

Assigning IRQs to Devices
- IRQ assignment is hardware-dependent
  - Sometimes it’s hardwired, sometimes it’s set physically, sometimes it’s programmable
  - The PCI bus usually assigns IRQs at boot
- Some IRQs are fixed by the architecture
  - IRQ 0: Interval timer
  - IRQ 2: Cascade pin for 8259A
- Linux device drivers request IRQs when the device is opened
  - Note: especially useful for dynamically loaded drivers, such as for USB or PCMCIA devices
- Two devices that aren’t used at the same time can share an IRQ, even if the hardware doesn’t support simultaneous sharing

Assigning Vectors to IRQs
- Vector: index (0-255) into the interrupt descriptor table
- Vectors are usually IRQ# + 32
  - Vectors below 32 are reserved for non-maskable interrupts & exceptions
  - Maskable interrupts can be assigned as needed
- Vector 128 is used for syscalls
- Vectors 251-255 are used for IPIs
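The vector layout above can be written down as a small classification function. The sketch follows the fixed layout given on this slide: exceptions below 32, IRQ n at n + 32, the syscall at 128, and IPIs at 251 to 255. The constant and function names are hypothetical. Modern kernels on APIC systems assign device vectors dynamically rather than strictly as IRQ + 32. The exception names cover only the vectors listed on the Intel-reserved slide.

```c
#include <stddef.h>

#define VEC_FIRST_DEVICE 32   /* 0-31 are reserved by Intel for exceptions */
#define VEC_SYSCALL      128  /* int $0x80 */
#define VEC_FIRST_IPI    251  /* this slide's IPI range: 251-255 */

enum vector_kind { VK_INVALID, VK_EXCEPTION, VK_DEVICE_IRQ, VK_SYSCALL, VK_IPI };

/* Legacy-style mapping from IRQ line to IDT vector. */
int irq_to_vector(int irq) { return irq + VEC_FIRST_DEVICE; }

enum vector_kind classify_vector(int vector) {
    if (vector < 0 || vector > 255) return VK_INVALID;
    if (vector < VEC_FIRST_DEVICE)  return VK_EXCEPTION;
    if (vector == VEC_SYSCALL)      return VK_SYSCALL;
    if (vector >= VEC_FIRST_IPI)    return VK_IPI;
    return VK_DEVICE_IRQ;
}

/* Names for the reserved exception vectors listed in the slides (others return NULL). */
const char *exception_name(int vector) {
    switch (vector) {
    case 0:  return "divide-overflow fault";
    case 6:  return "undefined opcode";
    case 7:  return "coprocessor not available";
    case 11: return "segment not present";
    case 12: return "stack fault";
    case 13: return "general protection";
    case 14: return "page fault";
    default: return NULL;
    }
}
```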

Interrupt Descriptor Table
- The ‘entry point’ to the interrupt handler is located via the Interrupt Descriptor Table (IDT)
- IDT: “gate descriptors”
  - Segment selector + offset for the handler
  - Descriptor Privilege Level (DPL)
  - Gates (slightly different ways of entering the kernel)
    - Task gate: includes a TSS to transfer to (32-bit Linux uses one only for the double-fault handler)
    - Interrupt gate: disables further interrupts
    - Trap gate: further interrupts still allowed
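The sketch below shows what a ‘gate descriptor’ actually contains. It packs the 8-byte descriptor used in 32-bit (IA-32) protected mode, which has five fields:
- offset bits 15..0
- the segment selector
- a reserved byte
- an access byte (present bit, 2-bit DPL, gate type)
- offset bits 31..16

The type codes 0x5 (task), 0xE (32-bit interrupt gate), and 0xF (32-bit trap gate) follow the Intel layout. `make_gate` and `gate_dpl` are hypothetical helpers. In 64-bit mode IDT entries are 16 bytes with a different layout, which this sketch does not cover.

```c
#include <stdint.h>

/* IA-32 gate types (low 4 bits of the access byte). */
enum gate_type { GATE_TASK = 0x5, GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

/* Pack an 8-byte IA-32 IDT gate descriptor. */
uint64_t make_gate(uint32_t offset, uint16_t selector, enum gate_type type, unsigned dpl) {
    uint8_t access = (uint8_t)(0x80u | ((dpl & 3u) << 5) | (unsigned)type); /* P=1, DPL, type */
    return  (uint64_t)(offset & 0xFFFFu)          /* bits 0-15:  offset 15..0      */
          | ((uint64_t)selector << 16)            /* bits 16-31: segment selector  */
                                                  /* bits 32-39: reserved (zero)   */
          | ((uint64_t)access << 40)              /* bits 40-47: P, DPL, type      */
          | ((uint64_t)(offset >> 16) << 48);     /* bits 48-63: offset 31..16     */
}

/* Extract the DPL (bits 45-46): the highest CPL allowed to reach this gate via INT n. */
unsigned gate_dpl(uint64_t gate) { return (unsigned)(gate >> 45) & 3u; }
```

For example, a DPL-3 trap gate (what Linux calls a ‘system gate’) has access byte 0xEF, while a DPL-0 interrupt gate has 0x8E.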

Interrupt Masking
- Two different types: global and per-IRQ
- Global: delays all interrupts
- Selective: individual IRQs can be masked selectively
- Selective masking is usually what’s needed; interference is most common between two interrupts of the same type

Putting It All Together
[Figure: IRQ lines enter the PIC (with its mask), which raises INTR on CPU 0. The CPU's idtr register locates the IDT in memory (vectors 0 to 255), and the entry for vector N points to the handler.]

Dispatching Interrupts
- Each interrupt has to be handled by a special device- or trap-specific routine
- The Interrupt Descriptor Table (IDT) has gate descriptors for each interrupt vector
- Hardware locates the proper gate descriptor for this interrupt vector, and locates the new context
- A new stack pointer, program counter, CPU and memory state, etc., are loaded
- Global interrupt mask set
- The old program counter, stack pointer, CPU and memory state, etc., are saved on the new stack
- The specific handler is invoked

Nested Interrupts
- What if a second interrupt occurs while an interrupt routine is executing?
- Permitting that is generally a good thing; is it possible?
- And why is it a good thing?

Maximizing Parallelism
- You want to keep all I/O devices as busy as possible
- In general, an I/O interrupt represents the end of an operation; another request should be issued as soon as possible
- Most devices don’t interfere with each other’s data structures; there’s no reason to block out other devices

Handling Nested Interrupts
- As soon as possible, unmask the global interrupt
- As soon as reasonable, re-enable interrupts from that IRQ
  - But that isn’t always a great idea, since it could cause re-entry to the same handler
- The IRQ-specific line is not re-enabled during interrupt handling

Nested Execution
- Interrupts can be interrupted
  - By different interrupts; handlers need not be reentrant
  - No notion of priority in Linux
  - Small portions execute with interrupts disabled
  - Interrupts remain pending until acked by the CPU
- Exceptions can be interrupted
  - By interrupts (devices needing service)
- Exceptions can nest two levels deep
  - Exceptions indicate a coding error
  - Exception code (kernel code) shouldn’t have bugs
  - A page fault is possible (trying to touch user data)

Interrupt Handling Philosophy
- Do as little as possible in the interrupt handler
- Defer non-critical actions till later
- Structure: top and bottom halves
  - Top half: do minimum work and return (ISR)
  - Bottom half: deferred processing (softirqs, tasklets, workqueues, kernel threads)
[Figure: the top half hands work off to the bottom-half mechanisms: tasklet, softirq, workqueue, kernel thread.]

Top Half: Do it Now!
- Technically, it is the interrupt handler
- Perform minimal, common functions: save registers, unmask other interrupts. Eventually, undo that: restore registers, return to the previous context.
  - The IRQ is typically masked for the duration of the top half
- Most important: call the proper interrupt handler provided in the device driver (C program)
- Don’t want to do too much here
  - Often written in assembler
  - IRQs are masked for part of the time
  - Don’t want the stack to get too big
- Typically queue the request and set a flag for deferred processing in a bottom half
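The ‘queue the request and set a flag’ pattern can be sketched as a keyboard-style ring buffer. This is a single-threaded user-space model with hypothetical names (`kbd_dev`, `kbd_top_half`, `kbd_bottom_half`) and an assumed 8-entry buffer. A real driver would also need locking or interrupt disabling around the shared indices. The top half only stores the scan code and sets `bh_pending`; all further processing waits for the bottom half.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8   /* assumed buffer size; a power of two keeps index wraparound correct */

/* Hypothetical device state shared by the top and bottom halves. */
struct kbd_dev {
    uint8_t ring[RING_SIZE];
    unsigned head, tail;   /* head: next slot to write; tail: next slot to read */
    bool bh_pending;       /* the "flag for deferred processing" */
    unsigned dropped;      /* scan codes lost because the buffer was full */
};

/* Top half: store the byte, flag the bottom half, return. Never waits. */
void kbd_top_half(struct kbd_dev *d, uint8_t scancode) {
    if (d->head - d->tail == RING_SIZE) {   /* full: drop rather than block */
        d->dropped++;
        return;
    }
    d->ring[d->head++ % RING_SIZE] = scancode;
    d->bh_pending = true;
}

/* Bottom half: runs later and drains up to cap bytes; returns how many it processed. */
unsigned kbd_bottom_half(struct kbd_dev *d, uint8_t *out, unsigned cap) {
    unsigned n = 0;
    if (!d->bh_pending)
        return 0;
    while (d->tail != d->head && n < cap)
        out[n++] = d->ring[d->tail++ % RING_SIZE];
    d->bh_pending = (d->tail != d->head);   /* stay flagged if work remains */
    return n;
}
```

The top half drops input when the buffer is full instead of waiting for space, because it must never block.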

Top Half: Find the Handler
- On modern hardware, multiple I/O devices can share a single IRQ and hence a single interrupt vector
- The first differentiator is the interrupt vector
- Multiple interrupt service routines (ISRs) can be associated with a vector
- Each device’s ISR for that IRQ is called
- The device determines whether the IRQ is for it
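Sharing a line works because each registered ISR checks its own device before claiming the interrupt. The model below uses hypothetical types (`sim_irqaction`, `sim_device`) whose shape loosely follows the irqaction list in the backup slides. The return codes mirror the idea, not the exact definitions, of Linux's `IRQ_NONE`/`IRQ_HANDLED`. The dispatcher calls every handler on the chain and reports whether any of them claimed the interrupt.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { SIM_IRQ_NONE, SIM_IRQ_HANDLED } sim_irqreturn;

/* Hypothetical device with a "did I interrupt?" status bit. */
struct sim_device {
    const char *name;
    bool status_bit;
    int serviced;
};

typedef sim_irqreturn (*sim_isr_fn)(int irq, void *dev_id);

/* One entry per device sharing the line, linked like an irqaction list. */
struct sim_irqaction {
    sim_isr_fn handler;
    void *dev_id;
    struct sim_irqaction *next;
};

/* Each ISR asks its own device whether it raised the interrupt. */
sim_irqreturn sim_isr(int irq, void *dev_id) {
    struct sim_device *dev = dev_id;
    (void)irq;
    if (!dev->status_bit)
        return SIM_IRQ_NONE;      /* not ours: let the next handler look */
    dev->status_bit = false;      /* acknowledge at the device */
    dev->serviced++;
    return SIM_IRQ_HANDLED;
}

/* Walk every action registered on the line; report whether anyone claimed it. */
bool sim_dispatch(int irq, struct sim_irqaction *chain) {
    bool handled = false;
    for (struct sim_irqaction *a = chain; a != NULL; a = a->next)
        if (a->handler(irq, a->dev_id) == SIM_IRQ_HANDLED)
            handled = true;
    return handled;
}
```

When no handler claims an interrupt, the dispatcher returns false. A kernel can use that signal to detect spurious or stuck interrupt lines.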

Bottom Half: Do it Later!
- Mechanisms to defer work to later:
  - softirqs
  - tasklets (built on top of softirqs)
  - work queues
  - kernel threads
- All can be interrupted
[Figure: the top half hands work off to the bottom-half mechanisms: tasklet, softirq, workqueue, kernel thread.]

Warning: No Process Context
- Interrupts (as opposed to exceptions) are not associated with particular instructions
- They’re also not associated with a given process (user program)
- The process that happens to be running at the time of the interrupt has no relationship whatsoever to that interrupt
- Interrupt handlers cannot sleep!

What Can’t You Do?
- You cannot sleep
  - You cannot refer to current
  - You cannot allocate memory with GFP_KERNEL (which can sleep); you must use GFP_ATOMIC (which can fail)
  - You cannot call schedule()
  - You cannot do a down() semaphore call
    - or call something that might sleep
  - However, you can do an up()
- You cannot transfer data to/from user space
  - E.g., copy_to_user(), copy_from_user()

Interrupt Stack
- When an interrupt occurs, what stack is used?
  - Exceptions: the kernel stack of the current process, whatever it is, is used (there’s always some process running: the “idle” process, if nothing else)
  - Interrupts: hard IRQ stack (1 per processor)
  - SoftIRQs: soft IRQ stack (1 per processor)
- These stacks are configured in the IDT and TSS at boot time by the kernel

Softirqs
- Statically allocated: specified at kernel compile time
- Limited number:

    Priority   Type
    0          High-priority tasklets
    1          Timer interrupts
    2          Network transmission
    3          Network reception
    4          Block devices
    5          Regular tasklets

When Do Softirqs Run?
- Run at various points by the kernel:
  - After system calls
  - After exceptions
  - After interrupts (top halves/IRQs, including the timer interrupt)
  - When the scheduler runs ksoftirqd
- Softirq routines can be executed simultaneously on multiple CPUs:
  - Code must be re-entrant
  - Code must do its own locking as needed
- Hardware interrupts are always enabled when softirqs are running

Rescheduling Softirqs
- A softirq routine can reschedule itself
- This could starve user-level processes
- The softirq scheduler only runs a limited number of requests at a time
- The rest are executed by a kernel thread, ksoftirqd, which competes with user processes for CPU time
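The bounded-restart policy can be modeled as a simulation, not kernel code. The softirq numbering follows the table on the Softirqs slide, where a lower number is processed first. The restart limit of 10 is an assumption chosen for illustration. Handlers may re-raise their own softirq. Once the limit is reached, whatever is still pending is left for ksoftirqd, modeled here as a flag. The names (`softirq_cpu`, `do_softirq_sim`, `net_rx_greedy`) are hypothetical.

```c
#include <stdbool.h>

/* Numbered as in the slide's table: lower number = processed first. */
enum { SIM_HI, SIM_TIMER, SIM_NET_TX, SIM_NET_RX, SIM_BLOCK, SIM_TASKLET, SIM_NR };

#define MAX_RESTART 10   /* assumed bound on passes before deferring to ksoftirqd */

struct softirq_cpu {
    unsigned pending;                               /* one bit per softirq type */
    void (*action[SIM_NR])(struct softirq_cpu *);   /* handler per type (may be NULL) */
    unsigned runs[SIM_NR];                          /* how often each type ran */
    bool ksoftirqd_woken;
};

void raise_softirq_sim(struct softirq_cpu *c, int nr) { c->pending |= 1u << nr; }

/* Process pending softirqs in priority order, with a bounded number of restarts. */
void do_softirq_sim(struct softirq_cpu *c) {
    for (int restart = 0; restart < MAX_RESTART && c->pending; restart++) {
        unsigned pending = c->pending;
        c->pending = 0;                         /* handlers may set bits again */
        for (int nr = 0; nr < SIM_NR; nr++) {
            if (pending & (1u << nr)) {
                c->runs[nr]++;
                if (c->action[nr])
                    c->action[nr](c);
            }
        }
    }
    if (c->pending)
        c->ksoftirqd_woken = true;              /* leave the rest to the kernel thread */
}

/* Hypothetical receive handler that always finds more packets and re-raises itself. */
void net_rx_greedy(struct softirq_cpu *c) { raise_softirq_sim(c, SIM_NET_RX); }
```

Suppose the network-receive handler always finds more work. The timer softirq then runs once and the receive softirq runs 10 times. The remainder goes to ksoftirqd, which competes with user processes, instead of looping forever in interrupt context.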

Tasklets
- Built on top of softirqs
- Can be created and destroyed dynamically
- Run on the CPU that scheduled them (cache affinity)
- Individual tasklets are locked during execution; no problem about re-entrancy, and no need for locking by the code
- Tasklets can run in parallel on multiple CPUs
  - The same tasklet can only run on one CPU at a time
- Were once the preferred mechanism for most deferred activity; now changing
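The ‘same tasklet can only run on one CPU at a time’ guarantee comes from two state bits: one meaning ‘scheduled’ and one meaning ‘running’. The sketch below models them with C11 atomics. The bit meanings echo Linux's `TASKLET_STATE_SCHED` and `TASKLET_STATE_RUN`, but the struct and functions (`sim_tasklet`, `sim_tasklet_schedule`, `sim_tasklet_run`, `count_calls`) are hypothetical simplifications. Scheduling an already-scheduled tasklet is a no-op. A CPU that finds the RUN bit set backs off; a real kernel would re-queue the tasklet.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { SIM_SCHED = 1u << 0, SIM_RUN = 1u << 1 };

struct sim_tasklet {
    atomic_uint state;          /* SIM_SCHED and/or SIM_RUN */
    void (*func)(void *);
    void *data;
};

void sim_tasklet_init(struct sim_tasklet *t, void (*func)(void *), void *data) {
    atomic_init(&t->state, 0u);
    t->func = func;
    t->data = data;
}

/* Returns true only if this call actually queued the tasklet. */
bool sim_tasklet_schedule(struct sim_tasklet *t) {
    unsigned old = atomic_fetch_or(&t->state, SIM_SCHED);
    return !(old & SIM_SCHED);
}

/* Try to run it; fails if another CPU already holds the RUN bit. */
bool sim_tasklet_run(struct sim_tasklet *t) {
    unsigned old = atomic_fetch_or(&t->state, SIM_RUN);   /* "trylock" */
    if (old & SIM_RUN)
        return false;                                      /* busy elsewhere */
    atomic_fetch_and(&t->state, ~(unsigned)SIM_SCHED);     /* may be re-scheduled while running */
    t->func(t->data);
    atomic_fetch_and(&t->state, ~(unsigned)SIM_RUN);       /* "unlock" */
    return true;
}

/* Example tasklet body: count invocations. */
void count_calls(void *data) { (*(int *)data)++; }
```

Because the RUN bit serializes each tasklet, the tasklet's own code needs no locking against itself. That is the ‘no problem about re-entrancy’ point on the slide.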

The Trouble with Tasklets
- Hard to get right
- One has to be careful about sleeping
- They run at higher priority than other tasks in the system
- Can produce uncontrolled latency if coded badly
- Ongoing discussion about eliminating tasklets
- Will likely slowly fade over time

Work Queues
- Always run by kernel threads
  - Softirqs and tasklets run in an interrupt context; work queues have a pseudo-process context
    - i.e., they have a kernel context but no user context
  - Because they have a pseudo-process context, they can sleep
- Are scheduled by the scheduler
- Work queues are shared by multiple devices
  - Thus, sleeping will delay other work on the queue
- However, they’re kernel-only; there is no user mode associated with them
  - Don’t try copying data into/out of user space
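A user-space analog shows why work queues can sleep: the work runs in an ordinary thread, not in interrupt context. This sketch uses POSIX threads as a stand-in for a kernel worker thread. All names (`workqueue`, `wq_queue`, `wq_flush`, `sleepy_increment`) are hypothetical. The per-item `malloc` is a simplification; kernel work items are usually embedded in the caller's structure. The sketch also shows the slide's caveat: all items share one worker, so an item that sleeps delays the ones behind it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

struct work { void (*fn)(void *); void *arg; struct work *next; };

struct workqueue {
    pthread_mutex_t lock;
    pthread_cond_t more_work;   /* signaled when an item is queued */
    pthread_cond_t idle;        /* signaled when in_flight drops to 0 */
    struct work *head, *tail;
    int in_flight;              /* items queued or currently running */
    bool stopping;
    pthread_t worker;           /* the one "kernel thread" that runs every item */
};

static void *worker_main(void *p) {
    struct workqueue *wq = p;
    pthread_mutex_lock(&wq->lock);
    for (;;) {
        while (!wq->head && !wq->stopping)
            pthread_cond_wait(&wq->more_work, &wq->lock);
        if (!wq->head)
            break;                              /* stopping and fully drained */
        struct work *w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = NULL;
        pthread_mutex_unlock(&wq->lock);
        w->fn(w->arg);                          /* thread context: allowed to sleep */
        free(w);
        pthread_mutex_lock(&wq->lock);
        if (--wq->in_flight == 0)
            pthread_cond_broadcast(&wq->idle);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

int wq_init(struct workqueue *wq) {
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->more_work, NULL);
    pthread_cond_init(&wq->idle, NULL);
    wq->head = wq->tail = NULL;
    wq->in_flight = 0;
    wq->stopping = false;
    return pthread_create(&wq->worker, NULL, worker_main, wq);
}

/* The submitter only links the item in and wakes the worker; it never runs fn itself. */
int wq_queue(struct workqueue *wq, void (*fn)(void *), void *arg) {
    struct work *w = malloc(sizeof *w);
    if (!w)
        return -1;
    w->fn = fn; w->arg = arg; w->next = NULL;
    pthread_mutex_lock(&wq->lock);
    if (wq->tail) wq->tail->next = w; else wq->head = w;
    wq->tail = w;
    wq->in_flight++;
    pthread_cond_signal(&wq->more_work);
    pthread_mutex_unlock(&wq->lock);
    return 0;
}

/* Block until every queued item has finished. */
void wq_flush(struct workqueue *wq) {
    pthread_mutex_lock(&wq->lock);
    while (wq->in_flight > 0)
        pthread_cond_wait(&wq->idle, &wq->lock);
    pthread_mutex_unlock(&wq->lock);
}

void wq_destroy(struct workqueue *wq) {
    pthread_mutex_lock(&wq->lock);
    wq->stopping = true;
    pthread_cond_signal(&wq->more_work);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->worker, NULL);
    pthread_cond_destroy(&wq->idle);
    pthread_cond_destroy(&wq->more_work);
    pthread_mutex_destroy(&wq->lock);
}

/* Example item that sleeps for 1 ms, which a softirq or tasklet must never do. */
void sleepy_increment(void *arg) {
    struct timespec ts = { 0, 1000000L };
    nanosleep(&ts, NULL);
    (*(int *)arg)++;
}
```

For example, queue three `sleepy_increment` items on one counter, call `wq_flush`, and then call `wq_destroy`. The counter ends at 3.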

Kernel Threads
- Always operate in kernel mode
  - Again, no user context
- 2.6.30 introduced the notion of threaded interrupt handlers
  - Imported from the realtime tree
  - request_threaded_irq()
  - Now each bottom half has its own context, unlike work queues
  - The idea is to eventually replace tasklets and work queues

Comparing Approaches (columns: ISR, SoftIRQ, Tasklet, WorkQueue, KThread)
- Will disable all interrupts? Briefly, No, No
- Will disable other instances of self? Yes, No, No, No
- Higher priority than regular scheduled tasks? Yes*, No, No
- Will be run on the same processor as the ISR? N/A, Yes, Yes, Maybe
- Can more than one run on the same CPU? No, No, No, Yes
- Can the same one run on multiple CPUs? Yes, No, Yes
- Full context switch? No, No, No, Yes
- Can sleep? (Has own kernel stack) No, No, No, Yes
- Can access user space? No, No, No
*Within limits, can be run by ksoftirqd

Return Code Path
- Interleaved assembly entry points:
  - ret_from_exception()
  - ret_from_intr()
  - ret_from_sys_call()
  - ret_from_fork()
- Things that happen:
  - Run the scheduler if necessary
  - Return to user mode if there are no nested handlers
    - Restore context, user stack, switch mode
    - Re-enable interrupts if necessary
  - Deliver pending signals

Monitoring Interrupt Activity
- Linux has a pseudo-file system, /proc, for monitoring (and sometimes changing) kernel behavior
- Run cat /proc/interrupts to see what’s going on

/proc/interrupts
$ cat /proc/interrupts
CPU0 counts: 0: 865119901, 1: 4, 2: 0, 8: 1, 12: 20, 14: 6532494, 15: 34, 16: 0, 19: 0, 23: 0, 32: 40, 33: 40, 48: 273306628, NMI: 0, ERR: 0
Interrupt controllers shown: IO-APIC-edge, XT-PIC, IO-APIC-level
Devices shown: timer, keyboard, cascade, rtc, PS/2 Mouse, ide0, ide1, usb-uhci, ehci-hcd, ioc0, ioc1, eth0
- Columns: IRQ, count, interrupt controller, devices
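A tool that reads /proc/interrupts has to split each line into three parts: the IRQ label, one count per CPU, and the trailing controller/device text. The parser below is a sketch. It assumes the counts are the decimal fields immediately after the colon and that the controller name does not start with a digit. `parse_interrupts_line` is a hypothetical helper that works on a single line of text, so it can take either a line read from the file or a test string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one data line such as " 14:   6532494   IO-APIC-edge  ide0".
   Copies the label before ':' into label and the per-CPU counts into counts.
   Returns the number of counts parsed, or -1 if the line has no ':'. */
int parse_interrupts_line(const char *line, char *label, size_t label_cap,
                          unsigned long long *counts, int max_counts) {
    const char *colon = strchr(line, ':');
    if (!colon || label_cap == 0)
        return -1;                                   /* e.g. the "CPU0 CPU1" header */

    const char *start = line;
    while (start < colon && isspace((unsigned char)*start))
        start++;                                     /* skip leading padding */
    size_t len = (size_t)(colon - start);
    if (len >= label_cap)
        len = label_cap - 1;
    memcpy(label, start, len);
    label[len] = '\0';

    int n = 0;
    const char *p = colon + 1;
    while (n < max_counts) {
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;                                   /* reached controller/device names */
        char *end;
        counts[n++] = strtoull(p, &end, 10);
        p = end;
    }
    return n;
}
```

For example, the line " 14: 6532494 IO-APIC-edge ide0" yields the label "14" and a single CPU0 count of 6532494.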

More in /proc/pci:
$ cat /proc/pci
PCI devices found:
  Bus 0, device 0, function 0:
    Host bridge: PCI device 8086:2550 (Intel Corp.) (rev 3).
      Prefetchable 32 bit memory at 0xe8000000 [0xebffffff].
  Bus 0, device 29, function 1:
    USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 2).
      IRQ 19.
      I/O at 0xd400 [0xd41f].
  Bus 0, device 31, function 1:
    IDE interface: Intel Corp. 82801DB ICH4 IDE (rev 2).
      IRQ 16.
      I/O at 0xf000 [0xf00f].
      Non-prefetchable 32 bit memory at 0x80000000 [0x800003ff].
  Bus 3, device 1, function 0:
    Ethernet controller: Broadcom NetXtreme BCM5703X Gigabit Eth (rev 2).
      IRQ 48.
      Master Capable. Latency=64. Min Gnt=64.
      Non-prefetchable 64 bit memory at 0xf7000000 [0xf700ffff].

Portability
- Which has a higher priority, a disk interrupt or a network interrupt?
- Different CPU architectures make different decisions
- By not assuming or enforcing any priority, Linux becomes more portable

Summary
- Exception vs. interrupt
  - Synchronous vs. asynchronous
  - Fault vs. trap
- Relation between exceptions and signals
- Device IRQs
- APICs
- Interrupt hardware handling (vector)
- Interrupt masking
- ISRs can share IRQs
- Interrupt stacks (kernel, HardIRQ, SoftIRQ)
- Top half vs. bottom half
- SoftIRQs, tasklets, workqueues, ksoftirqd, kernel threads
  - Interrupt context vs. process context
  - Who can block
- Opportunities for rescheduling (preempting)

Backup Foils

Three crucial data structures
- The Global Descriptor Table (GDT) defines the system’s memory segments and their access privileges, which the CPU has the duty to enforce
- The Interrupt Descriptor Table (IDT) defines entry points for the various code routines that will handle all ‘interrupts’ and ‘exceptions’
- The Task-State Segment (TSS) holds the values for registers SS and ESP that the CPU loads upon entering kernel mode

How does the CPU find the GDT/IDT?
- Two dedicated registers: GDTR and IDTR
- Both have identical 48-bit formats: bits 47-16 hold the segment base address, bits 15-0 hold the segment limit
- The kernel must set up these registers during system startup (set-and-forget)
- Privileged instructions LGDT and LIDT are used to set these register values
- Unprivileged instructions SGDT and SIDT are used to read register values
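The 48-bit operand of LGDT/LIDT can be written as a packed C struct. The sketch assumes GCC or Clang, for `__attribute__((packed))`. It also assumes 32-bit protected mode, where the operand is a 16-bit limit followed by a 32-bit base. `sim_desc_ptr` and `idt_pointer` are hypothetical names. The limit is the table size in bytes minus one, so a full 256-entry IDT of 8-byte gates has limit 2047.

```c
#include <stdint.h>

/* In-memory operand of LGDT/LIDT (and SGDT/SIDT) in 32-bit mode: 6 bytes, no padding. */
struct __attribute__((packed)) sim_desc_ptr {
    uint16_t limit;   /* bits 15-0: table size in bytes minus one */
    uint32_t base;    /* bits 47-16: linear address of the table */
};

/* Build the operand for a table of `entries` descriptors, each `entry_size` bytes. */
struct sim_desc_ptr idt_pointer(uint32_t idt_base, unsigned entries, unsigned entry_size) {
    struct sim_desc_ptr p = { (uint16_t)(entries * entry_size - 1), idt_base };
    return p;
}
```

Without `packed`, the compiler would insert two bytes of padding after `limit`, and the CPU would then read the wrong base address.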

How does the CPU find the TSS?
- The dedicated system segment register TR holds a descriptor’s offset into the GDT
- The kernel must set up the GDT and TSS structures and must load the GDTR and TR registers
[Figure: TR selects a descriptor in the GDT, which locates the TSS.]
- The CPU knows the layout of fields in the Task-State Segment

IDT Initialization
- Initialized once by the BIOS in real mode
- Linux re-initializes it during kernel init
  - Must not expose the kernel to user-mode access
  - Starts by zeroing all descriptors
- Linux lingo:
  - Interrupt gate (same as Intel; no user access)
    - Not accessible from user mode
  - System gate (Intel trap gate; user access)
    - Used for int, int3, into, bound
  - Trap gate (same as Intel; no user access)
    - Used for exceptions

Interrupt Processing
- The BUILD_IRQ macro generates:

      IRQn_interrupt:
          pushl $n-256          // negative, to distinguish from syscalls
          jmp common_interrupt

- Common code:

      common_interrupt:
          SAVE_ALL              // save a few more registers than the hardware does
          call do_IRQ
          jmp ret_from_intr

- do_IRQ() is C code that handles all interrupts
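The n-256 trick works because interrupt stubs and the system-call entry share the same saved slot (orig_eax). IRQ stubs store a negative value there, while the system-call path stores the non-negative syscall number. The sketch below models only that encoding, and the function names are hypothetical. Recovering n from the low 8 bits is one decoding consistent with the encoding; treat the exact decoding used by any particular kernel version as an assumption.

```c
#include <stdbool.h>

/* Value the stub for IRQ n leaves in the orig_eax slot: n - 256, negative for n in 0..255. */
long irq_stub_value(int n) { return (long)n - 256; }

/* The system-call entry stores the non-negative syscall number in the same slot. */
bool slot_is_irq(long orig_eax) { return orig_eax < 0; }

/* Recover n from the low 8 bits (conversion to unsigned is well defined modulo 2^N). */
int irq_from_slot(long orig_eax) { return (int)((unsigned long)orig_eax & 0xFFul); }
```

Because n - 256 ≡ n (mod 256), the low byte always gives back the IRQ number. The sign alone tells the return path whether it is unwinding an interrupt or a system call.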

Low-level IRQ Processing
- do_IRQ():
  - Get the vector, index into irq_desc for the appropriate struct
  - Grab the per-vector spinlock, ack (to the PIC) and mask the line
  - Set flags (IRQ_PENDING)
  - Really process the IRQ? (may be disabled, etc.)
  - Call handle_IRQ_event()
  - Some logic for handling lost IRQs on SMP systems
- handle_IRQ_event():
  - Enable interrupts if needed (SA_INTERRUPT clear)
  - Execute all ISRs for this vector:
    - action->handler(irq, action->dev_id, regs);

IRQ Data Structures
- irq_desc: array of IRQ descriptors
  - status (flags), lock, depth (for nested disables)
  - handler: PIC device driver!
  - action: linked list of irqaction structs (containing ISRs)
- irqaction: ISR info
  - handler: actual ISR!
  - flags:
    - SA_INTERRUPT: interrupts disabled if set
    - SA_SHIRQ: sharing allowed
    - SA_SAMPLE_RANDOM: input for /dev/random entropy pool
  - name: for /proc/interrupts
  - dev_id, next
- irq_stat: per-CPU counters (for /proc/interrupts)
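The `depth` field makes disables nest. The line is masked on the first disable and unmasked only when the matching last enable arrives. The sketch below models just that counter. `sim_irq_desc`, `sim_disable_irq`, and `sim_enable_irq` are hypothetical. The real structure also carries the lock, status flags, PIC handler, and action list.

```c
#include <stdbool.h>

/* Hypothetical, heavily trimmed stand-in for irq_desc: only the nested-disable logic. */
struct sim_irq_desc {
    unsigned depth;   /* number of outstanding disable calls */
    bool masked;      /* state of the line at the interrupt controller */
};

void sim_disable_irq(struct sim_irq_desc *d) {
    if (d->depth++ == 0)
        d->masked = true;    /* first disable actually masks the line */
}

void sim_enable_irq(struct sim_irq_desc *d) {
    if (d->depth == 0)
        return;              /* unbalanced enable: ignored here */
    if (--d->depth == 0)
        d->masked = false;   /* last matching enable unmasks */
}
```

With this counter, a driver that disables its IRQ inside a code path already running with that IRQ disabled does not unmask the line early when it re-enables.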

Hardware Handling
- On entry:
  - Which vector? Get the corresponding descriptor in the IDT
  - Find the specified descriptor in the GDT (for the handler)
  - Check privilege levels (CPL, DPL)
    - If entering kernel mode, set up the kernel stack
  - Save eflags, cs, (original) eip on the stack
  - Jump to the appropriate handler
    - Assembly code prepares the C stack, calls the handler
- On return (i.e., iret):
  - Restore registers from the stack
  - If returning to user mode, restore the user stack
  - Clear segment registers (if privileged selectors)

Interrupt Handling
- More complex than exceptions
  - Requires registry, deferred processing, etc.
- Three types of actions:
  - Critical: top half (interrupts disabled, briefly!)
    - Example: acknowledge interrupt
  - Non-critical: top half (interrupts enabled)
    - Example: read key scan code, add to buffer
  - Non-critical deferrable: do it “later” (interrupts enabled)
    - Example: copy keyboard buffer to terminal handler process
    - Softirqs, tasklets