Recovering Device Drivers Michael M Swift, Muthukaruppan Annamalai, Brian N Bershad and Henry Levy
Introduction ► Device drivers fail more than anything else § XP: 85% of all crashes § Linux: 7x the bug rate of the mainline kernel ► Existing work protects the kernel ► Applications left to fend for themselves
Principles ► Device driver failures should be concealed from the driver’s clients ► Recovery logic should be centralized in a single subsystem ► Driver recovery logic should be generic ► Recovery services should have low overhead when not needed
Shadow Drivers ► Conceals driver failure from application ► Logs driver activity § Driver state (ioctls) § IO requests/calls ► On failure § Intercepts IO requests § Resets driver state by replaying log ► Model is abstract enough to be implemented for wide range of drivers
Why programs crash ► “Most drivers fail due to bugs that result from unexpected inputs or events [34]” § [34] V. Orgovan, Systems Crash Analyst, Windows Core OS Group, Microsoft Corp. private communication, 2004 § Do we really need a reference for this? § What sort of reference is that anyway?
Driver Faults ► Deterministic § Set sequence of repeatable configuration or IO requests § Unrecoverable with generic tools ► Transient § Infrequent inputs or environment settings ► Fail-stop § Kernel is protected from failing drivers § Faults are detected before collateral damage occurs ► Shadow drivers require transient and fail-stop behavior
Nooks ► Earlier work in kernel protection ► Provides fail-stop facilities § Detects memory violations § Excessive CPU usage § Bad kernel parameters § 75% success rate ► Simply reboots the driver after a fault
Shadow Driver Operation ► Passive Mode § Normal operation § Monitors all explicit communication: replicated procedure calls, not DMA § Logs driver configuration ► Active Mode § Recovery operation § Reinitializes driver to known state § Impersonates driver to the kernel
Taps ► Mechanism allowing replication and redirection of communication channels ► Passive Operation § Calls driver function then shadow function ► Active mode § Redirects all calls to shadow driver
Passive Taps
Active Taps
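The tap behavior on the preceding slides can be sketched in a few lines of Python. This is an illustrative model only, not the kernel implementation: the names `Tap`, `RecordingShadow`, `observe`, and `handle` are hypothetical, and a real tap sits on kernel/driver function-call interfaces rather than on Python callables.

```python
class Tap:
    """Hypothetical tap wrapped around one driver entry point."""

    def __init__(self, driver_fn, shadow):
        self.driver_fn = driver_fn
        self.shadow = shadow
        self.active = False  # False = passive mode, True = active mode

    def __call__(self, *args):
        if self.active:
            # Active mode: the driver is bypassed and the shadow answers.
            return self.shadow.handle(args)
        # Passive mode: call the real driver first, then let the shadow observe.
        result = self.driver_fn(*args)
        self.shadow.observe(args, result)
        return result


class RecordingShadow:
    """Toy shadow that records calls and reports 'busy' while recovering."""

    def __init__(self):
        self.seen = []

    def observe(self, args, result):
        self.seen.append((args, result))

    def handle(self, args):
        return "busy"


shadow = RecordingShadow()
read_tap = Tap(lambda nbytes: b"\x00" * nbytes, shadow)
passive_result = read_tap(4)   # served by the driver, observed by the shadow
read_tap.active = True         # what a shadow manager would do on failure
active_result = read_tap(4)    # served by the shadow alone
```

Flipping a single flag per tap keeps the switch between replication and redirection cheap, which fits the principle that recovery services should cost little when not needed.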
Shadow Manager ► Controls all shadow drivers ► Manages recovery operations ► Controls Tap insertion ► Monitors device failures
General Infrastructure ► Nooks § Isolation service § Redirection mechanism § Object tracking service ► Shadow Manager § Installs shadow drivers
Architecture
Passive Monitoring ► Tracks IO requests § Connection-oriented: offset/positioning § Request-oriented: pending request log ► Logs configuration commands § Only information stored in a persistent log § Does not replicate driver state ► Tracks kernel objects obtained § Prevents memory leaks ► Many of the replicated calls are no-ops § Read/write to sound device by driver
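A minimal sketch of the bookkeeping this slide describes, assuming a shadow that is told about each event through hypothetical callbacks (`on_ioctl`, `on_submit`, `on_complete`, `on_alloc`, `on_free`). Keeping only the latest value per configuration command is an assumption of this sketch, not something the slides state.

```python
class ShadowLog:
    """Hypothetical passive-mode state kept by a shadow driver."""

    def __init__(self):
        self.config = {}      # configuration command -> last argument seen
        self.pending = {}     # request id -> payload, for requests in flight
        self.objects = set()  # kernel objects the driver currently holds

    def on_ioctl(self, cmd, arg):
        # Assumption: a later setting of the same command supersedes the earlier one.
        self.config[cmd] = arg

    def on_submit(self, req_id, payload):
        self.pending[req_id] = payload

    def on_complete(self, req_id):
        # Completed requests never need to be resubmitted.
        self.pending.pop(req_id, None)

    def on_alloc(self, handle):
        self.objects.add(handle)

    def on_free(self, handle):
        self.objects.discard(handle)


log = ShadowLog()
log.on_ioctl("SET_RATE", 44100)
log.on_ioctl("SET_RATE", 48000)
log.on_submit(1, b"first")
log.on_submit(2, b"second")
log.on_complete(1)
log.on_alloc("irq-11")
```

After these events the log holds one configuration entry, one pending request, and one tracked object, which is all the shadow needs in order to rebuild the driver's state and release leaked resources later.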
Active Mode Recovery ► Impersonates driver to kernel and applications ► Recovers driver § Stops failed driver § Reinitializes driver § Transfers state back into driver
Stopping the Failed Driver ► Shadow manager § Signals shadow driver of failure § Switches taps to redirection ► Shadow Driver § Disables hardware device § Garbage collects unnecessary resources
Reinitializing the Driver ► Shadow driver uses cached data section ► Initializes driver ► Reattaches driver to kernel resources ► Reenables hardware resources
Transferring Driver State ► Shadow Driver resubmits any outstanding IO requests § Possible replication of IO § If device cannot handle duplicate IO, request is canceled ► Replays logged configuration commands ► Shadow Driver signals Shadow Manager ► Taps set back to passive mode
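The state-transfer step can be sketched as follows. `FakeDriver` and `transfer_state` are hypothetical names, and the `duplicate_safe` flag stands in for whatever per-device knowledge decides whether a repeated request is harmless. Replaying configuration before resubmitting I/O is an ordering chosen for this sketch; the slide lists the steps without fixing their order.

```python
class FakeDriver:
    """Stand-in for a freshly reinitialized driver (hypothetical)."""

    def __init__(self):
        self.config = {}
        self.submitted = []

    def ioctl(self, cmd, arg):
        self.config[cmd] = arg

    def submit(self, req_id, payload):
        self.submitted.append(req_id)


def transfer_state(driver, config_log, pending, duplicate_safe):
    """Replay logged configuration, then resubmit or cancel outstanding I/O.

    Returns the ids of the requests that were cancelled.
    """
    for cmd, arg in config_log:
        driver.ioctl(cmd, arg)
    cancelled = []
    for req_id, payload in pending.items():
        if duplicate_safe:
            # The request may already have run once before the failure.
            driver.submit(req_id, payload)
        else:
            # The device cannot tolerate a duplicate, so give up on it.
            cancelled.append(req_id)
    return cancelled


config_log = [("SET_RATE", 48000), ("SET_CHANNELS", 2)]
pending = {7: b"block-a", 8: b"block-b"}

safe_driver = FakeDriver()
safe_cancelled = transfer_state(safe_driver, config_log, pending, duplicate_safe=True)

strict_driver = FakeDriver()
strict_cancelled = transfer_state(strict_driver, config_log, pending, duplicate_safe=False)
```

Once this step returns, the shadow would signal the shadow manager and the taps would go back to passive mode, as the slide describes.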
Proxying of Requests ► Depends on driver mechanics and interface ► Possible actions § Respond with recorded information § Silently drop request § Queue request for later § Block request § Report driver busy
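The five possible actions map naturally onto a small dispatch table. The sketch below is illustrative only: the request kinds (`get_config`, `set_led`, `write`, `read`) and the policy assigned to each are invented for the example, since the slide says the right choice depends on the driver's mechanics and interface.

```python
from collections import deque
from enum import Enum, auto


class ProxyAction(Enum):
    RESPOND_RECORDED = auto()
    DROP = auto()
    QUEUE = auto()
    BLOCK = auto()
    REPORT_BUSY = auto()


# Hypothetical per-request policy for one imaginary driver class.
POLICY = {
    "get_config": ProxyAction.RESPOND_RECORDED,
    "set_led": ProxyAction.DROP,
    "write": ProxyAction.QUEUE,
    "read": ProxyAction.BLOCK,
}


def proxy_request(kind, payload, recorded, queue):
    """Decide what an active-mode shadow does with one incoming request.

    Returns (action, reply); reply is None unless recorded data is served.
    """
    action = POLICY.get(kind, ProxyAction.REPORT_BUSY)
    if action is ProxyAction.RESPOND_RECORDED:
        return action, recorded.get(payload)
    if action is ProxyAction.QUEUE:
        queue.append((kind, payload))  # resubmitted after recovery
    return action, None


recorded = {"rate": 48000}
queue = deque()
results = [
    proxy_request("get_config", "rate", recorded, queue),
    proxy_request("write", b"data", recorded, queue),
    proxy_request("set_led", 1, recorded, queue),
    proxy_request("reset", None, recorded, queue),
]
```

Unknown request kinds fall back to reporting the driver busy here, on the assumption that a caller can retry; blocking is only represented as a returned action, because real blocking needs the kernel's scheduling primitives.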
Limitations ► Requires dynamic loading and unloading ► Requires explicit communication channels § DMA doesn’t work ► Assumes driver failure has no external effects ► Requires effective isolation and protection service ► Cannot make real-time guarantees
Evaluation ► Performance § Overhead during passive mode ► Fault-Tolerance § Does it work ► Limitations § How many failures can be dealt with ► Code Size § Amount of kernel modification needed ► Either the advisor is a jerk or the grad students need a social life
Tested Drivers
Tested Applications
Performance ► Three configurations § Linux-Native: Stock kernel § Linux-Nooks: kernel protection § Linux-SD: Shadow driver implementation ► No additional penalty vs Linux-Nooks ► Only 1-3% performance hit vs Linux-Native
Relative Performance
CPU Utilization
Fault Tolerance ► Bugs culled from bug-fixes posted to the linux-kernel mailing list ► Bugs were replicated inside each driver ► Placed bugs in rarely taken paths § Unusual hardware conditions § Forced driver to take unusual path ► What is the difference between that and adding a faulting ioctl?
Fault Tolerance
Recovery Behavior ► Not completely seamless ► Noticeable gap during recovery ► Possible temporary data loss
Limitations ► How do shadow drivers perform with non-fail-stop errors? ► Large scale fault injection experiments ► Cases § Failure detected: recovery hidden from application? § Failure not detected
What would you do for a Ph.D.? ► In total, we ran 2100 trials across the three drivers and six applications. Between trials, we reset the system and reloaded the driver. For each trial, we injected five random errors into the driver while the application was using it. We ensured the errors were transient by removing them during recovery. After injection, we visually observed the impact on the application and the system to determine whether a failure or recovery had occurred.
Undetected Failures ► 3 Cases § IO requests that never complete § Driver <-> Device interaction § Certain bad parameters/return codes ► Need better understanding of driver semantics
Fault Outcomes
Code Size