Tolerating Hardware Device Failures in Software Asim Kadav
- Slides: 29
Tolerating Hardware Device Failures in Software
Asim Kadav, Matthew J. Renzelmann, Michael M. Swift
University of Wisconsin-Madison
Current state of OS-hardware interaction
• Many Linux device drivers assume device perfection
» Common Linux network driver: 3c59x.c
    while (ioread16(ioaddr + Wn7_MasterStatus) & 0x8000) ;   /* HANG! */
• Hardware dependence bug: Device malfunction can crash the system
9/18/2020 Tolerating Hardware Device Failures in Software
Current state of OS-hardware interaction
• Hardware dependence bugs present across driver classes
    void hptiop_iop_request_callback(...)
    {
        arg = readl(...);
        ...
        if (readl(&req->result) == IOP_SUCCESS) {
            arg->result = HPT_IOCTL_OK;
        }
    }
Highpoint SCSI driver (hptiop.c)
*Code simplified for presentation purposes
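The callback above dereferences `arg`, a value that was read from the device. One possible hardening, sketched here under stated assumptions, is to treat the device-supplied value as an opaque handle and accept it only if it matches a request the driver actually issued. The names `pending_table`, `lookup_pending`, `ioctl_arg` and `MAX_OUTSTANDING` are hypothetical and do not come from hptiop.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8   /* hypothetical queue depth */

struct ioctl_arg { int result; };

/* Requests the driver has handed to the device and not yet completed. */
struct pending_table {
    struct ioctl_arg *slot[MAX_OUTSTANDING];
};

/*
 * Map a device-supplied value back to a request the driver really issued.
 * Returns NULL when the value matches nothing, so the caller can report a
 * device failure instead of dereferencing a wild pointer.
 */
struct ioctl_arg *lookup_pending(const struct pending_table *t,
                                 uintptr_t from_device)
{
    assert(t != NULL);
    if (from_device == 0)
        return NULL;
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if ((uintptr_t)t->slot[i] == from_device)
            return t->slot[i];
    }
    return NULL;
}
```

A caller would write `arg->result` only when `lookup_pending()` returns a non-NULL pointer, and would otherwise take its error path.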
How do the hardware bugs manifest?
• Drivers often trust hardware to work correctly
» Drivers use device data in critical control and data paths
» Drivers do not report device malfunctions to system log
» Drivers do not detect or recover from device failures
Carburizer
• Goal: Tolerate hardware device failures in software through hardware failure detection and recovery
• Static analysis tool: analyzes drivers and inserts code to
» Detect and fix hardware dependence bugs
» Detect and generate missing error reporting information
• Runtime
» Handle interrupt failures
» Transparently recover from failures
Outline
• Background
• Hardening drivers
• Reporting errors
• Conclusion
Hardware unreliability
• Sources of hardware misbehavior:
» Device wear-out, insufficient burn-in
» Bridging faults
» Electromagnetic radiation
» Firmware bugs
• Result of misbehavior:
» Corrupted/stuck-at inputs
» Timing errors/unpredictable DMA
» Interrupt storms/missing interrupts
Vendor recommendations for driver developers
Recommendation summary, recommended by: Intel, Sun, MS, Linux
[Table: per-vendor check marks not recoverable]
Category     Recommendation
Validation   Input validation
             Read once & CRC data
             DMA protection
Timing       Infinite polling
             Stuck interrupt
             Lost request
             Avoid excess delay in OS
             Unexpected events
Reporting    Report all failures
Recovery     Handle all failures
             Cleanup correctly
             Do not crash on failure
             Wrap I/O memory access
• Goal: Automatically implement as many recommendations as possible in commodity drivers
Carburizer architecture
[Figure: Driver source → Carburizer (hardware dependence bug detection; emits a list of bugs) → Compiler → Hardened driver binary. The Carburizer runtime (recovery and detection of interrupt issues) sits alongside the hardened driver, between the OS kernel interface and the faulty hardware.]
Outline
• Background
• Hardening drivers
» Finding sensitive code
» Repairing code
• Reporting errors
• Conclusion
Hardening drivers
• Goal: Remove hardware dependence bugs
» Find driver code that uses data from device
» Ensure driver performs validity checks
• Carburizer detects and fixes hardware bugs from
» Infinite polling
» Unsafe static/dynamic array reference
» Unsafe pointer dereferences
» System panic calls on non-debug path
Hardening drivers
• Finding sensitive code
» First pass: Identify variables that contain data from the device
» We call these tainted variables
Finding sensitive code
First pass: Identify tainted variables
    int test() {
        a = readl();
        b = inb();
        c = b;
        d = c + 2;
        return d;
    }
    int set() {
        e = test();
    }
Tainted variables: a, b, c, d, test(), e
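The first pass can be pictured as a fixed-point computation over assignments. The sketch below is a toy model, not Carburizer's implementation: statements are reduced to `dst = src` pairs, and the names `assign`, `propagate_taint`, `DEVICE_READ` and `CONSTANT` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_VARS = 16 };

/* Special sources for the right-hand side of an assignment. */
#define DEVICE_READ (-1)   /* e.g. readl(), inb() */
#define CONSTANT    (-2)   /* value that does not come from the device */

/* One simplified statement: dst = f(src). */
struct assign {
    int dst;
    int src;
};

/*
 * Mark every variable that can hold device data. The loop repeats until
 * no new variable becomes tainted, so statement order does not matter.
 */
void propagate_taint(const struct assign *stmts, size_t n,
                     bool tainted[MAX_VARS])
{
    bool changed = true;

    for (int v = 0; v < MAX_VARS; v++)
        tainted[v] = false;

    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            int d = stmts[i].dst;
            int s = stmts[i].src;
            assert(d >= 0 && d < MAX_VARS);
            assert(s >= CONSTANT && s < MAX_VARS);
            bool src_tainted = (s == DEVICE_READ) || (s >= 0 && tainted[s]);
            if (src_tainted && !tainted[d]) {
                tainted[d] = true;
                changed = true;
            }
        }
    }
}
```

Encoding the slide's example with one index per variable (and one for the return value of `test()`) marks a, b, c, d, the return value, and e as tainted, while a variable assigned only from a constant stays clean.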
Detecting risky uses of tainted variables
• Second pass: Finding hardware dependence bugs
» Identify risky uses of tainted variables
• Example: Infinite polling
» Driver waiting for device to enter particular state
» Solution: Detect loops where all terminating conditions depend on tainted variables
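The loop rule above reduces to a simple predicate once each loop exit has been classified. The following is a minimal sketch under the assumption that an earlier pass has already recorded, per exit, whether its condition reads a tainted variable; `exit_cond` and `is_infinite_polling_candidate` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One way a loop can terminate: loop condition, break, or return. */
struct exit_cond {
    bool depends_on_tainted;   /* condition reads a tainted variable */
};

/*
 * Flag a loop when it has at least one exit and every exit depends on
 * device data: a stuck device can then keep the loop spinning forever.
 * A loop with any device-independent exit (for example a timeout
 * counter) is treated as safe. Loops with no exit at all are not
 * flagged here; that choice is an assumption of this sketch.
 */
bool is_infinite_polling_candidate(const struct exit_cond *exits, size_t n)
{
    assert(exits != NULL || n == 0);
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (!exits[i].depends_on_tainted)
            return false;
    }
    return true;
}
```

A loop whose only exit tests a freshly read device register is flagged; adding a second exit driven by an iteration counter clears the flag.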
Example: Infinite polling
Tainted variables used for critical timing decisions
    static int amd8111e_read_phy(...)
    {
        ...
        reg_val = readl(mmio + PHY_ACCESS);
        while (reg_val & PHY_CMD_ACTIVE)
            reg_val = readl(mmio + PHY_ACCESS);
    }
AMD 8111e network driver (amd8111e.c)
Not all bugs are obvious
    while (DAC960_PD_StatusAvailableP(ControllerBaseAddress)) {
        DAC960_V1_CommandIdentifier_T CommandIdentifier =
            DAC960_PD_ReadStatusCommandIdentifier(ControllerBaseAddress);
        DAC960_Command_T *Command = Controller->Commands[CommandIdentifier-1];
        DAC960_V1_CommandMailbox_T *CommandMailbox = &Command->V1.CommandMailbox;
        DAC960_V1_CommandOpcode_T CommandOpcode = CommandMailbox->Common.CommandOpcode;
        Command->V1.CommandStatus = DAC960_PD_ReadStatusRegister(ControllerBaseAddress);
        DAC960_PD_AcknowledgeInterrupt(ControllerBaseAddress);
        DAC960_PD_AcknowledgeStatus(ControllerBaseAddress);
        switch (CommandOpcode) {
        case DAC960_V1_Enquiry_Old:
            DAC960_P_To_PD_TranslateReadWriteCommand(CommandMailbox);
            …
        }
DAC960 RAID controller (DAC960.c)
Detecting risky uses of tainted variables
• Example II: Unsafe array accesses
» Tainted variables used as array index into static or dynamic arrays
» Tainted variables used as pointers
Example: Unsafe array accesses
Tainted variables used to index kernel memory without checks
    static void __init attach_pas_card(...)
    {
        if ((pas_model = pas_read(0xFF88))) {
            ...
            sprintf(temp, "%s rev %d", pas_model_names[(int) pas_model], pas_read(0x2789));
            ...
        }
    }
Pro Audio Sound driver (pas2_card.c)
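The table lookup above can be made safe by checking the device-supplied index against the table length before use. This is a minimal sketch of that kind of check, not the code Carburizer emits; `name_checked` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the table entry for an index that came from the device, or NULL
 * when the index is out of range. Taking the index as unsigned means a
 * corrupted value can never become a negative offset.
 */
const char *name_checked(const char *const *table, size_t len,
                         unsigned int id_from_device)
{
    assert(table != NULL);
    if (id_from_device >= len)
        return NULL;
    return table[id_from_device];
}
```

A caller would treat a NULL result as a device failure and skip the `sprintf()` rather than read past the end of the name table.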
Analysis results over the Linux kernel
• Analyzed drivers in the Linux 2.6.18.8 kernel
» 6300 driver source files
» 2.8 million lines of code
» 37 minutes to analyze and compile code
• Additional analyses to detect existing validation code
• Re-ran analysis for the Linux 2.6.37.6 kernel
Analysis results over Linux 2.6.18.8
Driver class   Infinite polling   Static array   Dynamic array   Panic calls
net                  117                2              21              2
scsi                 298               31              22            121
sound                 64                1               0              2
video                174                0              22             22
other                381                9              57             32
Total                860               43              89            179
2.6.37.6            1164               55             156            214
• Found 992 bugs in driver code with 7.4% false positive rate (manual sampling of 190 bugs)
Repairing drivers
• Hardware dependence bugs are difficult to test
• Carburizer automatically generates repair code
» Inserts timeout code for infinite loops
» Inserts checks for unsafe array/pointer references
» Replaces calls to panic() with recovery service
» Triggers generic recovery service on device failure
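The timeout repair and the recovery trigger listed above can be sketched together, along with a simulated device that makes the failure path testable without real hardware. Everything here is an assumption for illustration: `POLL_LIMIT`, `CMD_ACTIVE`, `poll_until_idle`, `fake_dev` and the hook signatures are hypothetical, and a real kernel driver would more likely bound the wait by elapsed time than by an iteration count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POLL_LIMIT 1000u      /* hypothetical iteration budget */
#define CMD_ACTIVE 0x8000u    /* hypothetical "busy" bit */

typedef uint32_t (*read_reg_fn)(void *dev);
typedef void (*recover_fn)(void *dev);

/*
 * Poll until the busy bit clears or the budget runs out. Returns true on
 * success; on timeout it calls the recovery hook and returns false
 * instead of spinning forever.
 */
bool poll_until_idle(read_reg_fn read_reg, void *dev, recover_fn recover)
{
    assert(read_reg != NULL);
    for (unsigned int tries = 0; tries < POLL_LIMIT; tries++) {
        if ((read_reg(dev) & CMD_ACTIVE) == 0)
            return true;
    }
    if (recover != NULL)
        recover(dev);
    return false;
}

/* Simulated device: busy for a few reads, or stuck busy forever. */
struct fake_dev {
    unsigned int busy_reads;
    bool stuck;
    unsigned int recoveries;
};

uint32_t fake_read(void *p)
{
    struct fake_dev *d = p;
    if (d->stuck)
        return CMD_ACTIVE;
    if (d->busy_reads > 0) {
        d->busy_reads--;
        return CMD_ACTIVE;
    }
    return 0;
}

void fake_recover(void *p)
{
    struct fake_dev *d = p;
    d->recoveries++;
}
```

With a healthy simulated device the poll succeeds and the recovery hook is never called; with a stuck one the loop gives up after `POLL_LIMIT` reads and the hook runs exactly once.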
Outline
• Background
• Hardening drivers
• Reporting errors
• Conclusion
Reporting errors
• Drivers often fail silently and do not report device errors
» Drivers should proactively report device failures
» Fault management systems require these inputs
• Drivers often already detect a failure but do not report it
• Carburizer analysis performs two functions
» Detect when there is a device failure
» Report unless the driver is already reporting the failure
Detecting driver-detected device failures
• Detect code that depends on tainted variables and
» Performs unreported loop timeouts
» Returns negative error constants
» Jumps to common cleanup code
    while (ioread16(regA) == 0x0f) {
        if (timeout++ == 200) {
            sys_report("Device timed out %s.\n", mod_name);   /* reporting code added by Carburizer */
            return (-1);
        }
    }
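The timeout case is shown on the slide; the "returns negative error constants" case can be sketched the same way. In the example below the `sys_report()` body is a user-space stand-in that only records the last message, and `check_status` and `STATUS_FAULT` are hypothetical names, so this illustrates the shape of the inserted call rather than Carburizer's actual output.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the reporting hook: keeps the last message and a count. */
char last_report[128];
int report_count;

void sys_report(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_report, sizeof last_report, fmt, ap);
    va_end(ap);
    report_count++;
}

#define STATUS_FAULT 0x01u   /* hypothetical fault bit */

/*
 * Driver-detected failure: the status word came from the device and the
 * driver bails out with a negative errno. The sys_report() call is the
 * kind of line that would be added on such a path when the driver does
 * not already report the failure.
 */
int check_status(unsigned int status_from_device, const char *mod_name)
{
    assert(mod_name != NULL);
    if (status_from_device & STATUS_FAULT) {
        sys_report("%s: device reported fault, status 0x%x\n",
                   mod_name, status_from_device);
        return -EIO;
    }
    return 0;
}
```

A clean status produces no report; a faulty one produces exactly one report naming the module, which is the input a fault management system would consume.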
Detecting existing reporting code
• Carburizer detects function calls with string arguments
    static u16 gm_phy_read(...)
    {
        ...
        if (__gm_phy_read(...))
            printk(KERN_WARNING "%s: ...\n", ...);   /* existing reporting code detected by Carburizer */
    }
SysKonnect network driver (skge.c)
Evaluation
• Fixed 1135 cases of unreported timeouts and 467 cases of unreported device failures in Linux drivers
• Evaluation: Manual analysis of drivers of different classes
Driver     Class     Carburizer reported / Driver detected device failures
bnx2       network   17/24
mptbase    scsi      17/28
ens1371    sound     9/10
• No false positives
• Carburizer automatically improves the fault diagnosis capabilities of the system
Conclusion
Recommendation summary, recommended by: Intel, Sun, MS, Linux; ensured by: Carburizer
[Table: per-column check marks not recoverable]
Category     Recommendation
Validation   Input validation; Read once & CRC data; DMA protection
Timing       Infinite polling; Stuck interrupt; Lost request; Avoid excess delay in OS; Unexpected events
Reporting    Report all failures
Recovery     Handle all failures; Cleanup correctly; Do not crash on failure; Wrap I/O memory access
• Carburizer improves system reliability by automatically ensuring that hardware failures are tolerated in software
Thank You
• Contact for driver verification/tool access
» kadav@cs.wisc.edu
• Details on Carburizer
» http://cs.wisc.edu/~kadav/carb/
[Figure: Carburizer architecture. Driver source → Carburizer (hardware dependence bug detection) → Compiler → Hardened driver binary; Carburizer runtime (recovery and detection of interrupt failures) between the OS kernel interface and the faulty hardware.]