Operating Systems: Three Easy Pieces, Remzi H. Arpaci-Dusseau

- Slides: 13

Operating Systems: Three Easy Pieces
Remzi H. Arpaci-Dusseau, Andrea C. Arpaci-Dusseau
Data Integrity and Protection
Juyong Shin (jyshin@archi.snu.ac.kr)
School of Computer Science and Engineering, Seoul National University

Overview
§ Data integrity
[Figure] Source: http://ko.wikipedia.org/wiki/광개토왕릉비 (Gwanggaeto Stele), retrieved on 2015/06/07

Overview
§ Disk failure modes
  § Block corruption (the disk returns wrong contents without reporting an error)
  § Latent-sector errors (a sector cannot be read, and the disk reports the error)

Overview
§ Findings about errors
  § Latent-sector errors (LSEs)
    • Annual error rate increases in year two
    • LSEs increase with disk size
    • Spatial and temporal locality
  § Block corruption
    • Independent of workload and disk size
    • Spatial locality, and some temporal locality
    • Weak correlation with LSEs

How to Correct Errors
§ Duplication
§ RAID
§ Error-correcting codes
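
To make the RAID bullet concrete, here is a minimal sketch of parity-based recovery, assuming a single XOR parity block protects a stripe of equally sized data blocks. The helper name `xor_blocks` and the three-byte sample blocks are illustrative assumptions, not taken from the slides.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks on three hypothetical disks, plus one parity block.
data = [b"\x0f\xf0\xaa", b"\x33\x55\x00", b"\xff\x01\x10"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

This works because XOR is its own inverse, so any single missing block in the stripe can be recomputed from the rest. It does not help if two blocks of the same stripe are lost.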

How to Detect Errors
§ Error-correcting codes
  § Adding redundant data for error detection and correction
  [Figure: encoding, sending, decoding]
§ Checksum
  § Producing a summary of the contents of the data
  [Figure: calculating, sending, calculating, failure]

How to Detect Errors
§ Checksum
  § XOR-based checksums
[Figure] Source: http://www.instructables.com/, retrieved on 2015/06/07
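
An XOR-based checksum can be sketched as follows, assuming the data is split into 4-byte chunks that are XORed together. The function name `xor_checksum`, the chunk width, the zero-padding of a short tail, and the 16-byte sample block are all assumptions made for illustration.

```python
def xor_checksum(data: bytes, width: int = 4) -> bytes:
    """XOR all width-byte chunks of data together (zero-padding a short tail)."""
    if len(data) % width:
        data += bytes(width - len(data) % width)
    result = bytearray(width)
    for start in range(0, len(data), width):
        for i in range(width):
            result[i] ^= data[start + i]
    return bytes(result)

block = bytes.fromhex("365e c4cd ba14 8a92 ecef 2c3a 40be f666")
print(xor_checksum(block).hex())  # 201b9403

# Weakness: flipping the same bit position in two chunks cancels out.
corrupted = bytearray(block)
corrupted[0] ^= 0x01
corrupted[4] ^= 0x01
assert xor_checksum(bytes(corrupted)) == xor_checksum(block)
```

The last lines show the known limitation of XOR: two changes in the same bit position of different chunks leave the checksum unchanged, so the corruption goes undetected.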

How to Detect Errors
§ Checksum
  § Fletcher checksum (addition-based)
[Figure] Source: http://www.chegg.com/, retrieved on 2015/06/07
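
A minimal sketch of the Fletcher checksum, assuming the common 16-bit variant with two running sums taken modulo 255. The function name `fletcher16` and the way the two sums are packed into one integer are illustrative choices.

```python
def fletcher16(data: bytes) -> int:
    """Compute a Fletcher-16 checksum: two running sums, each modulo 255."""
    s1 = 0  # plain sum of the bytes
    s2 = 0  # sum of the running values of s1, which makes the result order-sensitive
    for byte in data:
        s1 = (s1 + byte) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

print(hex(fletcher16(b"abcde")))  # 0xc8f0

# Unlike a plain sum, swapping two bytes changes the result.
assert fletcher16(b"ab") != fletcher16(b"ba")
```

The second sum is what distinguishes this from simple addition: it weights each byte by its position, so reordered data produces a different checksum.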

How to Detect Errors
§ Cyclic redundancy check (CRC)
  § Uses division instead of addition: the data is treated as one large binary number and divided by an agreed-upon value
  § The remainder is the value of the CRC
[Figure] Source: http://en.wikipedia.org/wiki/Cyclic_redundancy_check, retrieved on 2015/06/07
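
The division step can be sketched as bitwise long division over GF(2), where subtraction is XOR. The function name `crc_remainder`, the use of bit strings, and the tiny 4-bit divisor are assumptions chosen for readability; real CRCs use much longer polynomials.

```python
def crc_remainder(bits: str, poly: str) -> str:
    """Divide bits (padded with len(poly) - 1 zeros) by poly; return the remainder."""
    n = len(poly) - 1
    work = [int(b) for b in bits] + [0] * n
    divisor = [int(b) for b in poly]
    for i in range(len(bits)):
        if work[i]:  # leading bit set: "subtract" (XOR) the divisor here
            for j, d in enumerate(divisor):
                work[i + j] ^= d
    return "".join(str(b) for b in work[-n:])

message = "11010011101100"
crc = crc_remainder(message, "1011")
print(crc)  # 100

# A receiver divides message + CRC by the same value and expects a zero remainder.
assert crc_remainder(message + crc, "1011") == "000"
```

In practice a library routine would normally be used instead, for example `zlib.crc32` in the Python standard library.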

How to Detect Errors
§ Checksum layout
  § Original data block layout
  § Checksum with each data block
  § Packed checksums
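
One possible packed layout can be sketched as address arithmetic: a sector full of checksums followed by the data sectors it covers. The 512-byte sector, the 8-byte checksum, and the function name `locate` are all assumptions; the slides do not specify sizes.

```python
SECTOR_SIZE = 512                          # bytes per sector (assumed)
CHECKSUM_SIZE = 8                          # bytes per checksum (assumed)
PER_GROUP = SECTOR_SIZE // CHECKSUM_SIZE   # 64 checksums fit in one sector

def locate(block: int):
    """Map a logical block to (checksum_sector, offset_in_sector, data_sector)."""
    group, index = divmod(block, PER_GROUP)
    checksum_sector = group * (PER_GROUP + 1)   # each group: 1 checksum sector + 64 data
    data_sector = checksum_sector + 1 + index
    return checksum_sector, index * CHECKSUM_SIZE, data_sector

print(locate(0))   # (0, 0, 1)
print(locate(64))  # (65, 0, 66)
```

The trade-off in this layout is that updating one data block also requires reading, modifying, and rewriting the shared checksum sector, whereas a checksum stored with each block can be written in the same operation as the data.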

How to Detect Errors
§ Tricky cases
  § Misdirected writes
    • Correct data written to the wrong location
    • Simple solution: add a physical identifier (disk and block number) to each checksum
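
A minimal sketch of the physical-identifier check, assuming each stored block carries its checksum together with the disk and block number it was written for. `StoredBlock`, `make_block`, `read_block`, and the dictionary-based disks are hypothetical names for illustration; `zlib.crc32` stands in for whatever checksum a real system would use.

```python
import zlib
from dataclasses import dataclass

@dataclass
class StoredBlock:
    data: bytes
    checksum: int
    disk: int    # physical identifier: disk the block was written for
    block: int   # physical identifier: block number it was written for

def make_block(disk: int, block: int, data: bytes) -> StoredBlock:
    """Bundle data with its checksum and intended physical location."""
    return StoredBlock(data, zlib.crc32(data), disk, block)

def read_block(disks: dict, disk: int, block: int) -> bytes:
    """Return the data at (disk, block), or raise if verification fails."""
    stored = disks[disk][block]
    if zlib.crc32(stored.data) != stored.checksum:
        raise ValueError("checksum mismatch: contents are corrupted")
    if (stored.disk, stored.block) != (disk, block):
        raise ValueError("misdirected write: block was meant for another location")
    return stored.data

disks = {0: {}, 1: {}}
disks[0][4] = make_block(0, 4, b"hello")   # correct write
disks[1][4] = make_block(0, 4, b"hello")   # same block lands on the wrong disk

print(read_block(disks, 0, 4))             # b'hello'
try:
    read_block(disks, 1, 4)
except ValueError as err:
    print(err)
```

The checksum alone would pass for the misplaced block, since its contents are intact. Only the stored location reveals that it does not belong where it was found.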

How to Detect Errors
§ Tricky cases
  § Lost writes
    • A write reported as complete (for example, one still held in a cache) can be lost before it reaches the disk
    • The block keeps its old contents rather than the new contents
    • Solutions: write verify (read back after writing), or adding a checksum elsewhere in the system
  § Scrubbing
    • Checking checksums periodically
    • Reducing the chance that all copies of a piece of data become corrupted
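
Scrubbing and the "checksum elsewhere" idea can be sketched together, assuming checksums are kept in a structure separate from the blocks they protect. The function name `scrub`, the dictionaries, and the sample contents are hypothetical; `zlib.crc32` is used only as a convenient checksum.

```python
import zlib

def scrub(blocks: dict, checksums: dict) -> list:
    """Return the block numbers whose contents do not match the stored checksum."""
    return [n for n in sorted(blocks) if zlib.crc32(blocks[n]) != checksums[n]]

blocks = {0: b"alpha", 1: b"beta", 2: b"v1:gamma"}
checksums = {n: zlib.crc32(d) for n, d in blocks.items()}

# Silent corruption: one byte of block 1 changes on disk.
blocks[1] = b"bexa"

# Lost write: the checksum (stored elsewhere) is updated for the new contents,
# but the new data never reaches block 2, which keeps its old contents.
checksums[2] = zlib.crc32(b"v2:gamma")

print(scrub(blocks, checksums))  # [1, 2]
```

A real scrubber would run such a pass periodically and repair each flagged block from another copy while a good copy still exists.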

Summary
§ Data integrity
  § Error correction
    § Duplication: high space overhead
    § ECC: complicated HW engine
    § RAID: multiple storage devices are needed
  § Error detection
    § Parity: simple HW
    § Checksum: simple operation
    § CRC: commonly used in digital networks and storage devices