Operating Systems: Three Easy Pieces
Remzi H. Arpaci-Dusseau, Andrea C. Arpaci-Dusseau
Data Integrity and Protection
Juyong Shin (jyshin@archi.snu.ac.kr)
School of Computer Science and Engineering, Seoul National University

Overview
§ Data integrity
Source: http://ko.wikipedia.org/wiki/광개토왕릉비, retrieved on 2015/06/07

Overview
§ Disk failure modes
  § Block corruption
  § Latent-sector errors
[Figure: diagram of disk blocks labeled A to F]

Overview
§ Findings about errors
  § Latent-sector errors (LSEs)
    • Annual error rate increases in year two
    • The number of LSEs increases with disk size
    • Errors show spatial and temporal locality
  § Block corruption
    • Largely independent of workload and disk size
    • Spatial locality, and some temporal locality
    • Weak correlation with LSEs

How to Correct Errors
§ Duplication
§ RAID
§ Error correcting codes
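As a rough illustration of how RAID-style redundancy can correct an error, the sketch below rebuilds one lost block from XOR parity. The helper name `xor_blocks` and the four-byte sample blocks are assumptions made for this example, not something taken from the slides.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block (hypothetical sample contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose block 1 becomes unreadable: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR is its own inverse, any single missing block can be recovered this way, at the cost of one extra block per stripe.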

How to Detect Errors
§ Error correcting codes
  § Adding redundant data for error detection and correction
  [Figure: data bits are encoded with redundant bits, sent, and decoded]
§ Checksum
  § Producing a summary of the contents of the data
  [Figure: a checksum is calculated, sent with the data, and calculated again on receipt to reveal a failure]
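To make "adding redundant data" concrete, here is a minimal sketch of a triple-repetition code, one of the simplest error correcting codes. It is chosen only for illustration and is not necessarily the code shown in the slide's figure; the function names `encode` and `decode` are hypothetical.

```python
def encode(bits):
    """Repeat every bit three times."""
    return [b for b in bits for _ in range(3)]


def decode(coded):
    """Recover each bit by majority vote over its three copies."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


message = [1, 0, 1]
sent = encode(message)        # [1, 1, 1, 0, 0, 0, 1, 1, 1]

received = sent.copy()
received[4] ^= 1              # a single bit flips in transit

recovered = decode(received)
print(recovered == message)
```

The code corrects any single flipped bit per group of three, but it triples the amount of data, which is why practical ECCs use more compact encodings.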

How to Detect Errors
§ Checksum
  § XOR-based checksums
Source: http://www.instructables.com/, retrieved on 2015/06/07
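An XOR-based checksum can be sketched as follows: the data is split into fixed-size chunks and the chunks are XORed together. The 4-byte chunk size, the function name `xor_checksum`, and the sample bytes are assumptions for this example.

```python
def xor_checksum(data: bytes, chunk_size: int = 4) -> bytes:
    """XOR all chunk_size-byte chunks of data together (zero-padded)."""
    if len(data) % chunk_size:
        data += bytes(chunk_size - len(data) % chunk_size)
    checksum = bytearray(chunk_size)
    for start in range(0, len(data), chunk_size):
        for i in range(chunk_size):
            checksum[i] ^= data[start + i]
    return bytes(checksum)


block = bytes.fromhex("365ec4cdba148a92ecef2c3a40bef666")
checksum = xor_checksum(block)
print(checksum.hex())  # 201b9403

# Limitation: flipping the same bit position in two chunks cancels out.
damaged = bytearray(block)
damaged[0] ^= 0x01
damaged[4] ^= 0x01
print(xor_checksum(bytes(damaged)) == checksum)  # True: corruption missed
```

The second part shows the known weakness of XOR: an even number of flips in the same bit position leaves the checksum unchanged.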

How to Detect Errors
§ Checksum
  § Fletcher checksum (addition-based)
Source: http://www.chegg.com/, retrieved on 2015/06/07
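The Fletcher checksum keeps two running sums: one over the data bytes and one over the first sum. Below is a minimal sketch of the common Fletcher-16 variant (sums taken modulo 255); the function name `fletcher16` is an assumption, and the slide itself does not say which variant it shows.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: two 8-bit running sums, each taken modulo 255."""
    s1 = 0  # sum of the bytes
    s2 = 0  # sum of the running values of s1
    for byte in data:
        s1 = (s1 + byte) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1


print(hex(fletcher16(b"abcde")))               # 0xc8f0
print(fletcher16(b"ab") != fletcher16(b"ba"))  # True: order matters
```

Because the second sum weights each byte by its position, reordered data changes the result, which a plain sum of bytes would not detect.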

How to Detect Errors
§ Cyclic redundancy check
  § Using division instead of addition
  § The remainder is the value of the CRC
Source: http://en.wikipedia.org/wiki/Cyclic_redundancy_check, retrieved on 2015/06/07
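The division in question is polynomial division over bits, where subtraction is XOR. The sketch below computes the remainder for a small generator; the function name `crc_remainder`, the 14-bit message, and the generator 1011 are assumptions picked for illustration. Real CRCs such as CRC-32 use longer generators and extra conventions (initial values, bit reflection) that are omitted here.

```python
def crc_remainder(message: int, message_bits: int, poly: int) -> int:
    """Divide message * x^degree by poly over GF(2); return the remainder."""
    degree = poly.bit_length() - 1
    remainder = message << degree          # append `degree` zero bits
    for shift in range(message_bits - 1, -1, -1):
        if remainder & (1 << (shift + degree)):
            remainder ^= poly << shift     # "subtract" the aligned divisor
    return remainder


message = 0b11010011101100
crc = crc_remainder(message, 14, 0b1011)
print(format(crc, "03b"))  # 100

# A receiver divides message + CRC again; a zero remainder means no error seen.
codeword = (message << 3) | crc
print(crc_remainder(codeword, 17, 0b1011) == 0)  # True
```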

How to Detect Errors
§ Checksum layout
  § Original data block layout
  § Checksum with each data block
  § Packed checksums
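For the packed layout, one possible address calculation is sketched below. All parameters are assumptions for the example: 4096-byte blocks, 8-byte checksums, and a layout in which each checksum block is followed by the data blocks it covers. The function name `locate` is hypothetical.

```python
BLOCK_SIZE = 4096
CHECKSUM_SIZE = 8
PER_GROUP = BLOCK_SIZE // CHECKSUM_SIZE   # 512 checksums fit in one block


def locate(n):
    """Map logical data block n to its on-disk positions.

    Returns (checksum_block, byte_offset_in_checksum_block, data_block).
    """
    group, index = divmod(n, PER_GROUP)
    checksum_block = group * (PER_GROUP + 1)   # one checksum block per group
    data_block = checksum_block + 1 + index
    return checksum_block, index * CHECKSUM_SIZE, data_block


print(locate(0))    # (0, 0, 1)
print(locate(511))  # (0, 4088, 512)
print(locate(512))  # (513, 0, 514)
```

With this kind of layout, updating a data block also means updating the checksum block that covers it, whereas storing the checksum next to each data block keeps both in one write but requires a device that offers the extra space per sector.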

How to Detect Errors
§ Tricky cases
  § Misdirected writes
    • Correct data at wrong location
    • Simple solution: adding a physical identifier to each checksum
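The physical identifier idea can be sketched as follows: each stored block records the disk and block number it was meant for, and a read compares that against where the block was actually found. The `StoredBlock` record, the in-memory `disks` dictionary, and the use of `zlib.crc32` as the checksum are all assumptions for this illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredBlock:
    data: bytes
    checksum: int
    disk: int    # physical ID: intended disk
    block: int   # physical ID: intended block number


def write_block(disks, disk, block, data):
    disks[disk][block] = StoredBlock(data, zlib.crc32(data), disk, block)


def read_block(disks, disk, block):
    stored = disks[disk][block]
    if zlib.crc32(stored.data) != stored.checksum:
        raise IOError("checksum mismatch: corrupted data")
    if (stored.disk, stored.block) != (disk, block):
        raise IOError("physical ID mismatch: misdirected write")
    return stored.data


disks = {0: {}, 1: {}}
write_block(disks, 0, 4, b"hello")

# Simulate a misdirected write: data meant for block 4 lands in block 7.
disks[0][7] = StoredBlock(b"new", zlib.crc32(b"new"), 0, 4)

try:
    read_block(disks, 0, 7)
    outcome = "not detected"
except IOError as err:
    outcome = str(err)
print(outcome)
```

Note that the checksum alone passes here, since the data itself is intact; only the identifier check reveals that it sits at the wrong location.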

How to Detect Errors
§ Tricky cases
  § Lost writes
    • Cached data can be lost
    • The block keeps its old contents rather than the new contents
    • Solutions: write verify, or adding a checksum elsewhere in the system
  § Scrubbing
    • Checking checksums periodically
    • Reducing the chances that all copies of a piece of data become corrupted
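A scrubbing pass can be sketched as a loop that re-checks every copy of every block against its checksum and repairs bad copies from a good one. The function name `scrub`, the two in-memory copies, the separately held checksum list, and `zlib.crc32` as the checksum are assumptions for this example.

```python
import zlib


def scrub(copies, checksums):
    """Verify every copy of every block; repair bad copies from a good one.

    Returns a list of (copy_index, block_number) pairs that were repaired.
    """
    repaired = []
    for n, expected in enumerate(checksums):
        good = [c for c in copies if zlib.crc32(c[n]) == expected]
        if not good:
            raise IOError(f"block {n}: no good copy left")
        for i, copy in enumerate(copies):
            if zlib.crc32(copy[n]) != expected:
                copy[n] = good[0][n]
                repaired.append((i, n))
    return repaired


blocks = [b"alpha", b"beta", b"gamma"]
checksums = [zlib.crc32(b) for b in blocks]
primary, mirror = list(blocks), list(blocks)

primary[1] = b"bXta"          # silent corruption in one copy
repaired = scrub([primary, mirror], checksums)
print(repaired)               # [(0, 1)]
```

Running such a pass periodically fixes a corrupted copy while another good copy still exists, which is the point of the last bullet above.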

Summary
§ Data integrity
  § Error correction
    § Duplication – High space overhead
    § ECC – Complicated HW engine
    § RAID – Multiple storage devices are needed
  § Error detection
    § Parity – Simple HW
    § Checksum – Simple operation
    § CRC – Commonly used in digital networks and storage devices