Storage Systems CSE 598d Spring 2007 Lecture

Storage Systems CSE 598d, Spring 2007
Lecture 3: Disk drive trends, modeling (contd.)
Feb 1, 2007

• Topics
  – Disk drive modeling
  – SCSI vs ATA
  – Rules of thumb in data engineering

Disk Drive Modeling
• Problems because
  – Non-linear
  – State dependent
• Not easy to model analytically
• Pitfalls
  – Seek time linear w.r.t. distance
  – Uniform distribution for rotational latency
  – Constant transfer times
  – Ignoring bus contention

Comparing four different models
• (i) Constant fixed time for each I/O
• (ii) Simple model, in which
  – Seek time is linear with distance
  – No head settle/switch costs
  – Uniform rotational delay
  – Fixed controller costs
  – Linear transfer costs
• (iii) Better seek and positioning model
  – Seek time: (a) 3.45 + 0.597*sqrt(d) ms (for d < 616 cylinders); (b) 10.8 + 0.012*d ms (for d >= 616 cylinders)
  – 2.5 ms for head/track switch
  – Keeps track of rotational position
• (iv) All of (iii) + cache model + read-ahead + bus speed + controller overheads
• Chosen metric for comparison: relative demerit
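The piecewise seek curve of model (iii) can be sketched in a few lines of Python. This is only an illustration of the two formulas quoted on the slide; the function name `seek_time_ms` and the treatment of a zero-distance seek as free are assumptions, not part of the lecture.

```python
import math

HEAD_SWITCH_MS = 2.5  # head/track switch cost quoted for model (iii)


def seek_time_ms(distance_cylinders: int) -> float:
    """Seek time in ms for model (iii): sqrt-shaped for short seeks, linear for long ones."""
    if distance_cylinders < 0:
        raise ValueError("seek distance must be non-negative")
    if distance_cylinders == 0:
        # Assumption: no arm movement means no seek cost.
        return 0.0
    if distance_cylinders < 616:
        # (a) short seeks are dominated by acceleration and deceleration
        return 3.45 + 0.597 * math.sqrt(distance_cylinders)
    # (b) long seeks spend most of their time coasting at constant speed
    return 10.8 + 0.012 * distance_cylinders


if __name__ == "__main__":
    for d in (1, 100, 615, 616, 1500):
        print(f"{d:5d} cylinders -> {seek_time_ms(d):.3f} ms")
```

The two branches nearly meet at the 616-cylinder boundary, which is what one would expect from a curve fitted to a single physical drive.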

Results (i) (iii) (iv) This disk does not have a cache!

Disk with a cache (iii) (iv)

Conclusions
• The following aspects are important
  – Disk cache/buffer (112)
  – Data transfer model (20)
    • Overlaps with bus transfer, seek-time, head-switching
  – Rotational position, data layout (2)
• While these are not (that important)

How do we get the drive parameters for such modeling?
• Manuals/data sheets
  – Not everything is publicized
  – Things can still vary
• Interrogative extraction
  – Although the SCSI interface is extensive, not all queries may be supported
  – Several more parameters may be needed
• Empirical/experimental extraction
  – This is hard

Complications of empirical extraction
• Overlapping controller overheads, bus transfers, mechanical delays, etc.
• Contention for shared resources
• Cache segmentation
• Prefetching
• Non-uniformity in performance (e.g. seeks)
• Large, seemingly non-deterministic delays (e.g. thermal recalibration)
• Fluctuations in timing

Parameters needed for modeling
• Data layout
• Seek, rotational latency and transfer costs
• Bus, controller and host processing costs
• Caching and prefetching parameters

Data Layout Parameters
• Where does a block actually reside on disk?
• May need to be re-acquired after each format (since re-allocated defects may be converted to slipped defects for better efficiency)
• The SEND/RECEIVE DIAGNOSTIC commands of the SCSI interface can be used to query the actual location of a block.
  – Doing this for each block would be very time-consuming

Storage Systems CSE 598d, Spring 2007
Lecture 4: Disk drive trends, modeling (contd.)
Feb 8, 2007

Empirical Extraction
• Send pairs of commands to the disk and iteratively measure the Mean Time Between Request Completions, MTBRC(a, b), of the 2 requests.
• The rotational distance between the request pairs is varied until a minimum is reached.

Extracting Head Switch Time
• MTBRC1 = MTBRC(1-sector write, 1-sector read on the same track) = Host1 + Cmd + Media + Bus + Comp
• MTBRC2 = MTBRC(1-sector write, 1-sector read on a different track of the same cylinder) = Host2 + Cmd + HdSw + Media + Bus + Comp
• HdSw = (MTBRC2 - Host2) - (MTBRC1 - Host1)
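The head switch subtraction is easy to check with a small sketch. The function name and the sample timings below are made up for illustration; only the formula itself comes from the slide.

```python
def head_switch_ms(mtbrc1: float, host1: float, mtbrc2: float, host2: float) -> float:
    """Head switch time from two MTBRC measurements (all values in ms).

    mtbrc1/host1: write + read on the same track.
    mtbrc2/host2: write + read on a different track of the same cylinder.
    The Cmd, Media, Bus and Comp terms appear in both measurements and cancel.
    """
    return (mtbrc2 - host2) - (mtbrc1 - host1)


if __name__ == "__main__":
    # Hypothetical measurements, chosen only to exercise the arithmetic.
    print(head_switch_ms(mtbrc1=10.0, host1=1.0, mtbrc2=12.5, host2=1.0))
```

The host delays are subtracted separately because the host may take a different amount of time to issue the second request in each experiment.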

Extracting Seek Times
(i) For each seek distance, select 5 evenly spaced starting points. From each of these points, perform 10 inward and 10 outward seeks of this distance, and take the average.
(ii) Measure MTBRC(1-sector write, 1-sector read on same track) and MTBRC(1-sector write, 1-sector read on next cylinder). The difference between these is the mechanical time for a 1-cylinder seek.
(iii) Subtract (ii) from the 1-cylinder value of (i). The difference represents the non-mechanical overheads of a seek.
(iv) Subtract (iii) from each of the values obtained in (i).
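The correction applied to the raw seek measurements can be sketched as follows. The function name, the dictionary layout (seek distance in cylinders mapped to averaged time in ms), and the numbers in the example are assumptions for illustration.

```python
from statistics import mean


def average_seek_ms(samples: list[float]) -> float:
    """Step (i): average the inward and outward seek timings for one distance."""
    return mean(samples)


def correct_seek_curve(
    measured: dict[int, float],
    mtbrc_same_track: float,
    mtbrc_next_cylinder: float,
) -> dict[int, float]:
    """Remove non-mechanical overhead from averaged seek timings (ms)."""
    # Step (ii): purely mechanical time of a 1-cylinder seek.
    mechanical_1cyl = mtbrc_next_cylinder - mtbrc_same_track
    # Step (iii): whatever remains in the measured 1-cylinder seek is overhead.
    overhead = measured[1] - mechanical_1cyl
    # Final step: subtract that overhead from every measured distance.
    return {dist: t - overhead for dist, t in measured.items()}


if __name__ == "__main__":
    raw = {1: average_seek_ms([2.0, 4.0]), 100: 9.0}  # hypothetical timings
    print(correct_seek_curve(raw, mtbrc_same_track=10.0, mtbrc_next_cylinder=11.5))
```

This treats the overhead as constant across seek distances, which is what the procedure on the slide implies.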

Typical Seek time profile

Extracting Rotation Speed
• Perform a series of 1-sector writes to the same location and calculate the mean time between completions.
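A minimal sketch of turning those completion times into a spindle speed is shown below. It assumes that each write has to wait for the target sector to come around again, so the mean gap between completions equals one revolution (this requires per-request overhead below one rotation and no write caching). The function name and timestamps are hypothetical.

```python
from statistics import mean


def rpm_from_completions(completion_times_ms: list[float]) -> float:
    """Estimate spindle speed from completion timestamps of repeated same-sector writes."""
    if len(completion_times_ms) < 2:
        raise ValueError("need at least two completions")
    # Gap between consecutive completions; assumed to be one full revolution each.
    gaps = [b - a for a, b in zip(completion_times_ms, completion_times_ms[1:])]
    rotation_ms = mean(gaps)
    return 60_000.0 / rotation_ms  # 60,000 ms per minute


if __name__ == "__main__":
    # Hypothetical timestamps 6 ms apart.
    print(rpm_from_completions([0.0, 6.0, 12.0, 18.0]))
```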

Extracting Cache Segments, Size …
• Hypothesize that the # of segments is N.
• Perform 1-sector reads of the first logical blocks of the first N-1 cylinders
• Perform a 1-sector read of the first logical block of the last data cylinder
• Perform a 1-sector read of the first logical block of the first cylinder. If that is a hit (as judged by response time), then the # of segments is N or greater.
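The probe sequence above can be tried out against a toy cache model. Everything in this sketch is a stand-in: `ToySegmentedCache` is a hypothetical drive with one cylinder per segment and LRU replacement, and a hit is reported directly as a boolean, whereas on a real drive it would be inferred from response time.

```python
from collections import OrderedDict
from typing import Callable


class ToySegmentedCache:
    """Hypothetical drive cache: one cylinder per segment, LRU replacement (assumed)."""

    def __init__(self, segments: int) -> None:
        self.segments = segments
        self.lru: OrderedDict[int, bool] = OrderedDict()

    def read(self, cylinder: int) -> bool:
        """Read the first block of a cylinder; return True on a cache hit."""
        hit = cylinder in self.lru
        if hit:
            self.lru.move_to_end(cylinder)
        else:
            if len(self.lru) >= self.segments:
                self.lru.popitem(last=False)  # evict least recently used segment
            self.lru[cylinder] = True
        return hit


def has_at_least(drive: ToySegmentedCache, n: int, last_cylinder: int) -> bool:
    """Run the probe for a guess of n segments on a drive with an empty cache."""
    for cyl in range(n - 1):      # first N-1 cylinders
        drive.read(cyl)
    drive.read(last_cylinder)     # last data cylinder
    return drive.read(0)          # hit means cylinder 0 survived: >= n segments


def count_segments(make_drive: Callable[[], ToySegmentedCache],
                   last_cylinder: int, limit: int = 64) -> int:
    """Raise the guess until the probe misses; assumes at least one segment."""
    n = 1
    while n < limit and has_at_least(make_drive(), n + 1, last_cylinder):
        n += 1
    return n


if __name__ == "__main__":
    print(count_segments(lambda: ToySegmentedCache(4), last_cylinder=1000))
```

Each guess starts from a fresh drive so that leftovers from earlier probes cannot produce a false hit; on real hardware the cache would have to be flushed or overwritten between trials.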

Extraction techniques for
• Segment size
• Do prefetched data replace requested data in the current segment?
• Are all requested data always thrown away?
• Does prefetching stop on track/cylinder boundaries?
• Is the prefetching size proportional to request size?
• Does it implement read-on-arrival? Write-on-arrival?
• Is cache space allocated on a track or sector basis?
• Can READs hit on data placed in the cache by WRITEs?
• What is the segment replacement algorithm?

The physical I/O path from CPU to the disk

[Figure labels: CPUs, RAMs, system bus, bridge chip, host I/O bus (PCI, InfiniBand), SCSI HBA, FC HBA, iSCSI HBA, graphics card, Ethernet NIC; I/O buses: SCSI, Fibre Channel, IP LAN]

• System bus
  – Rapid data transfer between CPU and memory
• Host I/O bus
  – Common: PCI, emerging: InfiniBand
• Device drivers are responsible for control of and communication with peripheral devices of all types
  – Part of the device driver for a storage device is almost always realized by firmware that is processed by special processors (ASICs)
    • ASICs are partially integrated into the main circuit board, such as on-board SCSI controllers, or connected to the main board via add-on cards (PCI cards)
  – Storage devices are connected to the server via the host bus adapter (HBA)
  – The communication connection between the HBA and the peripheral device is called the I/O bus
• Similar I/O path/techniques are used within a disk subsystem

I/O bus technologies
• SCSI
• ATA/IDE, Serial-ATA (SATA)
• SCSI over IP (iSCSI)
• Fibre Channel
• USB
• … many more

SCSI basics
• Small Computer System Interface
• First version released in 1986
  – Many versions since
• The dominant technology for UNIX and PC servers
  – Assignment: find out what your laptop/desktop uses
• A communication protocol as well as a bus
• Parallel bus for data and additional lines for control of communication

SCSI basics (more)
• A daisy chain can connect up to 16 devices together
• SCSI protocol defines
  – How devices reserve the bus
  – In what format data is transferred
  – Initial versions: message, then ACK, then next message
  – Latest versions: asynchronous issuance, multiple messages in transit together, increased data rate

SCSI vs ATA: Motivating Factors
• Cost (market demands)
• Form factor
• Configuration in groups
• Reliability
• Access patterns

Leading to differences in …
• Mechanics
• Materials
• Electronics
• Firmware
• Performance (RPM and seeks)
• Reliability
• Power consumption
• …

Differences in Mechanics
• ES (enterprise storage) Head/Disc Assembly
  – Sustain higher disturbance
  – Higher rigidity
  – More mass
  – Higher bandwidth servos
  – Avoiding through holes
  – Filter for particles, desiccant for humidity, carbon absorbent for organic materials
  – Better air flow hardware
  – O-ring seals for spindle
  – Higher quality sealing

Mechanics (contd.)
• Actuator
  – Larger magnets for faster seeks
  – Lower resistance (thicker and fewer windings) actuator coils
  – The latch (which holds the actuator when powered off) can affect seek performance. ES drives compensate for this with a bistable latch.
• Spindle
  – Higher RPM => windage and vibrations
  – PS (personal storage) drives use a cantilever design to hold the motor (captured only at the base), while ES drives capture the motor at both ends.

Differences in Electronics
• The electronics need to take and process commands from the host, perform head positioning, servo processing, data transfers, cache management, etc.
  – PS drives may not have a separate servo processor (to handle repeatable or non-repeatable runouts).
  – ES ASIC gate count is 2x the PS gate count
  – ES firmware code size is 2x the PS firmware code size (to handle more concurrency)
  – ES cache space is 10x the PS cache space

Differences in Magnetics
• More or less similar (since there is no reason why the latest advancements cannot be used in both).
• The main differences are in the electronics needed to provide an adequate signal-to-noise ratio (SNR) at the higher RPM of ES drives.

Differences in Performance
• Capacity
  – Areal density is similar since they use the same magnetics
  – Differences are due to the # of platters and their size
• Size of platters
  – Power is nearly cubic in platter size.
  – To sustain higher RPM, ES drives use smaller platters (2.5” and lower) -> also helps seeking
• # of platters
  – The trend is towards de-populated drives, since you can use more drives to meet the capacity demands in ES environments

Performance (contd.)
• Data rates
  – Though higher RPM favors ES, PS benefits from larger platter size and more frequent introductions of newer models.

Performance (contd.)
• Seeks
  – Mechanical improvements and smaller platters favor ES.
  – ES also allows larger queue depths of outstanding requests to benefit from smarter scheduling.

Rotational Vibration
• The environment and nearby drives can excite a drive enough to throw the actuator off-track.
• Note that this causes performance loss.
• Need to understand how much vibration (in radians/sec^2) is present and design for it.
• Some recent drives even have a vibration sensor for compensation in servo processing.

Reliability
• Described based on power-on hours (8 hrs/day for PS and 24 hrs/day for ES).
• Depends on
  – Duty cycle (40% for ES vs. 75% for PS due to shorter seeks)
  – Temperature
  – Particles inside
  – Head crashes

Serial ATA (SATA)
• Serial implementation of ATA
• Higher data rates
  – 133-150 MB/s, compared to 320 MB/s for SCSI
• Easier to configure, cheaper, less reliable (?)