Storage Systems CSE 598d Spring 2007 Lecture

Storage Systems, CSE 598d, Spring 2007
Lecture ?: Rules of thumb in data engineering
Paper by Jim Gray and Prashant Shenoy
Feb 15, 2007

Contents
• Examination of rules-of-thumb in data engineering
  – Moore’s law
  – Amdahl’s rules
  – Gilder’s law
• Technological trends and how/whether existing rules-of-thumb need to be re-thought

Moore’s Law
• Circuit densities grow 4x every 3 years
  – 100x increase in a decade
  – More generally: Ax every B years
  – Originally stated for RAM
    • Implies an extra bit of addressing every 18 months
    • From 16-bit addressing in the 1970s (64 KB) to 64-bit addressing today (memories of several GB)
  – Extended to CPUs and storage
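
The "Ax every B years" form can be checked with a few lines of arithmetic. The sketch below uses hypothetical helper names (`growth_factor`, `doubling_time`) and shows that 4x every 3 years compounds to roughly 100x per decade and to one doubling, i.e. one extra address bit, every 18 months. The 30-year span used for the bit count is an illustrative assumption, not a figure from the slide.

```python
import math


def growth_factor(a: float, b_years: float, years: float) -> float:
    """Total growth after `years` for a quantity that grows a-fold every b_years."""
    return a ** (years / b_years)


def doubling_time(a: float, b_years: float) -> float:
    """Years per 2x increase under the same 'A x every B years' law."""
    return b_years * math.log(2) / math.log(a)


per_decade = growth_factor(4, 3, 10)        # about 101.6, i.e. roughly 100x
months_per_bit = 12 * doubling_time(4, 3)   # each doubling adds one address bit
# Illustrative assumption: 30 years of growth starting from 16-bit addresses.
bits_after_30_years = 16 + 30 / doubling_time(4, 3)

print(f"{per_decade:.1f}x per decade")
print(f"one address bit every {months_per_bit:.0f} months")
print(f"{bits_after_30_years:.0f}-bit addresses after 30 years")
```

Twenty extra bits on top of 16 gives 36 bits (64 GB), in line with memories of several GB; 64-bit addresses are well beyond what the law requires over that span.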

Disk parameters over time

Moore’s law applied to HDD
• Disk capacity has increased more than 100x in the last decade!
  – Areal density up from 20 Mbpsi to 35 Gbpsi (megabits to gigabits per square inch)
• However, data rate has increased only 30x
  – Capacity / accesses per second growing 10x per decade
  – Capacity / bandwidth growing 10x per decade
• Implications:
  – Disk accesses becoming more precious
  – Disk data becoming “cooler”

Closer look at the implications
• Discussion
  – Does the increase in disk capacity mean applications are also using correspondingly large stores?
  – Why are disk accesses per second going up?
    • Recall these have grown more slowly than areal density
    • 10 years ago: 30 kaps (kilobyte accesses per second) for 1 GB of data
    • Today: 120 kaps for 80 GB of data
  – That is, only 1.5 kaps per GB
  – HDD data needs to be 10-100x cooler than it was 10 years ago
  – Use large main memories (caching)
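
The kaps-per-GB figures on this slide are simple ratios. A minimal sketch (the function name is hypothetical) that reproduces them and the resulting "cooling" factor:

```python
def kaps_per_gb(kaps: float, capacity_gb: float) -> float:
    """Kilobyte accesses per second available for each GB stored on the disk."""
    return kaps / capacity_gb


then = kaps_per_gb(30, 1)    # slide: 30 kaps for 1 GB, ten years ago
now = kaps_per_gb(120, 80)   # slide: 120 kaps for 80 GB today
cooling = then / now         # how much less often each GB can be accessed

print(f"then {then:.1f} kaps/GB, now {now:.1f} kaps/GB, {cooling:.0f}x cooler")
```

The 20x drop falls inside the slide's 10-100x range; the remaining hot data has to be served from RAM caches.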

Costly disk accesses have led to:
• Preferring few large transfers over many small ones
• Preferring sequential transfers
  – Log-structured file systems
• Mirroring rather than other forms of redundancy
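
Why few large transfers win can be seen from a simple service-time model: each request pays a fixed positioning cost (seek plus rotational latency) and then streams at the media rate. The sketch assumes a hypothetical disk with a 10 ms positioning time and a 50 MB/s media rate; these are illustrative numbers, not figures from the paper.

```python
def effective_mb_per_s(transfer_kb: float, access_ms: float = 10.0,
                       media_mb_per_s: float = 50.0) -> float:
    """Throughput when every request pays a fixed positioning cost.

    access_ms and media_mb_per_s are assumed, illustrative disk parameters.
    """
    transfer_mb = transfer_kb / 1024
    seconds = access_ms / 1000 + transfer_mb / media_mb_per_s
    return transfer_mb / seconds


for size_kb in (8, 64, 1024, 8192):
    print(f"{size_kb:5d} KB requests: {effective_mb_per_s(size_kb):5.1f} MB/s")
```

Under these assumptions, 8 KB requests deliver under 1 MB/s, while 8 MB requests come close to the 50 MB/s media rate, because the fixed positioning cost is amortized over far more data.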

Cost trends
• Historically
  – Tape:HDD:RAM price ratio has been 1:10:1000
• Calculation for a modern system gives 1:3:300
  – Disk prices are approaching tape prices
    • Disks are replacing tapes in several domains
  – Cost/MB for RAM declines 100x in a decade
    • What is economical to put on disk today may be economical to put in RAM in 10 years
  – RAM is taking over much of the role of the HDD, and the HDD much of the role of tape
• Storage management costs exceed device costs
• Administrators are required to manage more and more data
  – Automation and self-manageability are becoming crucial
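
The claim that what is economical on disk today may be economical in RAM in ten years follows from two numbers on the slide: RAM costs about 100x more per MB than disk (300:3), and RAM cost per MB falls about 100x per decade. A minimal sketch, assuming a steady exponential price decline and comparing against today's disk price (the helper name is hypothetical):

```python
import math


def years_to_reach(price_ratio: float, decline_per_decade: float) -> float:
    """Years for a device that is `price_ratio` times pricier per MB to fall to
    today's price of the cheaper device, assuming a steady exponential decline."""
    return 10 * math.log(price_ratio) / math.log(decline_per_decade)


ram_over_disk = 300 / 3   # modern tape:disk:RAM ratio of 1:3:300
print(f"{years_to_reach(ram_over_disk, 100):.1f} years")
```

Disk prices keep falling too, so RAM does not actually catch up with disk; the sketch only compares future RAM with present-day disk, which is what the slide claims.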

Amdahl’s System Balance Rules
• Parallelism law
  – Expresses the maximum achievable speedup in terms of the fraction of a computation that can be parallelized
• Balanced system law
  – A system needs 1 bit/s of IO per instruction/s
    • IO bits per second ≈ instructions per second
• Memory law
  – The MB/MIPS ratio in a balanced system is 1
• IO law
  – Programs do one IO per 50,000 instructions
• How have these rules changed over time?
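
The parallelism law is usually written as speedup = 1 / ((1 - p) + p/n) for a parallelizable fraction p on n processors. The sketch below implements that formula alongside the original balance ratios from the slide (1 MB per MIPS, one IO per 50,000 instructions, one bit/s of IO per instruction/s) for a hypothetical 100-MIPS machine; the function names and the machine are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's parallelism law: the serial fraction caps the speedup."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def balanced_system(mips: float) -> dict:
    """Resources a balanced system needs under Amdahl's original rules of thumb."""
    instructions_per_s = mips * 1e6
    return {
        "memory_mb": mips * 1.0,                      # memory law: 1 MB per MIPS
        "ios_per_s": instructions_per_s / 50_000,     # IO law: 1 IO per 50,000 instructions
        "io_mb_per_s": instructions_per_s / 8 / 1e6,  # 1 bit/s of IO per instruction/s
    }


print(f"{amdahl_speedup(0.95, 16):.2f}x on 16 CPUs")       # 95% parallel code
print(f"{amdahl_speedup(0.95, 10**6):.2f}x on 10^6 CPUs")  # approaches 1/(1-p) = 20
print(balanced_system(100))
```

Even with 95% of the work parallelized, 16 processors yield only about 9x, and no number of processors exceeds 20x.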

• Methodology
  – Rely on well-regarded benchmarks: TPC-C (random IO) and TPC-H (sequential IO)
• Revisions to Amdahl’s laws
  – Balanced system law: measure instruction rate and IO rate on the relevant workload
  – Memory law: MB/MIPS ratio rising from 1 to 4
    • Reflects the growth in RAM as disk IOs become expensive
  – IO law: workload dependent
    • 50,000 instructions per IO was geared toward random IO
    • Increased sequentiality in disk accesses (discussed earlier) means more instructions per IO

Gilder’s Law
• Network bandwidth would triple every year for the next 25 years (prediction made in 1995)
• Link bandwidth triples every four years
• Network messages used to cost more instructions (and more IO instructions) per byte than disk IO
  – Network protocol processing overheads
  – These overheads have been reduced by smarter NICs
• Cost comparison
  – Moving data over a WAN is much more expensive than moving it from a local disk over a LAN
    • Related: the cost of shipping large disk arrays or entire computers is comparable to the cost of transferring the data over the Internet
  – However, this price gap is likely to shrink soon, and bandwidth should be plentiful within a decade
• Implication: local disks could then be used as caches (or prefetch buffers), with the main data store being remote
  – Saves on local storage management costs
  – The managed data center model is already being seen!
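
Whether shipping disks beats the network depends on the ratio of data volume to link speed. A minimal sketch, assuming a fully utilized link (the hypothetical `efficiency` parameter derates it) and ignoring protocol overheads; the data size and link speeds are illustrative:

```python
def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push `terabytes` (10^12 bytes) through a link of link_mbps
    (10^6 bits/s), derated by `efficiency` for protocol and sharing losses."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


for mbps in (10, 100, 1000):
    print(f"1 TB over {mbps:4d} Mbps: {transfer_days(1, mbps):6.2f} days")
```

At 10 Mbps a terabyte takes over nine days, so an overnight courier wins easily; at 1 Gbps the network finishes in a little over two hours, which is the kind of shift the slide expects as bandwidth becomes plentiful.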

Caching
• 5-minute rule for random workloads
• 1-minute rule for sequential workloads
• Web caches
  – Cache everything!
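
The 5-minute rule comes from a break-even calculation: a page is worth keeping in RAM if it is re-referenced more often than the interval at which the cost of the RAM holding it equals the cost of the disk accesses it saves. One common formulation multiplies a technology ratio (pages per MB of RAM divided by random accesses per second per disk) by an economic ratio (price per disk drive divided by price per MB of RAM). The sketch below assumes illustrative inputs, not figures from the slides.

```python
def break_even_seconds(pages_per_mb_ram: float, accesses_per_s_per_disk: float,
                       price_per_disk: float, price_per_mb_ram: float) -> float:
    """Re-reference interval at which caching a page in RAM costs the same as
    re-reading it from disk each time it is needed."""
    technology_ratio = pages_per_mb_ram / accesses_per_s_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Illustrative inputs (assumed): 8 KB pages, 64 random accesses/s per disk,
# a $2000 disk drive and RAM at $15 per MB.
interval = break_even_seconds(1024 / 8, 64, 2000, 15)
print(f"break-even interval: {interval:.0f} s (about {interval / 60:.1f} minutes)")
```

Pages referenced more often than once per interval are cheaper to keep in RAM. For sequential workloads a disk delivers many more pages per second, which shrinks the interval, consistent with the shorter 1-minute rule.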