Storage Growth is Exponential


- Slides: 2
Storage Growth is Exponential

• Unlike compute and network resources, storage resources are not reusable
  • Unless data is explicitly removed
• Need to use storage wisely
  • Checkpointing, etc.
  • Time-consuming, tedious tasks
• Data growth will scale with compute scaling
• Storage will grow even with good practices (such as eliminating unnecessary replicas)
• Likely to be faster than historical growth
  • Not necessarily on supercomputers
  • but on user/group machines
  • and archival storage
• Storage cost is a consideration
  • Has to be part of science growth cost
  • But storage costs are going down at a rate similar to data growth
  • Need continued investment in new …

The challenges are in managing the data

Figure: Storage growth 1998-2006 at ORNL (rate: 2X / year)
Figure: Storage growth 1998-2006 at NERSC (rate: 1.7X / year)

A. Shoshani, Feb. 2007
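To make the quoted growth rates concrete, here is a minimal Python sketch of what compounding at 2X and 1.7X per year implies over the 1998-2006 span (8 years). The starting size of 1 TB and the function names are illustrative assumptions, not figures or names from the slides.

```python
import math


def projected_storage(initial_tb: float, annual_factor: float, years: int) -> float:
    """Storage volume after `years` of compounding growth at `annual_factor` per year."""
    return initial_tb * annual_factor ** years


def doubling_time_years(annual_factor: float) -> float:
    """Years needed for the volume to double at the given annual growth factor."""
    return math.log(2) / math.log(annual_factor)


# Hypothetical starting point: 1 TB in 1998, projected 8 years forward to 2006.
print(projected_storage(1.0, 2.0, 8))  # 256.0
print(projected_storage(1.0, 1.7, 8))  # growth at the NERSC rate
print(doubling_time_years(1.7))        # doubling time at the NERSC rate
```

At 2X per year the volume doubles every year by definition; at 1.7X per year it still doubles in well under two years, which is why even careful practices only slow the curve rather than flatten it.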
Data and Storage Challenges End-to-End: 3 Phases of Scientific Investigation

Data production phase
• Data movement
  • I/O to parallel file system
  • Moving data out of supercomputer storage
  • Sustain data rates of GB/sec
• Observe data during production
• Automatic generation of metadata
• On-the-fly data processing
  • Computations for visualization / monitoring

Data extraction / analysis phase
• Automate data distribution / replication
• Synchronize replicated data
• Data lifetime management to unclog storage
• Extract subsets efficiently
  • Avoid reading unnecessary data
  • Efficient indexes for "fixed content" data
  • Automated use of metadata
• Parallel analysis tools
• Statistical analysis tools
• Data mining tools

Post-processing phase
• Large-scale (entire datasets) data processing
• Summarization / statistical properties
• Reorganization / transposition
• Generate data at different granularity

General Data Challenges
• Multiple parallel file systems
• Running coupled codes
• A common data model
• Coordinated data movement (not just files)
• Coordinated scheduling of resources
  • Reservations and workflow management
• Data reliability / monitoring / recovery
• Tracking data for long-running jobs
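The "extract subsets efficiently" and "avoid reading unnecessary data" items can be illustrated with a small sketch. The example below uses a per-block min/max index, which is one simple indexing technique chosen here for illustration; the slides do not say which index structure is intended, and all function names are hypothetical. Because "fixed content" data does not change after it is written, an index like this could be built once and reused.

```python
from typing import List, Tuple


def build_block_index(values: List[float], block_size: int) -> List[Tuple[float, float]]:
    """Record the (min, max) of each fixed-size block of values."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((min(block), max(block)))
    return index


def range_query(
    values: List[float],
    index: List[Tuple[float, float]],
    block_size: int,
    low: float,
    high: float,
) -> Tuple[List[float], int]:
    """Return values with low <= v <= high and the number of blocks actually scanned."""
    matches: List[float] = []
    blocks_read = 0
    for i, (block_min, block_max) in enumerate(index):
        if block_max < low or block_min > high:
            continue  # the index proves no value here can match, so skip the block
        blocks_read += 1
        start = i * block_size
        matches.extend(v for v in values[start:start + block_size] if low <= v <= high)
    return matches, blocks_read


# Illustrative data: 100 sorted values stored in blocks of 10.
data = [float(v) for v in range(100)]
idx = build_block_index(data, 10)
subset, scanned = range_query(data, idx, 10, 25.0, 34.0)
print(len(subset), scanned)
```

In this toy run only the blocks whose value range overlaps the query are scanned; the same idea applies when each block is a file region or a chunk on a parallel file system, where skipping a block saves real I/O.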