The Case for Versatile Storage System Samer AlKiswany

The Case for Versatile Storage System
Samer Al-Kiswany, Abdullah Gharaibeh, Matei Ripeanu
NetSysLab, The University of British Columbia

Introduction
Versatile Storage System for large-scale platforms:
• Underutilized resources
• Application specialization
The Deployment Approach:
• Configured at deployment time
• Coupled with the target application
Potential: Higher performance and scalability
HotStorage '09

Platform Example – Argonne Blue Gene/P
• 160 K cores
• 2.5 K IO nodes
• IO rate: 8 GBps = 51 KBps / core
• Underutilized resources
[Figure: Blue Gene/P platform diagram. Recoverable labels: GPFS, 24 servers, 10 Gb/s Switch Complex, Torus Network, 3D Torus, Tree, 2.5 GBps per node, 850 MBps per 64 nodes]
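The per-core figure follows from dividing the aggregate IO rate by the number of cores. A quick check is sketched below; it assumes binary prefixes (1 GB = 2^30 bytes, 1 K = 1024), which the slide does not state, and the function name is illustrative only.

```python
# Back-of-the-envelope check of the per-core IO rate quoted on the slide.
# Assumption (not stated in the slides): binary prefixes, GB = 2**30 bytes, K = 1024.
AGGREGATE_IO_BYTES_PER_S = 8 * 2**30   # 8 GBps aggregate IO rate
CORES = 160 * 1024                     # 160 K cores


def per_core_kbps(aggregate_bytes_per_s: int, cores: int) -> float:
    """Return the IO bandwidth available to each core, in KB/s (KB = 1024 bytes)."""
    return aggregate_bytes_per_s / cores / 1024


if __name__ == "__main__":
    rate = per_core_kbps(AGGREGATE_IO_BYTES_PER_S, CORES)
    print(f"{rate:.1f} KBps per core")
```

With decimal prefixes the result is 50 KBps per core, so the conclusion is the same either way: each core sees only tens of kilobytes per second of shared-storage bandwidth.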

Workload Characteristics
Workflows – Execution stages communicating through intermediate temporary files
[Figure: workflow diagram. Recoverable labels: Input file, Compute, Output file. Source: Zhao et al., SIGMOD Record '05]

Workload Characteristics
Workflows – Execution stages communicating through intermediate temporary files
[Figure: workflow example. Source: Tibi Stef-Praun et al., e-Social Science '07]

Workload Characteristics
Workflows – Execution stages communicating through intermediate temporary files
Applications: Workflows

Axes                          Optimizations
Data life time (temporary)    Application-informed caching
Read (Seq.)                   Read-ahead
Write (Seq.)                  Asynch. write
Consistency (no)              Relaxed consistency

Workload Characteristics
Data Analysis – Analyze/search large data sets (e.g. BLAST)
BLAST: match new sequences with a data set of known sequences (linear search)
Applications: Workflows, Data Analysis

Axes                          Optimizations
Data life time (temporary)    Application-informed caching
Read (Seq.)                   Read-ahead
Write (Seq.)                  Asynch. write
Consistency (no)              Relaxed consistency
Locality                      Caching

Workload Characteristics
Checkpointing
Applications: Workflows, Data Analysis, Checkpointing

Axes                          Optimizations
Data life time (temporary)    Application-informed caching
Read (Seq.)                   Read-ahead
Write (Seq.)                  Asynch. write
Consistency (no)              Relaxed consistency
Locality                      Caching
Compressibility               Similarity detection
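Similarity detection for checkpoint data can be illustrated with fixed-size chunk hashing: only the chunks whose hash did not appear in the previous checkpoint need to be stored or sent. The sketch below is one common way to do this and is not taken from the slides; the function names, the chunk size, and the choice of SHA-1 are all assumptions.

```python
import hashlib
from typing import List, Tuple


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """Hash each fixed-size chunk of `data` (the last chunk may be shorter)."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def new_chunks(previous: bytes, current: bytes, chunk_size: int = 4) -> List[Tuple[int, bytes]]:
    """Return (offset, chunk) pairs of `current` whose content is absent from `previous`."""
    seen = set(chunk_hashes(previous, chunk_size))
    changed = []
    for offset in range(0, len(current), chunk_size):
        chunk = current[offset:offset + chunk_size]
        if hashlib.sha1(chunk).hexdigest() not in seen:
            changed.append((offset, chunk))
    return changed


if __name__ == "__main__":
    old_checkpoint = b"AAAABBBBCCCC"
    new_checkpoint = b"AAAAXXXXCCCC"
    # Only the middle chunk differs between the two checkpoints.
    print(new_chunks(old_checkpoint, new_checkpoint))
```

A tiny chunk size is used here only to keep the example readable; a real deployment would pick a much larger one.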

Workload Characteristics
Applications: Workflows, Data Analysis, Checkpointing

Axes                          Optimizations
Data life time (temporary)    Application-informed caching
Read (Seq.)                   Read-ahead
Write (Seq.)                  Asynch. write
Consistency (no)              Relaxed consistency
Locality                      Caching
Compressibility               Similarity detection
Security                      Tunable security levels
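The axis-to-optimization pairing above can be read as a lookup from a workload profile to the optimizations worth enabling. The sketch below encodes that reading; `WorkloadProfile`, its field names, and `select_optimizations` are hypothetical and not part of the presented system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkloadProfile:
    """Hypothetical description of an application along the axes on the slide."""
    temporary_data: bool = False
    sequential_reads: bool = False
    sequential_writes: bool = False
    relaxed_consistency_ok: bool = False
    data_locality: bool = False
    compressible_data: bool = False
    tunable_security: bool = False


# One (axis, optimization) pair per row of the slide's table.
RULES = [
    ("temporary_data", "application-informed caching"),
    ("sequential_reads", "read-ahead"),
    ("sequential_writes", "asynchronous write"),
    ("relaxed_consistency_ok", "relaxed consistency"),
    ("data_locality", "caching"),
    ("compressible_data", "similarity detection"),
    ("tunable_security", "tunable security levels"),
]


def select_optimizations(profile: WorkloadProfile) -> List[str]:
    """Return the optimizations whose axis applies to this workload."""
    return [opt for axis, opt in RULES if getattr(profile, axis)]


if __name__ == "__main__":
    workflow = WorkloadProfile(
        temporary_data=True,
        sequential_reads=True,
        sequential_writes=True,
        relaxed_consistency_ok=True,
    )
    print(select_optimizations(workflow))
```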

Opportunities
• Specialization: Application-specialized storage
• Underutilized resources
  • Compute node storage space
  • Interconnect bandwidth

Our Solution
Versatile Storage System: Application specialized
The Deployment Approach:
• Configured at deployment time
• Life time coupled with the target application
Potential: Higher performance and scalability

Versatile Storage System Architecture
[Figure: system architecture. Recoverable labels: Manager (Metadata management), Compute Node, Access Module, Storage Node]

Configurable / Extensible IO Pipeline
[Figure: IO pipelines of the Access Module and the Storage Node. Recoverable labels: Application, Metadata Operations, IO, Buffer Manag., Queue, Dispatcher, Communication Agent. Additional modules shown: Consistency, Content Addressability, Data Security, …]

Configurable / Extensible IO Pipeline
[Figure: IO pipelines of the Access Module and the Storage Node. Recoverable labels: Application, Metadata Operations, IO, Dispatcher, Buffer Manag., Queue, Consistency, Content Addressability, Communication Agent. Additional modules shown: Consistency, Content Addressability, Data Security, …]
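One way to picture a configurable IO pipeline is as an ordered list of modules chosen at deployment time, with each request passed through them in turn. The sketch below assumes that reading; the registry, the module functions, and `build_pipeline` are hypothetical names, and the module bodies are placeholders rather than the system's actual logic.

```python
import hashlib
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Module = Callable[[Request], Request]


def buffer_manager(req: Request) -> Request:
    req["trace"].append("buffer_manager")  # placeholder: would stage data in a buffer
    return req


def consistency(req: Request) -> Request:
    req["trace"].append("consistency")  # placeholder: would enforce the chosen semantics
    return req


def content_addressability(req: Request) -> Request:
    # Name the data by a hash of its content (SHA-1 is an assumption).
    req["content_hash"] = hashlib.sha1(req["data"]).hexdigest()
    req["trace"].append("content_addressability")
    return req


def dispatcher(req: Request) -> Request:
    req["trace"].append("dispatcher")  # placeholder: would hand off to the communication agent
    return req


REGISTRY: Dict[str, Module] = {
    "buffer_manager": buffer_manager,
    "consistency": consistency,
    "content_addressability": content_addressability,
    "dispatcher": dispatcher,
}


def build_pipeline(module_names: List[str]) -> Module:
    """Compose the named modules, in order, into a single request handler."""
    stages = [REGISTRY[name] for name in module_names]

    def run(req: Request) -> Request:
        req.setdefault("trace", [])
        for stage in stages:
            req = stage(req)
        return req

    return run


if __name__ == "__main__":
    # Deployment-time configuration for an application that wants content addressing.
    pipeline = build_pipeline(["buffer_manager", "content_addressability", "dispatcher"])
    print(pipeline({"data": b"checkpoint block"})["trace"])
```

An application that does not need content addressing would simply omit that name from its configuration, leaving the rest of the pipeline unchanged.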

Configurable / Extensible Support
[Figure: support for new modules across the Access Module, the Storage Node, and the Manager. Recoverable labels: Application, Metadata Operations, IO, Buffer Manag., Queue, NM, Dispatcher, Communication Agent, Header, Request data, Request Dispatcher, New Module Support, Metadata Service API, …]
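The figure labels suggest that requests carry a header plus request data and that a request dispatcher routes them to the module that understands them. A minimal sketch under that assumption follows; `RequestDispatcher`, its methods, and the use of a string header are hypothetical and not the system's actual interface.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], bytes]


class RequestDispatcher:
    """Routes a request to the handler registered for its header (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, header: str, handler: Handler) -> None:
        """Make a new module reachable under the given header value."""
        self._handlers[header] = handler

    def dispatch(self, header: str, data: bytes) -> bytes:
        """Pass the request data to the module named by the header."""
        handler = self._handlers.get(header)
        if handler is None:
            raise KeyError(f"no module registered for header {header!r}")
        return handler(data)


if __name__ == "__main__":
    dispatcher = RequestDispatcher()
    # A toy "new module" that upper-cases its payload.
    dispatcher.register("upper", lambda data: data.upper())
    print(dispatcher.dispatch("upper", b"request data"))
```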

Preliminary Evaluation – Real Application
DOCK6 workflow:

Stages                                             Versatile Storage Optimizations    Results (8 K processors)
Read input, compute, and write temporary results   Cache the input data               1.06x
Summarize, sort, and select                        Cache temporary files              11.76x
Archive                                            Asynch. flush results to GPFS      1.51x
Overall:                                                                              1.52x

Summary
Versatile Storage System
• Underutilized resources
• Application specialization
The Deployment Approach:
• Configured at deployment time
• Coupled with the target application
Potential: Higher performance and scalability

Not addressed – Future work
• Configurability / extensibility evaluation
  • Complete prototype
  • Evaluation with a diverse set of applications
• Configuration
  • Application profiling
  • File system automated configuration

Thank you
netsyslab.ece.ubc.ca
