The Pros & Cons of Content Addressed Storage
Arun Taneja, Founder & Consulting Analyst

Current Data Protection Environment
- Data Tsunami
- No Backup Windows
- Cost of Downtime Increasing
- Regulations and Compliance Requirements
- Data Protection Technology at Break Point
Something Must Be Done!

Many New Technologies to the Rescue
iSCSI, NDMP, iFCP, FCIP, DAFS, RDMA, CAS, GRID, RAIN, TOE, SATA, SMI-S, SAS

What is CAS?
Definition: Concept whereby the address of an object is computed from the content of that object
Advantages
- Location Independence
- Authenticity
- Simplified Indexing
- Scalability to Exabytes
- Load Balancing
- Elimination of Duplication
Disadvantages
- New and Unfamiliar
- May Require Changes to Applications
- May Require Procedural Changes
- May Require Abandoning Existing Applications

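The definition above can be sketched in a few lines. This is a minimal in-memory illustration, not any vendor's implementation: the `ContentStore` class, its method names, and the choice of SHA-256 are all assumptions made for the example. It shows three of the listed advantages: the address does not depend on location, duplicate content is stored once, and content can be checked against its own address.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # content address (hex digest) -> bytes

    def put(self, data: bytes) -> str:
        # The address is computed from the content, not from a path or location.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is kept only once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Authenticity check: recompute the address and compare.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
a1 = store.put(b"quarterly report")
a2 = store.put(b"quarterly report")     # duplicate content, same address
a3 = store.put(b"quarterly report v2")  # changed content, new address
print(a1 == a2, a1 == a3, len(store))
```

Note that the caller must hold on to the returned address; there is no file name to look up, which is one reason applications may need changes.
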
CAS vs Networked Storage
- SAN & NAS Use File Systems to Place and Locate Data (/abc/xyz/acme.doc)
- Hierarchical
- Difficult to Scale Beyond TBs
- Application Determines if Duplication of Object Exists
- Indexing can Become Complicated

How is CAS Done?
- Algorithm Applied to the Object's Content
  - File
  - Portion of a file
  - Directory or file system
- Unique 128-bit Coding Results (160 bits for Avamar)
[Diagram: Object (File, FS, Dir) -> 128-bit hash unique to that object (e.g. MD5)]

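The hashing step can be illustrated with Python's standard `hashlib`. The helper name `content_address` and the sample data are assumptions for this sketch. MD5 is the 128-bit example named on the slide; SHA-1 is used here only as a familiar 160-bit hash, since the slide does not say which algorithm Avamar uses. "Unique" should be read as unique in practice: deliberately constructed MD5 collisions are known.

```python
import hashlib


def content_address(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of `data`, used here as its content address."""
    return hashlib.new(algorithm, data).hexdigest()


doc = b"acme contract, revision 1"

md5_addr = content_address(doc, "md5")    # 128-bit address
sha1_addr = content_address(doc, "sha1")  # 160-bit address

# Each hex character encodes 4 bits.
print(len(md5_addr) * 4, len(sha1_addr) * 4)

# Changing a single byte of content yields a different address.
changed_addr = content_address(b"acme contract, revision 2", "md5")
print(md5_addr != changed_addr)
```

The same function can be applied to a whole file, a portion of a file, or a serialized directory listing; only the bytes fed to the hash differ.
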
What Can CAS Be Used For?
- Archival Storage
- Backup and Restore
- Disaster Recovery
- Content Management

Issues with Existing Architectures
Archive/Content Mgmt
- Lack of Authenticity
- Media/Technology Changes
- Tape Environmental Issues
- Poor Access Times
- TCO Expensive
- Slow Queries from Large Reps
- Centralized Indexing
Backup and Restore/DR
- Application Performance
- Generates Tons of Data 10:1
- Backup Windows
- No Guarantee if Data is Recoverable
- DR Expensive
- DR: Potential Consistency Issues

Methods for Keeping More Data Online
- Bigger Primary Storage
- Compression of Data
- Hierarchical Storage Architectures
- Data Normalization: Finding Subsets of Data That are Common and Storing Them Only Once
  - No Limit on the Effective Compression Ratio
  - Indexing Systems Super Critical

Commonality Factoring Using CAS
- Fixed Size Atomics for Database
- Variable Size Atomics for File Systems
- CAS Algorithms Used to Calculate CA for Each Subset
- Data Structures Needed to Reconstruct from Atomics
- Above Data Kept with Atomics Data

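The fixed-size case can be sketched as follows. This is an illustrative assumption, not Avamar's algorithm: the function names, the tiny `CHUNK_SIZE`, the use of SHA-1, and the plain dictionary store are all hypothetical. Each "atomic" is stored under its content address, and a list of addresses (the "recipe") is the data structure needed to reconstruct the original.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical atomic size, kept tiny so the demo is readable


def factor(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size atomics, store each once, return the recipe."""
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        address = hashlib.sha1(chunk).hexdigest()
        store.setdefault(address, chunk)  # common atomics are stored only once
        recipe.append(address)
    return recipe


def reconstruct(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by concatenating atomics in recipe order."""
    return b"".join(store[address] for address in recipe)


store = {}
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"  # differs from v1 in the third atomic only

r1 = factor(v1, store)
r2 = factor(v2, store)

# Eight atomics were submitted, but only five distinct ones are stored.
print(len(r1) + len(r2), len(store))
```

Variable-size atomics for file systems would replace the fixed `range` step with content-dependent boundaries, so that an insertion near the start of a file does not shift every later atomic.
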
CAS Example: Avamar
- CAS Applied to BU/Restore, Archive and DR (initial focus BU/R)
- Focus on Data Reduction
- Typical Secondary to Primary Ratio is 10:1
- Avamar Claims 1.2 to 1
- Never Do Full + Incremental Backups, Only SnapUps

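To make the two ratios concrete, here is a small worked calculation. The 5 TB primary size is an arbitrary assumption chosen for the example, and the function name is hypothetical; only the 10:1 and 1.2:1 ratios come from the slide.

```python
def secondary_storage_tb(primary_tb: float, ratio: float) -> float:
    """Secondary (backup) capacity implied by a secondary-to-primary ratio."""
    return primary_tb * ratio


primary_tb = 5.0  # assumed primary data size, for illustration only

conventional_tb = secondary_storage_tb(primary_tb, 10.0)  # typical 10:1
claimed_tb = secondary_storage_tb(primary_tb, 1.2)        # claimed 1.2:1

# Fraction of secondary capacity avoided if the claimed ratio holds.
savings = 1 - claimed_tb / conventional_tb

print(f"{conventional_tb:.1f} TB vs {claimed_tb:.1f} TB, {savings:.0%} less")
```

The relative saving depends only on the two ratios, not on the primary size chosen.
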
CAS Example: Avamar Systems Architecture
- Distributed Backup Repository
- Peer-to-Peer RAIN Architecture
- Each Node has Uniform and Consistent View of Repository
- Clients can Request Services from any Node
- Data Striped Across Nodes (similar to RAID)
- No Single Point of Failure
- Requires Agent on Each Client System

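One way a content address can drive placement across nodes is sketched below. This is a generic illustration under stated assumptions, not Avamar's actual striping scheme: `NODE_COUNT`, `node_for`, and the modulo rule are hypothetical. Because every node can compute the same mapping from the address alone, a client can ask any node where an object lives, and well-mixed hash output spreads objects roughly evenly.

```python
import hashlib
from collections import Counter

NODE_COUNT = 4  # hypothetical cluster size


def content_address(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def node_for(address: str, node_count: int = NODE_COUNT) -> int:
    """Map a content address to a node index; any node can compute this."""
    return int(address, 16) % node_count


# Place 1000 distinct objects and count how many land on each node.
placement = Counter(
    node_for(content_address(f"object-{i}".encode())) for i in range(1000)
)

for node in sorted(placement):
    print(f"node {node}: {placement[node]} objects")
```

A plain modulo remaps most objects when `node_count` changes, so a system that must grow without disruption would need a more careful placement rule than this sketch uses.
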
CAS Archival Example: EMC Centera
[Diagram. Labels: Application; API; Centera; Calculate CA and extract metadata; CDF (XML metadata file); CA; CA of CDF; C-clip; CDF store; Blob store; CA of CDF Returned. Source: EMC]

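The flow suggested by the diagram labels can be sketched as follows. This is a reading of the diagram, not the Centera API: the function names, the XML element names, the two dictionaries standing in for the blob store and CDF store, and the use of MD5 are all assumptions. The blob is stored under its own content address, a small XML descriptor records that address plus metadata, and the address of the descriptor is what the application gets back.

```python
import hashlib
import xml.etree.ElementTree as ET

blob_store = {}  # content address of blob -> blob bytes
cdf_store = {}   # content address of descriptor -> descriptor XML bytes


def address_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def write_object(blob: bytes, metadata: dict) -> str:
    """Store a blob and its descriptor; return the descriptor's address."""
    blob_ca = address_of(blob)
    blob_store.setdefault(blob_ca, blob)  # one copy of each distinct blob

    # Hypothetical descriptor layout: blob address plus sorted metadata.
    cdf = ET.Element("cdf")
    ET.SubElement(cdf, "blob", {"ca": blob_ca})
    for name in sorted(metadata):
        ET.SubElement(cdf, "meta", {"name": name, "value": str(metadata[name])})

    cdf_bytes = ET.tostring(cdf)
    cdf_ca = address_of(cdf_bytes)
    cdf_store[cdf_ca] = cdf_bytes
    return cdf_ca


def read_object(cdf_ca: str) -> bytes:
    """Follow the descriptor to the blob it references."""
    cdf = ET.fromstring(cdf_store[cdf_ca])
    return blob_store[cdf.find("blob").get("ca")]


scan = b"<binary image of a signed contract>"
clip_a = write_object(scan, {"owner": "legal", "retention": "7y"})
clip_b = write_object(scan, {"owner": "finance", "retention": "10y"})
print(clip_a != clip_b, len(blob_store), len(cdf_store))
```

In this sketch two applications archiving the same blob with different metadata get different descriptor addresses while the blob itself is stored once.
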
CAS Advantages: EMC Centera
Due to CAS
- No LUNs to Create or Manage
- No Volumes to Create or Manage
- Flat Addressing, Simple Indexing
- Content Authentication
- One Copy of Blob Stored
Due to Architecture
- RAIN = Non-disruptive Scalability
- No Reconfigs Required
- No Technology Obsolescence
- Policy-based Storage of Blobs
- Application Modification

CAS Players
- Data Center Technologies
- Persist Technologies

CAS Futures: What's Needed?
- Flexible Scaling Capabilities
- Integration with File Interfaces
- Easy API-free Application Integration
- Integrated Indexing

Summary
CAS +'s
- Location Independence
- Authenticity
- Eliminate Redundancy
- Simplify Indexing
- Simplify Management
- Improve Scalability
- Single System Image of Repository
CAS -'s
- Many Aspects are Untested
- May Require New Procedures/Tools
- Disruptive Technology
- Not Good Enough for High Performance Primary Needs

Taneja Group Recommendations
- Absolutely Test Out CAS Systems but…
- Apply to a Project at a Time (consider the disruptive factor). No Wholesale Changes!
- Keep a Fallback Position (run systems in parallel)
- Test Out Recoverability Regularly
- Keep in Mind… More Solutions Coming