Geographically distributed across 3 regions. Thousands of services/applications. Anywhere, anytime access to your data. Durability and scalability. 70 petabytes of raw storage today; grows to >200 petabytes by start of 2012.
Telemetry for Kinect Game Saves in Cloud Facebook and Twitter Microsoft Zune Media Storage and Delivery Near Real-Time Search
Bing Realtime Facebook/Twitter Search Ingestion Engine
Bing Ingestion Engine (Azure Service): indexes Facebook/Twitter data (user postings, status updates) within 15 seconds of update.
Runs on VMs backed by Windows Azure Blobs and Windows Azure Tables.
Peak 40,000 requests/sec; 2~3 billion requests per day.
Took 1 dev 2 months to design, build and release to production.
What’s new for Blobs, Tables and Queues
Blobs, Tables, Queues, Drives
Account
• Container, Blobs: https://<account>.blob.core.windows.net/<container>
• Table, Entities: https://<account>.table.core.windows.net/<table>
• Queue, Messages: https://<account>.queue.core.windows.net/<queue>
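The three endpoint patterns above differ only in the service label of the hostname. A minimal sketch of building them follows; the function name and the sample account are illustrative assumptions, not part of any client library.

```python
def storage_endpoint(account: str, service: str, resource: str) -> str:
    """Build the HTTPS endpoint for a container, table, or queue.

    `service` is one of the three labels used in the hostnames above.
    """
    if service not in ("blob", "table", "queue"):
        raise ValueError(f"unknown service: {service!r}")
    return f"https://{account}.{service}.core.windows.net/{resource}"


if __name__ == "__main__":
    # "myaccount" and the resource names are made-up sample values.
    for service, resource in [("blob", "thumbnails"), ("table", "Customers"), ("queue", "workitems")]:
        print(storage_endpoint("myaccount", service, resource))
```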
– What is new?
• Range requests of the form “Range: bytes=100-”
• Return “Accept-Ranges” response header
• ETags to be quoted
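An open-ended range such as `Range: bytes=100-` means "from offset 100 to the end of the blob". The sketch below shows one way such a header could be resolved to concrete byte offsets; it is an illustration of the HTTP range syntax under the assumption of a single range, not the service's implementation, and `resolve_range` is a hypothetical helper name.

```python
import re


def resolve_range(header: str, blob_length: int) -> tuple[int, int]:
    """Resolve 'bytes=START-' or 'bytes=START-END' to inclusive (start, end) offsets."""
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {header!r}")
    start = int(match.group(1))
    # An empty END means "through the last byte of the blob".
    end = int(match.group(2)) if match.group(2) else blob_length - 1
    end = min(end, blob_length - 1)
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end


if __name__ == "__main__":
    print(resolve_range("bytes=100-", 1000))
    print(resolve_range("bytes=0-49", 1000))
```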
– What is new?
• Query Projection ($select)
  • Project only selected columns
• Upsert Entity
  • InsertOrReplace
  • InsertOrMerge
Projection
public class Customer
{
    public string PartitionKey { get; set; } // Customer Name
    public string RowKey { get; set; } // Customer Phone Number
    public DateTime CustomerSince { get; set; }
    public double TotalPurchase { get; set; }
    public string State { get; set; }
    // 100 more properties including profile picture etc. …
}

// Partial entity defined here
public class CustomerDiscount
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public double TotalPurchase { get; set; }
}
Projection
// Select partial entities by choosing properties to be projected
var query = from entity in context.CreateQuery<CustomerDiscount>("Customers" /*Table Name*/)
            select new CustomerDiscount
            {
                PartitionKey = entity.PartitionKey,
                RowKey = entity.RowKey,
                TotalPurchase = entity.TotalPurchase
            };

foreach (CustomerDiscount customer in query)
{
    // Calculate the discount to be given based on total purchases made
}
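At the REST level, projection is expressed with the `$select` query option. A minimal sketch of the query URL and of the effect on a returned entity follows; the helper names and the sample account name are assumptions for illustration, and the local `project` function only mimics what the service returns.

```python
from urllib.parse import quote


def projection_query_url(account: str, table: str, columns: list[str]) -> str:
    """Build a table query URL that asks only for the listed columns."""
    select = quote(",".join(columns), safe=",")
    return f"https://{account}.table.core.windows.net/{table}()?$select={select}"


def project(entity: dict, columns: list[str]) -> dict:
    """Local illustration of projection: keep only the selected properties."""
    return {name: entity[name] for name in columns if name in entity}


if __name__ == "__main__":
    columns = ["PartitionKey", "RowKey", "TotalPurchase"]
    print(projection_query_url("myaccount", "Customers", columns))
    full = {"PartitionKey": "Thomas Anderson", "RowKey": "555-0100",
            "TotalPurchase": 120.0, "State": "Washington"}
    print(project(full, columns))
```

Projecting three columns out of a hundred-plus keeps large properties such as a profile picture off the wire, which is the point of the CustomerDiscount partial entity.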
Upsert
// When user logs in from mobile device, it will register the user using upsert
Customer customer = new Customer
{
    PartitionKey = "Thomas Anderson",
    RowKey = "555-0100",
    // Property name "Address" is assumed; it is one of the properties elided from the Customer class
    Address = "4567 Main St. Redmond 48188",
    State = "Washington"
};

// Note: AttachTo method is called without an Etag which indicates
// that this is an Upsert Command
context.AttachTo("Customers" /*Table Name*/, customer);
context.UpdateObject(customer);

// No SaveChangesOptions indicates that a MERGE verb will be used to get InsertOrMerge semantics.
// Use SaveChangesOptions.ReplaceOnUpdate for InsertOrReplace semantics.
// But InsertOrReplace will overwrite TotalPurchase if it existed.
context.SaveChanges();
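The difference between the two upsert flavors is easiest to see on a stored entity that already has a TotalPurchase value. The sketch below models a table as an in-memory dictionary; it is an assumption-level illustration of the semantics only, not the storage client API.

```python
def insert_or_replace(table: dict, key: tuple, entity: dict) -> None:
    """Insert the entity, or replace the stored one entirely (old properties are lost)."""
    table[key] = dict(entity)


def insert_or_merge(table: dict, key: tuple, entity: dict) -> None:
    """Insert the entity, or merge its properties into the stored one."""
    merged = dict(table.get(key, {}))
    merged.update(entity)
    table[key] = merged


if __name__ == "__main__":
    key = ("Thomas Anderson", "555-0100")  # (PartitionKey, RowKey)
    login = {"State": "Washington"}

    merged_table = {key: {"State": "Oregon", "TotalPurchase": 120.0}}
    insert_or_merge(merged_table, key, login)
    print(merged_table[key])    # TotalPurchase is preserved

    replaced_table = {key: {"State": "Oregon", "TotalPurchase": 120.0}}
    insert_or_replace(replaced_table, key, login)
    print(replaced_table[key])  # TotalPurchase is gone
```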
– What is new?
Update Message Example
[Figure: timeline of work items in an Azure Queue; times shown: 7:00, 7:04, 7:05, 7:07, 7:09 and 7:14 AM]
• Get Message with 5 minutes visibility timeout (expires @ 7:05 AM)
• Periodically store progress information in message content
• Extend visibility timeout with another 5 minutes (expires @ 7:09 AM)
• Retrieve progress from queue message and resume
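The timeline can be replayed with a small in-memory simulation. This is a sketch under stated assumptions: `SimQueue` is a hypothetical class, time is counted in minutes after 7:00 AM, and details of the real service such as pop receipts and dequeue counts are left out.

```python
class SimQueue:
    """Toy queue in which a message stays hidden until its visibility timeout expires."""

    def __init__(self):
        self._messages = []

    def put(self, content, now):
        self._messages.append({"content": content, "visible_at": now})

    def get(self, now, visibility_timeout):
        """Return the first visible message and hide it for `visibility_timeout` minutes."""
        for message in self._messages:
            if message["visible_at"] <= now:
                message["visible_at"] = now + visibility_timeout
                return message
        return None

    def update(self, message, now, visibility_timeout, content=None):
        """Extend the visibility timeout and optionally store progress in the content."""
        if content is not None:
            message["content"] = content
        message["visible_at"] = now + visibility_timeout


if __name__ == "__main__":
    queue = SimQueue()
    queue.put("resize images; done=0", now=0)
    work = queue.get(now=0, visibility_timeout=5)            # 7:00, hidden until 7:05
    queue.update(work, now=4, visibility_timeout=5,
                 content="resize images; done=40")          # 7:04, hidden until 7:09
    print(queue.get(now=7, visibility_timeout=5))            # 7:07, still hidden
    print(queue.get(now=9, visibility_timeout=5)["content"]) # 7:09, resume from progress
```

If the first worker dies after 7:04, the next worker that picks the message up at 7:09 resumes from the stored progress instead of starting over.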
Windows Azure Storage Analytics
Logged fields: Log Version, Accessing Account, Owner Account, Service Type, Request URL, Object Key, Request ID, Operation Number, Request Version, Operation Type, Start Time, Application End to End Latency, Storage Server Latency, Authentication Type, Request Status, HTTP Status Code, Client IP, User Agent, Referrer, Client Request ID, ETag, LMT, Request Packet Size, Request Header Size, Response Packet Size, Response Header Size, Request MD5, Server MD5, Conditions Used
Log Entry in Blob:
1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;sally;sally;blob;"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;"req12345"

Decoded fields:
Log Version: 1.0
Start Time: 2011-07-28T18:02:40.6271789Z
Operation Type: PutBlob
Status: Success
HTTP Status Code: 201
Application E2E Latency (milliseconds): 28
Storage Server Latency (milliseconds): 21
Accessing Account: sally
Owner Account: sally
Service Type: blob
Request URL: PUT http://sally.blob.core.windows.net/thumbnails/lake.jpg
Object Key: /sally/thumbnails/lake.jpg
Request ID: fb658ee6-6123-41f5-81e2-4bfdc178fea3
Operation Number: 0
Request Version: 2009-09-19
Client IP: 201.9.10.20
Client Request ID: req12345
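Because the entry is semicolon-delimited with some fields quoted, a generic CSV reader can split it. The sketch below names only the leading fields that the slide decodes, plus the trailing client request ID; the snake_case field names are assumptions chosen for this example, and the positions are taken from the sample entry rather than from a format specification.

```python
import csv

# Names for the first 17 positions, following the decoded sample entry above.
LEADING_FIELDS = [
    "log_version", "start_time", "operation_type", "status", "http_status_code",
    "e2e_latency_ms", "server_latency_ms", "authentication_type",
    "accessing_account", "owner_account", "service_type", "request_url",
    "object_key", "request_id", "operation_number", "client_ip", "request_version",
]

SAMPLE = (
    '1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;'
    'sally;sally;blob;'
    '"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";'
    '"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;'
    '201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";'
    '"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;"req12345"'
)


def parse_log_entry(line: str) -> dict:
    """Split one log line and label the fields whose meaning the sample shows."""
    values = next(csv.reader([line], delimiter=";", quotechar='"'))
    entry = dict(zip(LEADING_FIELDS, values))
    entry["client_request_id"] = values[-1]  # last field in the sample entry
    return entry


if __name__ == "__main__":
    entry = parse_log_entry(SAMPLE)
    print(entry["operation_type"], entry["status"], entry["http_status_code"])
    print(entry["request_url"])
```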
• Total Transactions
• Availability
• % Success, % Network Errors, % Timeout, % Throttled, etc.
• Average Latency (Application E2E and Storage Server latency)
• Total Ingress
• Total Egress
• Capacity and # of objects
[Figure: timeline comparing Application E2E Latency with Storage Server Latency; markers: "Request arrives at storage service" and "Done"]
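Since the Application E2E latency covers the Storage Server latency plus the time spent moving the request and response, subtracting one from the other gives a rough view of where time goes. A minimal sketch, with a hypothetical helper name and made-up sample values:

```python
def latency_breakdown(samples):
    """Average (e2e_ms, server_ms) pairs and report the part spent outside the server."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples")
    avg_e2e = sum(e2e for e2e, _ in samples) / len(samples)
    avg_server = sum(server for _, server in samples) / len(samples)
    return {
        "avg_e2e_ms": avg_e2e,
        "avg_server_ms": avg_server,
        "avg_outside_server_ms": avg_e2e - avg_server,
    }


if __name__ == "__main__":
    # (Application E2E latency, Storage Server latency) in milliseconds.
    print(latency_breakdown([(28, 21), (32, 19)]))
```

A large gap between the two averages points at the network or the client rather than at the storage service.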
[Chart: Avg. Application E2E Latency (ms) and Avg. Storage Server Latency (ms), every 2 hours from 8/23/2011 10:00 to 8/26/2011 6:00; y-axis 0 to 1400 ms]
[Chart: Total Table Transactions (axis 0 to 10,000,000) and Avg. Application E2E Latency (ms, axis 0 to 1400), 8/23/2011 to 8/26/2011]
• http://account.blob.core.windows.net/$logs/
• http://account.table.core.windows.net/$Metrics*
Geo-replication pairs:
• North Central US <-> South Central US
• North Europe <-> West Europe
• East Asia <-> South East Asia
Microsoft Windows Azure Support
[Figure: failover. Data access to http://account.blob.core.windows.net/ starts with a DNS lookup in Azure DNS, which maps the hostname account.blob.core.windows.net to an IP address. North Central US geo-replicates to South Central US; on failover the DNS entry is updated from North Central US to South Central US.]
Windows Azure Storage Internals
Design Goals • “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”
Access blob storage via the URL: http://<account>.blob.core.windows.net/
[Figure: Storage Location Service and data access through load balancers (LB) into two storage stamps. Each stamp has Front-Ends, a Partition Layer and a DFS Layer with intra-stamp replication; inter-stamp (geo) replication connects the two stamps.]
Distributed File System (DFS) Layer
• All data from the Partition Layer is stored into files (extents) in the DFS layer
• An extent is replicated 3 times across different fault and upgrade domains
• Checksum all stored data
  • Verified on every client read
  • Scrubbed every few days
• 3 replicas are randomly allocated across a candidate set of servers based on available resources
• Any of the 3 replicas can be read from and read load balancing is used
• Use a journal drive to keep the write latencies low
• Re-replicate on disk/node/rack failure or checksum mismatch
• Load balancing
[Figure: DFS Servers managed by masters (M) that use Paxos]
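The placement rule for extent replicas can be illustrated with a small greedy sketch: shuffle the candidate servers, then take servers whose fault domain and upgrade domain have not been used yet. This is an assumption-laden illustration, not the DFS layer's algorithm: the server tuples are hypothetical, available resources are ignored, and a greedy pass can miss a valid placement that an exhaustive search would find.

```python
import random


def place_replicas(servers, count=3, rng=random):
    """Pick `count` servers with pairwise distinct fault and upgrade domains.

    `servers` is a list of (name, fault_domain, upgrade_domain) tuples.
    """
    candidates = list(servers)
    rng.shuffle(candidates)  # random allocation across the candidate set
    chosen, used_fault, used_upgrade = [], set(), set()
    for name, fault_domain, upgrade_domain in candidates:
        if fault_domain in used_fault or upgrade_domain in used_upgrade:
            continue
        chosen.append(name)
        used_fault.add(fault_domain)
        used_upgrade.add(upgrade_domain)
        if len(chosen) == count:
            return chosen
    raise RuntimeError("greedy pass found too few servers in distinct domains")


if __name__ == "__main__":
    servers = [("s1", 0, 0), ("s2", 0, 1), ("s3", 1, 1), ("s4", 2, 2), ("s5", 3, 3)]
    print(place_replicas(servers, rng=random.Random(42)))
```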
Partition Layer
• Provides transaction semantics and strong consistency for high level data abstractions
• Stores and reads the objects to/from extents in the DFS layer
• Provides inter-stamp (geo) replication by shipping logs to other stamps
• Scalable object index via partitioning
[Figure: Partition Layer (Partition Master, Lock Service, Partition Servers) on top of the DFS Layer (masters (M) with Paxos, DFS Servers)]
Front End Layer
• Stateless servers
• Authentication + authorization
• Request routing
[Figure: Front End Layer (FE servers) on top of the Partition Layer (Partition Master, Lock Service, Partition Servers) and the DFS Layer (masters (M) with Paxos, DFS Servers)]
[Figure: write path, labeled "Incoming Write Request" and "Ack". Layers shown: Front End Layer (FE servers), Partition Layer (Partition Master, Lock Service, Partition Server), DFS Layer (masters (M) with Paxos, DFS Servers).]
• Need a scalable index for the objects that can
  • Spread the index across 100s of servers
  • Dynamically load balance
  • Dynamically change what servers are serving each part of the index based on load
[Figure: Blob Index inside a Storage Stamp. The index is ordered by Account Name, Container Name and Blob Name, running from aaaa/aaaaa to zzzz/zzzzz, with sample rows harry/pictures/sunrise, harry/pictures/sunset, richard/videos/soccer and richard/videos/tennis. The Partition Master keeps the Partition Map (A-H: PS 1, H’-R: PS 2, R’-Z: PS 3); the Front-End Server holds a copy of the map and routes requests to Partition Server A-H (PS 1), Partition Server H’-R (PS 2) or Partition Server R’-Z (PS 3).]
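The front-end lookup against the partition map amounts to a search over sorted range boundaries. The sketch below is a deliberate simplification: it routes on the first letter of the account name only, whereas the figure shows ranges over the full account, container and blob name key; the function name is hypothetical.

```python
from bisect import bisect_left

# Upper bound (inclusive first letter) of each range, taken from the partition map:
# A-H -> PS1, H'-R -> PS2, R'-Z -> PS3.
PARTITION_MAP = [("h", "PS1"), ("r", "PS2"), ("z", "PS3")]
UPPER_BOUNDS = [upper for upper, _ in PARTITION_MAP]


def partition_server_for(account_name: str) -> str:
    """Return the partition server whose range covers the account name."""
    first_letter = account_name[:1].lower()
    index = bisect_left(UPPER_BOUNDS, first_letter)
    index = min(index, len(PARTITION_MAP) - 1)  # clamp anything sorting after "z"
    return PARTITION_MAP[index][1]


if __name__ == "__main__":
    for name in ("harry", "richard", "sally"):
        print(name, "->", partition_server_for(name))
```

Rebalancing then means editing the map (splitting a range or reassigning it to another server) without touching the stored data.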
[Figure: load balancing of partitions. A VIP in front of FE 1, FE 2 and FE 3; the Partition Master (PM); Partition Server 1, Partition Server 2 and Partition Server 4; the DFS Layer. Legend: RangePartition, Server Load.]
1. Scalability targets of a single storage account 2. Scalability targets for Blobs, Table Entities and Queues within a storage account
Scalability targets of a single storage account
Account Scalability Targets
• Capacity – Up to 100 TBs
• Transactions – Up to 5000 entities per second
• Bandwidth – Up to 3 gigabits per second
Partition data across storage accounts to go beyond these targets
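Going beyond a single account's targets means deciding, per object, which account holds it. One simple way to do that is to hash a stable key; the sketch below assumes that approach, uses the 5000 per second figure from the list above as a default, and its function and account names are hypothetical.

```python
import hashlib
import math


def accounts_needed(required_per_second: int, per_account: int = 5000) -> int:
    """Number of accounts needed to stay within the per-account transaction target."""
    return max(1, math.ceil(required_per_second / per_account))


def pick_account(key: str, accounts: list[str]) -> str:
    """Map a key to one account with a stable hash, so reads find what writes stored."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return accounts[int.from_bytes(digest[:8], "big") % len(accounts)]


if __name__ == "__main__":
    count = accounts_needed(12000)
    accounts = [f"myappdata{i}" for i in range(count)]  # made-up account names
    print(count, pick_account("customer:Thomas Anderson", accounts))
```

Plain modulo hashing reshuffles most keys when the number of accounts changes, so the account count is best chosen with headroom.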
Scalability targets for Blobs, Table Entities and Queues within a storage account
• Single Blob – up to 60 MBytes per second
• Single PartitionKey in a Table – up to 500 entities per second
• Single Queue – up to 500 messages per second
• “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”
• http://blogs.msdn.com/windowsazurestorage/
http://forums.dev.windows.com
http://bldw.in/SessionFeedback