f4: Facebook’s Warm BLOB Storage System
Subramanian Muralidhar*, Wyatt Lloyd*ᵠ, Sabyasachi Roy*, Cory Hill*, Ernest Lin*, Weiwen Liu*, Satadru Pan*, Shiva Shankar*, Viswanath Sivakumar*, Linpeng Tang*⁺, Sanjeev Kumar*
*Facebook Inc., ᵠUniversity of Southern California, ⁺Princeton University
BLOBs@FB
▪ Cover photos, profile photos, feed photos, feed videos
▪ Immutable & unstructured
▪ Diverse
▪ A LOT of them!!
Data cools off rapidly
Normalized read rates by age (relative to 1-year-old data), moving from hot to warm:

  Age        Photo   Video
  < 1 day    510X    590X
  1 day       98X     68X
  1 week      30X     16X
  1 month     14X      6X
  3 months     7X      2X
  1 year       1X      1X
Handling failures
▪ Replication across datacenters, racks, and hosts
▪ Tolerates 9 disk failures, 3 host failures, 3 rack failures, 3 datacenter failures
▪ Replication: 1.2 * 3 = 3.6X
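That 3.6X figure is just the product of the two layers of redundancy (the 1.2 comes from RAID-6 within each host, the 3 from cross-datacenter copies). A quick sketch of the arithmetic; the code is purely illustrative:

```python
# Haystack's effective replication factor (constants from the slide).
RAID6_OVERHEAD = 1.2  # within-host RAID-6: 1.2 bytes stored per logical byte
GEO_COPIES = 3        # three full copies across datacenters

effective = RAID6_OVERHEAD * GEO_COPIES
print(f"{effective}X")  # 3.6X raw storage per logical byte
```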
Handling load
▪ Spread load across hosts
▪ Goal: reduce space usage AND not compromise reliability
Background: Data serving
▪ User requests: reads via the CDN, writes via the web servers
▪ CDN protects storage
▪ Router abstracts storage
▪ Web tier adds business logic
Background: Haystack [OSDI 2010]
▪ Volume is a series of BLOBs, each stored as header, data, footer
▪ In-memory index maps BLOB ID → offset in the volume
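A minimal sketch of the volume + in-memory index idea (hypothetical field names; real Haystack keeps more per-BLOB metadata):

```python
# Sketch: one append-only volume file, one in-memory dict as the index.
index = {}  # BLOB ID -> (offset, size) within the volume file

def read_blob(volume_path, bid):
    # One dict lookup, one seek, one read: no on-disk index I/O.
    offset, size = index[bid]
    with open(volume_path, "rb") as volume:
        volume.seek(offset)
        return volume.read(size)
```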
Introducing f4: Haystack on cells
▪ Cell: racks of storage nodes holding data + index, plus compute nodes
Data splitting
▪ A 10GB volume is split into stripes and Reed-Solomon encoded, producing 4GB of parity
▪ BLOBs are packed into the data blocks of each stripe
Data placement
▪ 10GB volume + 4GB parity, striped across a cell with 7 racks
▪ Reed-Solomon (10, 4) is used in practice (1.4X)
▪ Tolerates 4 rack (or 4 disk/host) failures
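A sketch of the bookkeeping behind those numbers (the helper is hypothetical; the real system also spreads blocks across hosts and disks):

```python
N_DATA, N_PARITY = 10, 4                 # Reed-Solomon (10, 4)
OVERHEAD = (N_DATA + N_PARITY) / N_DATA  # 14/10 = 1.4X raw storage

def place_stripe(failure_domains):
    """Place the 14 blocks of a stripe on distinct failure domains so
    that any 4 simultaneous failures still leave the 10 blocks needed
    to reconstruct the stripe."""
    blocks = N_DATA + N_PARITY
    assert len(failure_domains) >= blocks
    return {b: failure_domains[b] for b in range(blocks)}
```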
Reads
▪ User request goes through the router to an index node, then to a storage node in the cell
▪ 2-phase: the index read returns the exact physical location of the BLOB; the data read then fetches it
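A sketch of that two-phase read path (method names are illustrative, not the actual router API):

```python
def read_blob(router, bid):
    # Phase 1: the index read returns the BLOB's exact physical location.
    volume, offset, size = router.index_read(bid)
    # Phase 2: fetch exactly those bytes from the storage node holding
    # the block; on the failure-free path no extra I/O is needed.
    return router.data_read(volume, offset, size)
```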
Reads under cell-local failures
▪ Cell-local failures (disks/hosts/racks) handled locally
▪ The data read fails over to a compute node (decoder), which rebuilds the BLOB from the stripe's surviving blocks
Reads under datacenter failures (2.8X)
▪ Each cell is mirrored by a cell in a second datacenter; requests are proxied to the mirror cell
▪ 2 copies * 1.4X = 2.8X
Cross-datacenter XOR (1.5 * 1.4 = 2.1X)
▪ Blocks from a cell in datacenter 1 and a cell in datacenter 2 (a 67% / 33% split on the slide) are XORed; the XOR is stored in a cell in datacenter 3
▪ Each cell keeps a local index; a cross-DC index copy lives with the XOR cell
Reads with datacenter failures (2.1X)
▪ Index read is served from a surviving datacenter's index copy
▪ Data read fetches the companion block from the surviving data cell and the XOR block from the third datacenter, then XORs them to rebuild the BLOB
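A sketch of the recovery arithmetic: with block A in datacenter 1, block B in datacenter 2, and A⊕B in datacenter 3, either block can be rebuilt from the other two (plain Python, illustrative only):

```python
def xor(a: bytes, b: bytes) -> bytes:
    # XOR two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

block_a = b"block stored in datacenter 1"
block_b = b"block stored in datacenter 2"
xor_block = xor(block_a, block_b)  # stored in datacenter 3

# Datacenter 1 fails: rebuild A from the two surviving blocks.
assert xor(block_b, xor_block) == block_a
```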
Haystack vs. f4 2.8 vs. f4 2.1

                                     Haystack (3 copies)   f4 2.8   f4 2.1
  Replication                        3.6X                  2.8X     2.1X
  Irrecoverable disk failures        9                     10       10
  Irrecoverable host failures        3                     10       10
  Irrecoverable rack failures        3                     10       10
  Irrecoverable datacenter failures  3                     2        2
  Load split                         3X                    2X       1X
Evaluation
▪ What and how much data is “warm”?
▪ Can f4 satisfy throughput and latency requirements?
▪ How much space does f4 save?
▪ How failure-resilient is f4?
Methodology
▪ CDN data: 1 day, 0.5% sampling
▪ BLOB store data: 2 weeks, 0.1% sampling
▪ Random distribution of BLOBs assumed
▪ Worst-case rates reported
Hot and warm divide
▪ Hot data (< 3 months) stays in Haystack; warm data (> 3 months) moves to f4
▪ Photo read rate per disk falls to ~80 reads/sec at about 3 months of age
[Chart: reads/sec per disk (0–400) vs. age (1 week to 1 year)]
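A sketch of that split as routing logic (the 3-month cutoff is from the slide; names and API are hypothetical):

```python
from datetime import datetime, timedelta

WARM_AGE = timedelta(days=90)  # ~3 months: read rate drops below ~80 reads/sec/disk

def pick_store(created_at: datetime) -> str:
    # Young, hot BLOBs stay on triply-replicated Haystack;
    # older, warm BLOBs belong in erasure-coded f4.
    return "haystack" if datetime.utcnow() - created_at < WARM_AGE else "f4"
```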
It is warm, not cold
▪ Haystack (hot data): 50% of the data
▪ f4 (warm data): 50% of the data
f4 Performance: Most loaded disk in cluster
▪ Peak load on disk: 35 reads/sec
f4 Performance: Latency
▪ P80 = 30ms, P99 = 80ms
Concluding Remarks
▪ Facebook’s BLOB storage is big and growing
▪ BLOBs cool down with age: ~100X drop in read requests in 60 days
▪ Haystack’s 3.6X replication over-provisions for old, warm data
▪ f4 encodes data to lower replication to 2.1X