PNUTS: Yahoo!’s Hosted Data Serving Platform
- Slides: 32
PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research
How do I build a cool new web app?
- Option 1: Code it up! Make it live!
  - Scale it later
  - It gets posted to Slashdot
  - Scale it now!
  - Flickr, Twitter, MySpace, Facebook, …
How do I build a cool new web app?
- Option 2: Make it industrial strength!
  - Evaluate scalable database backends
  - Evaluate scalable indexing systems
  - Evaluate scalable caching systems
  - Architect data partitioning schemes
  - Architect data replication schemes
  - Architect monitoring and reporting infrastructure
  - Write application
  - Go live
  - Realize it doesn’t scale as well as you hoped
  - Rearchitect around bottlenecks
  - 1 year later: ready to go!
Example: social network updates
[Figure: users Brian, Sonja, Jimi, Brandon and Kurt; the question "What are my friends up to?" is answered with entries labeled "Sonja:" and "Brandon:".]
Example: social network updates
<photo> <title>Flower</title> <url>www.flickr.com</url> </photo>
6   Jimi     <ph..
8   Mary     <re..
12  Sonja    <ph..
15  Brandon  <po..
16  Mike     <ph..
17  Bob      <re..
What do we need from our DBMS?
- Web applications need:
  - Scalability
    - And the ability to scale linearly
  - Geographic scope
  - High availability
- Web applications typically have:
  - Simplified query needs
    - No joins, aggregations
  - Relaxed consistency needs
    - Applications can tolerate stale or reordered data
What is PNUTS?
What is PNUTS?
- Parallel database
- Geographic replication
- Indexes and views
- Structured, flexible schema
- Hosted, managed infrastructure
Example schema:
    CREATE TABLE Parts (
        ID VARCHAR,
        StockNumber INT,
        Status VARCHAR
        …
    )
Example rows (ID, StockNumber, Status):
    A  42342  E
    B  42521  W
    C  66354  W
    D  12352  E
    E  75656  C
    F  15677  E
Query model
- Per-record operations
  - Get
  - Set
  - Delete
- Multi-record operations
  - Multiget
  - Scan
  - Getrange
- Web service (RESTful) API
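The operations listed on this slide can be illustrated with a toy in-memory table. This is a minimal sketch only: the `Table` class, its method signatures, and the half-open range convention are assumptions made for illustration and are not the PNUTS REST API.

```python
from bisect import bisect_left, insort


class Table:
    """Toy ordered table (hypothetical; not the PNUTS API)."""

    def __init__(self):
        self._rows = {}   # key -> value
        self._keys = []   # keys kept sorted so scans return key order

    # Per-record operations
    def get(self, key):
        return self._rows.get(key)

    def set(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.remove(key)

    # Multi-record operations
    def multiget(self, keys):
        """Return the requested records that exist, as a dict."""
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self):
        """Return every record in key order."""
        return [(k, self._rows[k]) for k in self._keys]

    def getrange(self, lo, hi):
        """Return records with lo <= key < hi (assumed convention)."""
        start = bisect_left(self._keys, lo)
        end = bisect_left(self._keys, hi)
        return [(k, self._rows[k]) for k in self._keys[start:end]]
```

For example, after `set("A", 42342)` and `set("B", 42521)`, calling `getrange("A", "B")` would return only the record for `"A"` under the half-open convention assumed here.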
Detailed architecture
[Figure: data-path components]
- Clients
- REST API
- Routers
- Tablet controller
- Message broker
- Storage units
Detailed architecture
[Figure: a local region and remote regions; labeled components are clients, REST API, routers, YMB, tablet controller and storage units.]
Tablet splitting and balancing
- Each storage unit has many tablets (horizontal partitions of the table)
- Tablets may grow over time
- Overfull tablets split
- Storage unit may become a hotspot
- Shed load by moving tablets to other servers
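The two ideas on this slide, splitting overfull tablets and moving tablets off a hot storage unit, can be sketched as follows. Everything here is an illustrative assumption: the midpoint split, the size limit, the greedy choice of which tablet to move, and the names `split_overfull` and `shed_load` do not come from the slides.

```python
def split_overfull(tablets, limit=4):
    """Split tablets (sorted key lists) at the midpoint until none exceeds limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    result, pending = [], list(tablets)
    while pending:
        tablet = pending.pop(0)
        if len(tablet) > limit:
            mid = len(tablet) // 2
            # Re-check both halves, keeping key order.
            pending[:0] = [tablet[:mid], tablet[mid:]]
        else:
            result.append(tablet)
    return result


def unit_load(assignment, load, unit):
    """Total request rate of the tablets currently on one storage unit."""
    return sum(load[t] for t in assignment[unit])


def shed_load(assignment, load):
    """Move one tablet from the hottest unit to the coolest, if that helps.

    assignment: dict unit -> list of tablet ids (modified in place)
    load: dict tablet id -> request rate
    Returns (tablet, from_unit, to_unit), or None if no move narrows the gap.
    """
    hot = max(assignment, key=lambda u: unit_load(assignment, load, u))
    cold = min(assignment, key=lambda u: unit_load(assignment, load, u))
    if hot == cold:
        return None
    gap = unit_load(assignment, load, hot) - unit_load(assignment, load, cold)
    # Only a tablet lighter than the gap lowers the larger of the two loads.
    candidates = [t for t in assignment[hot] if 0 < load[t] < gap]
    if not candidates:
        return None
    # Prefer the tablet that leaves the two units closest to equal.
    tablet = min(candidates, key=lambda t: abs(gap - 2 * load[t]))
    assignment[hot].remove(tablet)
    assignment[cold].append(tablet)
    return tablet, hot, cold
```

A real balancer would also weigh the cost of copying tablet data; this sketch looks only at request rates.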
Query processing
Range queries
- Client query: Grapefruit…Pear?
- Router interval map:
    MIN-Canteloupe         SU 1
    Canteloupe-Lime        SU 3
    Lime-Strawberry        SU 2
    Strawberry-MAX         SU 1
- The router splits the query into: Grapefruit…Lime? and Lime…Pear?
- Keys stored across storage units 1, 2 and 3: Apple, Avocado, Banana, Blueberry, Canteloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Strawberry, Tomato, Watermelon
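The router's interval map on this slide can be sketched with a sorted list of tablet lower bounds and a binary search. The boundaries and storage unit labels are taken from the slide; the function names, the use of the empty string for MIN, and the half-open range convention are assumptions for illustration.

```python
from bisect import bisect_right

# Lower bound of each tablet, in key order; "" stands in for MIN.
BOUNDARIES = ["", "Canteloupe", "Lime", "Strawberry"]
# Storage unit that holds each tablet, as listed in the slide's interval map.
UNITS = ["SU 1", "SU 3", "SU 2", "SU 1"]


def route_key(key):
    """Return the storage unit whose tablet contains key."""
    return UNITS[bisect_right(BOUNDARIES, key) - 1]


def route_range(lo, hi):
    """Split the range [lo, hi) into per-tablet pieces.

    Returns a list of (storage_unit, piece_lo, piece_hi) tuples in key order.
    """
    pieces = []
    i = bisect_right(BOUNDARIES, lo) - 1
    while lo < hi:
        # Upper bound of tablet i, or None for the last tablet (MAX).
        upper = BOUNDARIES[i + 1] if i + 1 < len(BOUNDARIES) else None
        piece_hi = hi if upper is None or hi <= upper else upper
        pieces.append((UNITS[i], lo, piece_hi))
        lo = piece_hi
        i += 1
    return pieces
```

With this map, `route_range("Grapefruit", "Pear")` yields one piece for SU 3 ending at `"Lime"` and one piece for SU 2 starting at `"Lime"`, which matches the two sub-queries shown on the slide.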
Updates
[Figure: numbered message flow (steps 1 to 8) among routers, message brokers and three storage units (SU); messages are labeled "Write key k", "SUCCESS" and "Sequence # for key k".]
Asynchronous replication and consistency
Asynchronous replication
Consistency model
- Goal: make it easier for applications to reason about updates and cope with asynchrony
- What happens to a record with primary key “Brian”?
[Figure: timeline of one record within Generation 1, from "Record inserted" (v.1) through a series of Update events and a Delete, up to v.8; the horizontal axis is Time.]
Consistency model
[Figure, repeated for each operation below: timeline of versions v.1 to v.8 within Generation 1, with the labels "Stale version" (twice) and "Current version"; the horizontal axis is Time.]
- Read
- Read up-to-date
- Read ≥ v.6
- Write
- Write if = v.7 (result: ERROR)
- Mechanism: per record mastership
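The five operations on these slides can be sketched against a single record that has one master copy and one lagging replica. This is a simplified illustration under stated assumptions: the `Record` class, its method names, and the single-replica model are hypothetical and are not the PNUTS implementation.

```python
class VersionMismatch(Exception):
    """Raised when a conditional write names a version that is not current."""


class Record:
    """One record with a master copy and one asynchronously updated replica."""

    def __init__(self, value):
        self.master = (1, value)    # (version, value) held by the record's master
        self.replica = (1, value)   # local copy that may lag behind the master

    def write(self, value):
        """Write: applied at the master, which assigns the next version."""
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def write_if(self, expected_version, value):
        """Write if = version: fails unless expected_version is current."""
        if self.master[0] != expected_version:
            raise VersionMismatch(
                f"expected v.{expected_version}, current is v.{self.master[0]}"
            )
        return self.write(value)

    def replicate(self):
        """Deliver the master's latest state to the replica."""
        self.replica = self.master

    def read(self):
        """Read: served from the replica, so the result may be stale."""
        return self.replica

    def read_up_to_date(self):
        """Read up-to-date: served from the master."""
        return self.master

    def read_at_least(self, min_version):
        """Read >= version: use the replica if it is new enough, else the master."""
        if self.replica[0] >= min_version:
            return self.replica
        return self.master
```

In the slide's scenario the current version is v.8, so a conditional write that expects v.7 raises the error; routing every write through one master per record is what makes that check well defined.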
Experiments
Experimental setup
- Production PNUTS code
  - Enhanced with ordered table type
- Three PNUTS regions
  - 2 west coast, 1 east coast
  - 5 storage units, 2 message brokers, 1 router
  - West: Dual 2.8 GHz Xeon, 4 GB RAM, 6 disk RAID 5 array
  - East: Quad 2.13 GHz Xeon, 4 GB RAM, 1 SATA disk
- Workload
  - 1200-3600 requests/second
  - 0-50% writes
  - 80% locality
Scalability
Request skew
Size of range scans
Related work
- Distributed and parallel databases
  - Especially query processing and transactions
  - BigTable, Dynamo, S3, SimpleDB, SQL Server Data Services, Cassandra
- Distributed filesystems
  - Ceph, Boxwood, Sinfonia
- Distributed (P2P) hash tables
  - Chord, Pastry, …
- Database replication
  - Master-slave, epidemic/gossip, synchronous…
Conclusions and ongoing work
- PNUTS is an interesting research product
  - Research: consistency, performance, fault tolerance, rich functionality
  - Product: make it work, keep it (relatively) simple, learn from experience and real applications
- Ongoing work
  - Indexes and materialized views
  - Bundled updates
  - Batch query processing
Thanks!
- cooperb@yahoo-inc.com
- research.yahoo.com