NoSQL and DynamoDB
Rick Houlihan, Principal Solutions Architect, Amazon Web Services
October 2015 WWW.AWSEDUCATE.COM
What to expect from the session • Brief history of data processing • Introduction to NoSQL • DynamoDB internals • Tables, API, data types, indexes • Scaling and data modeling • Design patterns and best practices • Event-driven applications and DynamoDB Streams • Reference architecture
What is a Database? “A structured set of data held in a computer, especially one that is accessible in various ways.” – Google “A database is an organized mechanism for storing, managing and retrieving information.” – About.com “A place to put stuff my app needs.” – Average Developer
Timeline of Database Technology (figure): Ledgers → Unit Records → File Systems → RDBMS → NoSQL, driven by steadily rising data pressure.
Data Volume Since 2010 • 90% of stored data generated in the last 2 years • 1 terabyte of data in 2010 equals 6.5 petabytes today • Linear correlation between data pressure and technical innovation • No reason these trends will not continue over time
Why NoSQL? SQL vs. NoSQL: • Optimized for storage vs. optimized for compute • Normalized/relational vs. denormalized/hierarchical • Ad hoc queries vs. instantiated views • Scale vertically vs. scale horizontally • Good for OLAP vs. built for OLTP at scale
The Iron Triangle of Data – All About CAP. Consistency: all clients always have the same view of data. Availability: all clients can always read and write. Partition Tolerance: the system works well despite physical network partitions. Pick two. Data models: relational, column oriented, document, key/value. CA: MSSQL, Oracle, DB2, Postgres, MySQL. AP: Aster Data, Greenplum, Vertica, Voldemort, Cassandra, Tokyo Cabinet, SimpleDB, KAI, CouchDB, Riak. CP: BigTable, Hypertable, HBase, MongoDB, Terrastore, Couchbase, Scalaris, DynamoDB, BerkeleyDB, Memcached, Redis.
Partition Management for AP Systems
SQL vs. NoSQL Access Pattern
Technology Adoption and the Hype Curve
Amazon DynamoDB • Fully managed NoSQL • Fast and consistent • Document or key-value • Access control • Scales to any workload • Event-driven programming
Tables, API, Data Types
Table and item API. Admin: CreateTable, UpdateTable, DeleteTable, DescribeTable. CRUD: PutItem/GetItem, BatchWriteItem/BatchGetItem, UpdateItem, DeleteItem, Query, Scan. DynamoDB Streams API: ListStreams, DescribeStream, GetShardIterator, GetRecords.
Data types: String (S), Number (N), Binary (B), String Set (SS), Number Set (NS), Binary Set (BS), Boolean (BOOL), Null (NULL), List (L), Map (M). List and Map are used for storing nested JSON documents.
Table → items → attributes. Hash key (mandatory): key-value access pattern; determines data distribution. Range key (optional): models 1:N relationships; enables rich query capabilities over all items for a hash key: ==, <, >, >=, <=, “begins with”, “between”, sorted results, counts, top/bottom N values, paged responses.
Hash table. Hash key uniquely identifies an item. Hash key is used for building an unordered hash index. Table can be partitioned for scale. Example: items Id=1 (Name=Jim), Id=2 (Name=Andy, Dept=Eng), Id=3 (Name=Kim, Dept=Ops) hash to Hash(1)=7B, Hash(2)=48, Hash(3)=CD across the key space 00–FF, split into partitions (e.g., 00–54 and 55–FF).
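The partitioning idea above can be sketched in a few lines of Python. This is a toy model only — DynamoDB's internal hash function and partition map are not public — using MD5 to place keys in a 00–FF key space:

```python
import hashlib

def hash_key(key):
    """Map an item's hash key to a point in a 00-FF key space (toy model)."""
    return hashlib.md5(str(key).encode()).hexdigest()[:2].upper()

def partition_for(key, boundaries):
    """Assign a key to the first partition whose upper bound covers its hash."""
    h = hash_key(key)
    for upper, name in boundaries:
        if h <= upper:
            return name
    return boundaries[-1][1]

# Two partitions splitting the key space 00-54 / 55-FF, as on the slide.
parts = [("54", "P1"), ("FF", "P2")]
print(hash_key(1), partition_for(1, parts))  # C4 P2
```

Because the index is unordered, neighboring Ids (1, 2, 3) scatter across the key space, which is exactly what spreads load over partitions.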
Hash-range table. Hash key and range key together uniquely identify an item. Within the unordered hash index, data is sorted by the range key. No limit on the number of items (∞) per hash key • Except if you have local secondary indexes. Example: orders for Customer#=1 (Hash=7B), Customer#=2 (Hash=48), and Customer#=3 (Hash=CD) land on different partitions, and within each customer's hash key the orders (Order#=10, 11: Toy, Boots, Pen, Shoes, Book, Paper) are sorted by Order#.
Partitions are three-way replicated. Each partition (e.g., one holding Id=1 Name=Jim; Id=2 Name=Andy, Dept=Eng; Id=3 Name=Kim, Dept=Ops) is stored on three replicas: Replica 1, Replica 2, Replica 3 — and the table spans Partition 1 … Partition N.
Indexes
Local secondary index (LSI). Alternate range key attribute. Index is local to a hash key (or partition). An LSI keeps the table's hash key (A1) with an alternate range key (e.g., A4 or A5) and always carries the table key (A2); projection options are KEYS_ONLY, INCLUDE (named attributes such as A3), and ALL. 10 GB max per hash key, i.e., LSIs limit the number of range keys!
Global secondary index (GSI). Alternate hash (+range) key. Index spans all table hash keys (partitions). Online indexing. A GSI re-keys the table (e.g., hash A5, range A4), carries the table key (A1, A2), and projects KEYS_ONLY, INCLUDE (named attributes such as A3), or ALL. RCUs/WCUs are provisioned separately for GSIs.
How do GSI updates work? The client sends an update request to the primary table (1) and receives the update response (2); the global secondary index is updated asynchronously in the background. If GSIs don't have enough write capacity, table writes will be throttled!
LSI or GSI? An LSI can be modeled as a GSI. If the data size in an item collection > 10 GB, use a GSI. If eventual consistency is okay for your scenario, use a GSI!
Scaling
Scaling. Throughput • Provision any amount of throughput to a table. Size • Add any number of items to a table • Max item size is 400 KB • LSIs limit the number of range keys due to the 10 GB limit. Scaling is achieved through partitioning.
Throughput is provisioned at the table level • Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second • RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads • Read and write throughput limits are independent
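The capacity-unit arithmetic above is easy to get wrong because sizes round up per request. A small sketch of the rules as stated on the slide (1 KB per WCU, 4 KB per RCU, eventually consistent reads at half price):

```python
import math

def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """RCUs per read: one 4 KB unit, rounded up; halved for eventual consistency."""
    units = math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_bytes):
    """WCUs per write: one 1 KB unit, rounded up."""
    return math.ceil(item_size_bytes / 1024)

print(read_capacity_units(6000))         # 2 (6 KB rounds up to two 4 KB units)
print(read_capacity_units(6000, False))  # 1.0 (eventually consistent)
print(write_capacity_units(3500))        # 4 (3.5 KB rounds up to four 1 KB units)
```

Note the asymmetry: a 4 KB item costs 1 RCU to read but 4 WCUs to write, which is why write-heavy workloads dominate provisioning cost.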
What causes throttling? If sustained throughput goes beyond provisioned throughput per partition Non-uniform workloads • Hot keys/hot partitions • Very large bursts Mixing hot data with cold data • Use a table per time period Overly partitioned tables • If sustained throughput > partition limit, Dynamo. DB may throttle requests • Solution: Increase provisioned throughput or restructure data
What bad NoSQL looks like… (figure: heat map of partition activity over time, showing a few hot partitions)
Getting the most out of DynamoDB throughput. “To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide. Space: access is evenly spread over the key space. Time: requests arrive evenly spaced in time.
Much better picture…
A global leader in retargeting. More than 10,000 active advertisers in >100 countries • Provisioned for over 1M transactions per second • 4 regions in use with live traffic replication • 120B+ key fetches worldwide per day (RTB) • 1.5 TB of data stored per region • 30B+ items stored in each region • <3 ms uniform query latency, <10 ms at 99.95%
• Simple video monitoring & security • Fast growth – “suddenly petabytes” • More inbound video than YouTube. Timeline 2009–2013: cameras (2009) → move to AWS → switch to DynamoDB.
DynamoDB reduces delivery time for video events from 5–10 seconds to 50 milliseconds
Online Gaming. “DynamoDB came along at just the right time, and Halfbrick switched to storing our game data in DynamoDB, which alleviated our scaling problems while also freeing us from the burden of managing all the underlying hardware and software. We love that DynamoDB handles so much of the management for us, freeing us to focus on development.”
Data modeling
1:1 relationships or key-values. Use a table or GSI with a hash key. Use the GetItem or BatchGetItem API. Example: given an SSN or license number, get attributes. Users table (hash key SSN): SSN=123-45-6789 → Email=johndoe@nowhere.com, License=TDL25478134; SSN=987-65-4321 → Email=maryfowler@somewhere.com, License=TDL78309234. Users-Email-GSI (hash key License): License=TDL78309234 → Email=maryfowler@somewhere.com, SSN=987-65-4321; License=TDL25478134 → Email=johndoe@nowhere.com, SSN=123-45-6789.
1:N relationships or parent-children. Use a table or GSI with hash and range key. Use the Query API. Example: • Given a device, find all readings between epoch X and Y. Device-measurements table: hash key DeviceId=1, range key epoch=5513A97C → Temperature=30, Pressure=90; DeviceId=1, epoch=5513A9DB → Temperature=30, Pressure=90.
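The range-key query above (all readings for a device between two epochs) can be modeled locally: within one hash key, items are kept sorted by the range key, so a BETWEEN query is just a binary-searched slice. A sketch, using the slide's hypothetical DeviceId=1 readings:

```python
from bisect import bisect_left, bisect_right

# Item collection for DeviceId=1, kept sorted by the range key (epoch),
# the way DynamoDB stores items within one hash key.
readings = [
    (0x5513A97C, {"temperature": 30, "pressure": 90}),
    (0x5513A9DB, {"temperature": 30, "pressure": 90}),
    (0x5513AA40, {"temperature": 31, "pressure": 91}),  # extra reading for illustration
]

def query_between(items, lo, hi):
    """Range-key BETWEEN query: all readings with lo <= epoch <= hi."""
    keys = [k for k, _ in items]
    return items[bisect_left(keys, lo):bisect_right(keys, hi)]

print(len(query_between(readings, 0x5513A97C, 0x5513A9DB)))  # 2
```

This is why Query is cheap while Scan is not: the sort order lets DynamoDB read only the matching contiguous slice of the item collection.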
N:M relationships. Use a table and a GSI with the hash and range key elements switched. Use the Query API. Example: given a user, find all games; or given a game, find all users. User-Games-Table (hash UserId, range GameId): bob→Game1, fred→Game2, bob→Game3. Game-Users-GSI (hash GameId, range UserId): Game1→bob, Game2→fred, Game3→bob.
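The key-swap trick above amounts to maintaining two lookups over the same pairs, one per query direction. A minimal sketch (plain dictionaries standing in for the table and its GSI):

```python
from collections import defaultdict

# The underlying (UserId, GameId) pairs from the slide.
user_games = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]

# Base table keyed by UserId; GSI with hash/range swapped, keyed by GameId.
by_user, by_game = defaultdict(list), defaultdict(list)
for user, game in user_games:
    by_user[user].append(game)   # Query(UserId=...) on User-Games-Table
    by_game[game].append(user)   # Query(GameId=...) on Game-Users-GSI

print(by_user["bob"])    # ['Game1', 'Game3']
print(by_game["Game1"])  # ['bob']
```

DynamoDB maintains the second structure for you: every write to the table is propagated to the GSI, so both directions stay queryable without a join.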
Documents (JSON). New data types (M, L, BOOL, NULL) were introduced to support JSON. Type mapping from JavaScript to DynamoDB: string → S, number → N, boolean → BOOL, null → NULL, array → L, object → M. Document SDKs • Simple programming model • Conversion to/from JSON • Java, JavaScript, Ruby, .NET. Cannot index (S, N) elements of a JSON object stored in M • Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda.
Rich expressions. Projection expression • Query/Get/Scan: ProductReviews.FiveStar[0]. Filter expression • Query/Scan: #VIEWS > :num. Conditional expression • PutItem/UpdateItem/DeleteItem: attribute_not_exists(#pr.FiveStar). Update expression • UpdateItem: SET Replies = Replies + :num
Scenarios and best practices
Event logging Storing time series data
Time series tables. Current (hot) table: Events_table_2015_April (Event_id hash key, Timestamp range key, Attribute1 … AttributeN) at RCUs=10000, WCUs=10000. Older (cold) tables with the same schema: Events_table_2015_March at RCUs=1000, WCUs=100; Events_table_2015_February at RCUs=100, WCUs=1; Events_table_2015_January at RCUs=10, WCUs=1. Don't mix hot and cold data; archive cold data to Amazon S3.
Dealing with time series data. Use a table per time period • Pre-create daily, weekly, monthly tables • Provision required throughput for the current table • Writes go to the current table • Turn off (or reduce) throughput for older tables
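The table-per-period pattern above is mostly naming and throughput bookkeeping. A sketch of both, assuming monthly tables named like the slide's Events_table_2015_April (the `provision_plan` helper and its decay schedule are illustrative, not an AWS API):

```python
from datetime import datetime, timezone

def table_for(ts):
    """Route a write to the table for its month, e.g. Events_table_2015_April."""
    return f"Events_table_{ts.year}_{ts.strftime('%B')}"

def provision_plan(current, older):
    """Hot current table gets full throughput; older tables are dialed down
    roughly 10x per month of age, matching the slide's 1000/100/10 RCU steps."""
    plan = {current: {"RCU": 10000, "WCU": 10000}}
    for age, name in enumerate(older):
        plan[name] = {"RCU": max(10, 1000 // (10 ** age)), "WCU": 1}
    return plan

now = datetime(2015, 4, 15, tzinfo=timezone.utc)
print(table_for(now))  # Events_table_2015_April
```

In practice a scheduled job pre-creates next month's table and applies the reduced throughput (or archives to S3) for tables that age out.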
Product catalog Popular items (read)
Scaling bottlenecks. SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT" — shoppers drive 70,000 requests/sec at one item. The ProductCatalog table is spread over ~50 partitions at 2,000 RCUs each, so the partition holding popular Product A becomes the bottleneck while the partitions holding Product B and others sit idle.
(Chart: requests per second by item primary key — the DynamoDB request distribution per hash key is dominated by a few hot items.)
Cache popular items. The same query (SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT") is served from a cache between the user and DynamoDB; only misses reach the ProductCatalog table partitions.
(Chart: requests per second by item primary key — with a cache in front, cache hits absorb the traffic to popular items and DynamoDB requests flatten out.)
Messaging app • Large items • Filters vs. indexes • M:N modeling—inbox and outbox
Messages app. David's inbox: SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC. David's outbox: SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC. Both run against the Messages table.
Large and small attributes mixed. Messages table (hash key Recipient, range key Date): David, 2014-10-02, Sender=Bob, Message=… (48 more messages for David) … David, 2014-10-03, Sender=Alice … Alice, 2014-09-28, Sender=Bob … Alice, 2014-10-01, Sender=Carol … (many more messages). The inbox query SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC reads 50 items × 256 KB each: large message bodies and attachments.
Computing inbox query cost: items evaluated by the query (50) × average item size (256 KB) × conversion ratio (1 RCU / 4 KB) × eventually consistent reads (1/2) = 1,600 RCUs.
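That arithmetic generalizes: a Query's cost is the total bytes it evaluates divided into 4 KB read units, halved for eventually consistent reads. A small helper reproducing the slide's 1,600 RCU figure:

```python
def query_cost_rcu(items, avg_item_bytes, eventually_consistent=True):
    """RCUs consumed by a Query: total data evaluated / 4 KB,
    halved for eventually consistent reads."""
    units = (items * avg_item_bytes) / 4096
    return units / 2 if eventually_consistent else units

# 50 messages at 256 KB each, eventually consistent: the slide's 1,600 RCUs.
print(query_cost_rcu(50, 256 * 1024))  # 1600.0
```

The same function shows the payoff of the next slide's refactor: 50 small index entries of 128 bytes evaluate to well under one read unit, with the 256 KB bodies fetched individually only when needed.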
Separate the bulk data. 1. Query Inbox-GSI (50 sequential items at 128 bytes): 1 RCU. 2. BatchGetItem on the Messages table (50 separate items at 256 KB): 1,600 RCUs. This uniformly distributes large-item reads. Inbox-GSI holds Recipient, Date, Sender, Subject, MsgId; the Messages table maps MsgId → Body.
Inbox GSI Define which attributes to copy into the index
Outbox GSI (hash key Sender): SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC
Messaging app: David's inbox and outbox queries are served by the Inbox and Outbox global secondary indexes over the Messages table.
Querying many large items at once: • Reduce one-to-many item sizes • Configure secondary index projections • Use GSIs (Inbox, Outbox) to model the M:N relationship between sender and recipient • Distribute large items
Multiplayer online gaming Query filters vs. composite key indexes
Common game back-end concepts Think in terms of APIs HTTP + JSON Get friends, leaderboard Binary asset data Multiplayer servers High availability Scalability
Core (HA) game back end • Choose a region • >=2 Availability Zones • Amazon EC2 for the app • Elastic Load Balancing • Amazon RDS database • Multi-AZ
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics • ... with Amazon CloudFront! (CloudFront CDN)
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics • ... with CloudFront! • Auto Scaling group • Capacity on demand • Respond to users • Automatic healing
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics • ... with CloudFront! • Auto Scaling group • Capacity on demand • Respond to users • Automatic healing • Amazon ElastiCache • Memcached • Redis
Writing is painful • Games are write heavy • Caching is of limited use • Key value • Binary structures • Database = bottleneck
Sharding (not fun)
Amazon DynamoDB • Fully managed NoSQL data store • Provisioned throughput • Secondary indexes • PUT/GET keys • Document support!
Example: Leaderboard in DynamoDB. Columns: UserID (hash key), BoardName (range key), TopScore, TopScoreDate. Rows: "101", "Galaxy Invaders", 5842, "2014-09-15T17:24:31" / "101", "Meteor Blasters", 1000, "2014-10-22T23:18:01" / "101", "Starship X", 24, "2014-08-31T13:14:21" / "102", "Alien Adventure", 192, "2014-07-12T11:07:56" / "102", "Galaxy Invaders", 0, "2014-09-18T07:33:42" / "103", "Attack Ships", 3, "2014-10-19T01:13:24" / "103", "Galaxy Invaders", 2317, "2014-09-11T06:53:00" / "103", "Meteor Blasters", 723, "2014-10-19T01:14:24" / "103", "Starship X", 42, "2014-07-11T06:53:03". • Hash key = primary key • Range key = sort key • Other attributes are undefined • So… how to sort based on top score?
Leaderboard with secondary indexes • Create a secondary index! • Set the hash key to BoardName • Set the range key to TopScore • Project extra attributes as needed • Can now query by BoardName, sorted by TopScore • Handles many common gaming use cases. Index rows (BoardName hash, TopScore range, UserID projected): "Alien Adventure" 192 "102" / "Attack Ships" 3 "103" / "Galaxy Invaders" 0 "102", 2317 "103", 5842 "101" / "Meteor Blasters" 723 "103", 1000 "101" / "Starship X" 24 "101", 42 "103".
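The leaderboard query the index enables — "top scores for a board, highest first" — can be sketched locally. This toy model filters by the index's hash key and sorts by its range key; DynamoDB does the same with `ScanIndexForward=False` on a real GSI query:

```python
# Items from the slide's leaderboard table (one board shown).
scores = [
    {"UserID": "101", "BoardName": "Galaxy Invaders", "TopScore": 5842},
    {"UserID": "102", "BoardName": "Galaxy Invaders", "TopScore": 0},
    {"UserID": "103", "BoardName": "Galaxy Invaders", "TopScore": 2317},
    {"UserID": "101", "BoardName": "Meteor Blasters", "TopScore": 1000},
]

def top_scores(items, board, limit=10):
    """GSI-style query: hash key BoardName, range key TopScore, descending."""
    rows = [i for i in items if i["BoardName"] == board]
    return sorted(rows, key=lambda i: i["TopScore"], reverse=True)[:limit]

print([i["UserID"] for i in top_scores(scores, "Galaxy Invaders")])
# ['101', '103', '102']
```

The point of the GSI is that this filter-and-sort never happens at query time: the index already stores items grouped by BoardName and ordered by TopScore.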
Sparse indexes. Scan sparse hash GSIs. Game-scores table (Id hash): 1 Bob G1 1300 2012-12-23 / 2 Bob G1 1450 2012-12-23 / 3 Jay G1 1600 2012-12-24 / 4 Mary G1 2000 2012-10-24 / 5 Ryan G2 123 2012-03-10 / 6 Jones G2 345 2012-03-20. Award-GSI (Award hash): only items carrying an Award attribute appear in the index, e.g., Award=Champ, Id=4, User=Mary, Score=2000.
Real-Time voting Write-heavy items
Requirements for voting Allow each person to vote only once No changing votes Real-time aggregation Voter analytics, demographics
Real-time voting architecture: Voters → Voting App → RawVotes table → AggregateVotes table.
Scaling bottlenecks. Voters generate hundreds of thousands of votes per second for Candidate A and Candidate B. Provisioning 200,000 WCUs on the Votes table spreads capacity over many partitions at 1,000 WCUs each — but all writes for one candidate still land on a single hash key, and therefore a single partition.
Write sharding. The Votes table holds shard keys Candidate A_1 … A_8 and Candidate B_1 … B_7, spreading each candidate's counter across many hash keys.
Write sharding. Each voter's write goes to a random shard: UpdateItem "CandidateA_" + rand(0, 10) ADD 1 to Votes.
Shard aggregation. A periodic process 1. sums the shard counters across the Votes table and 2. stores the total, e.g., Candidate A total: 2.5M.
Shard write-heavy hash keys when your write workload is not horizontally scalable. Trade off read cost for write scalability. Consider throughput per hash key and per partition.
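The sharded write plus periodic aggregation from the last three slides can be sketched end to end. A dictionary stands in for the Votes table; the shard count (10) matches the slide's rand(0, 10):

```python
import random

SHARDS = 10
votes = {}  # stand-in for the Votes table: one counter item per shard key

def cast_vote(candidate, rng=random):
    """UpdateItem "<candidate>_" + rand(0, SHARDS) ADD 1 to Votes.
    Spreads one hot hash key across SHARDS keys so writes scale out."""
    key = f"{candidate}_{rng.randrange(SHARDS)}"
    votes[key] = votes.get(key, 0) + 1

def total(candidate):
    """Periodic shard aggregation: sum the candidate's shard counters."""
    return sum(v for k, v in votes.items() if k.startswith(candidate + "_"))

for _ in range(1000):
    cast_vote("CandidateA")
print(total("CandidateA"))  # 1000
```

This is the read-cost trade-off on the slide: a single counter read becomes SHARDS reads (or a precomputed total written by the aggregation job), in exchange for writes that spread over SHARDS partitions.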
Correctness in voting: 1. record the vote in the RawVotes table and de-dupe; retry. 2. Increment the candidate counter in the AggregateVotes table. RawVotes (UserId, Candidate, Date): Alice A 2013-10-02 / Bob B 2013-10-02 / Eve B 2013-10-02 / Chuck A 2013-10-02. AggregateVotes (Segment, Votes): A_1 23 / B_2 12 / B_1 14 / A_2 25.
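The two-step flow above — de-dupe first, then increment — hinges on a conditional write. A local sketch (dictionaries stand in for the two tables; the `if user_id in raw_votes` check models DynamoDB's attribute_not_exists condition on PutItem):

```python
raw_votes = {}   # stand-in for the RawVotes table, keyed by UserId
aggregate = {}   # stand-in for the AggregateVotes shard counters

class AlreadyVoted(Exception):
    """Raised when the conditional put fails: this user already voted."""

def record_vote(user_id, candidate, segment):
    """Step 1: conditional put de-dupes the vote (no changing votes).
    Step 2: increment the candidate's shard counter, only after step 1 succeeds."""
    if user_id in raw_votes:              # models attribute_not_exists(UserId)
        raise AlreadyVoted(user_id)
    raw_votes[user_id] = candidate        # PutItem with the condition
    aggregate[segment] = aggregate.get(segment, 0) + 1  # UpdateItem ADD 1

record_vote("Alice", "A", "A_1")
try:
    record_vote("Alice", "A", "A_1")      # retry is rejected; counter unchanged
except AlreadyVoted:
    pass
print(aggregate["A_1"])  # 1
```

The ordering matters: because the increment runs only when the conditional put succeeds, a retried or duplicate vote can never inflate the aggregate.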
Correctness in aggregation? The same RawVotes (UserId, Candidate, Date) and AggregateVotes (Segment, Votes) tables must stay consistent: every de-duped raw vote should be counted exactly once in its segment's counter — which is where DynamoDB Streams comes in.
Dynamo. DB Streams
DynamoDB Streams • Stream of updates to a table • Asynchronous • Exactly once • Strictly ordered • Per item • Highly durable • Scale with table • 24-hour lifetime • Sub-second latency
View types. For UpdateItem (Name=John, Destination=Pluto): Keys only → Name=John. Old image—before update → Name=John, Destination=Mars. New image—after update → Name=John, Destination=Pluto. Old and new images → Name=John, Destination=Mars and Name=John, Destination=Pluto.
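The four view types determine which images of the item a stream record carries. A sketch of the record shapes, using the slide's Mars→Pluto update (field names follow the real StreamViewType values; the exact record envelope is simplified):

```python
def stream_record(key, old_image, new_image, view_type):
    """Build the payload a stream record carries for each view type."""
    views = {
        "KEYS_ONLY":          {"Keys": key},
        "OLD_IMAGE":          {"Keys": key, "OldImage": old_image},
        "NEW_IMAGE":          {"Keys": key, "NewImage": new_image},
        "NEW_AND_OLD_IMAGES": {"Keys": key, "OldImage": old_image,
                               "NewImage": new_image},
    }
    return views[view_type]

old = {"Name": "John", "Destination": "Mars"}
new = {"Name": "John", "Destination": "Pluto"}
rec = stream_record({"Name": "John"}, old, new, "NEW_AND_OLD_IMAGES")
print(rec["NewImage"]["Destination"])  # Pluto
```

The choice is a cost/utility trade-off: KEYS_ONLY keeps records small when consumers can re-read the table, while NEW_AND_OLD_IMAGES lets a consumer compute deltas (e.g., for replication or aggregation) without any extra reads.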
DynamoDB Streams and the Amazon Kinesis Client Library. A client application updates the table; the table's partitions (1–5) feed the table stream's shards (1–4); each shard is processed by a KCL worker in the Amazon Kinesis Client Library application.
Cross-region replication. US East (N. Virginia) table → DynamoDB Streams → open-source cross-region replication library → replicas in Asia Pacific (Sydney) and EU (Ireland).
DynamoDB Streams and AWS Lambda
Triggers
Real-time voting architecture (improved): Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, and your Amazon Kinesis-enabled app.
Real-time voting architecture. Handle any scale of election: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Real-time voting architecture. Vote only once, no changing votes: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Real-time voting architecture. Real-time, fault-tolerant, scalable aggregation: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Real-time voting architecture. Voter analytics, statistics: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Performing real-time aggregation and analytics with DynamoDB Streams • Collect and de-dupe data in DynamoDB • Aggregate data in-memory and flush periodically
Architecture
Reference Architecture
Elastic Event Driven Applications
Thank You WWW. AWSEDUCATE. COM