NoSQL and DynamoDB
Rick Houlihan, Principal Solutions Architect, Amazon Web Services
October 2015 WWW.AWSEDUCATE.COM
What to expect from the session • Brief history of data processing • Introduction to NoSQL • DynamoDB internals • Tables, API, data types, indexes • Scaling and data modeling • Design patterns and best practices • Event-driven applications and DynamoDB Streams • Reference architecture
What is a Database? “A structured set of data held in a computer, especially one that is accessible in various ways.” – Google “A database is an organized mechanism for storing, managing and retrieving information.” – About.com “A place to put stuff my app needs.” – Average Developer
Timeline of Database Technology (figure): Ledgers → Unit Records → File Systems → RDBMS → NoSQL, driven by steadily rising data pressure.
Data Volume Since 2010 • 90% of stored data generated in the last 2 years • 1 terabyte of data in 2010 equals 6.5 petabytes today • Linear correlation between data pressure and technical innovation • No reason these trends will not continue over time
Why NoSQL? SQL vs. NoSQL: • Optimized for storage vs. optimized for compute • Normalized/relational vs. denormalized/hierarchical • Ad hoc queries vs. instantiated views • Scale vertically vs. scale horizontally • Good for OLAP vs. built for OLTP at scale
The Iron Triangle of Data – All About CAP. Consistency: all clients always have the same view of data. Availability: all clients can always read and write. Partition Tolerance: the system works well despite physical network partitions. Pick two. Data models: relational, column oriented, document, key/value. CA: MSSQL, Oracle, DB2, Postgres, MySQL. AP: Aster Data, Greenplum, Vertica, Voldemort, Cassandra, Tokyo Cabinet, SimpleDB, KAI, CouchDB, Riak. CP: BigTable, Hypertable, HBase, MongoDB, Terrastore, Couchbase, Scalaris, DynamoDB, BerkeleyDB, Memcached, Redis.
Partition Management for AP Systems
SQL vs. NoSQL Access Pattern
Technology Adoption and the Hype Curve
Amazon DynamoDB • Fully managed NoSQL • Fast and consistent • Document or key-value • Access control • Scales to any workload • Event-driven programming
Tables, API, Data Types
Table and item API. Admin: CreateTable, UpdateTable, DeleteTable, DescribeTable. CRUD: PutItem/GetItem, BatchWriteItem/BatchGetItem, UpdateItem, DeleteItem, Query, Scan. DynamoDB Streams API: ListStreams, DescribeStream, GetShardIterator, GetRecords.
Data types: String (S), Number (N), Binary (B), String Set (SS), Number Set (NS), Binary Set (BS), Boolean (BOOL), Null (NULL), List (L), Map (M). List and Map are used for storing nested JSON documents.
Table → items → attributes. Hash key (mandatory): key-value access pattern; determines data distribution. Range key (optional): models 1:N relationships; enables rich query capabilities over all items for a hash key: ==, <, >, >=, <=, “begins with”, “between”, sorted results, counts, top/bottom N values, paged responses.
Hash table. Hash key uniquely identifies an item. Hash key is used for building an unordered hash index. Table can be partitioned for scale. Example: items Id=1 (Name=Jim), Id=2 (Name=Andy, Dept=Eng), Id=3 (Name=Kim, Dept=Ops) hash to Hash(1)=7B, Hash(2)=48, Hash(3)=CD across the key space 00–FF, split into partitions (e.g., 00–54 and 55–FF).
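The partitioning idea above can be sketched in a few lines of Python. This is a toy model only — DynamoDB's internal hash function and partition map are not public — using MD5 to place keys in a 00–FF key space:

```python
import hashlib

def hash_key(key):
    """Map an item's hash key to a point in a 00-FF key space (toy model)."""
    return hashlib.md5(str(key).encode()).hexdigest()[:2].upper()

def partition_for(key, boundaries):
    """Assign a key to the first partition whose upper bound covers its hash."""
    h = hash_key(key)
    for upper, name in boundaries:
        if h <= upper:
            return name
    return boundaries[-1][1]

# Two partitions splitting the key space 00-54 / 55-FF, as on the slide.
parts = [("54", "P1"), ("FF", "P2")]
print(hash_key(1), partition_for(1, parts))  # C4 P2
```

Because the index is unordered, neighboring Ids (1, 2, 3) scatter across the key space, which is exactly what spreads load over partitions.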
Hash-range table. Hash key and range key together uniquely identify an item. Within the unordered hash index, data is sorted by the range key. No limit on the number of items (∞) per hash key • Except if you have local secondary indexes. Example: orders for Customer#=1 (Hash=7B), Customer#=2 (Hash=48), and Customer#=3 (Hash=CD) land on different partitions, and within each customer's hash key the orders (Order#=10, 11: Toy, Boots, Pen, Shoes, Book, Paper) are sorted by Order#.
Partitions are three-way replicated. Each partition (e.g., one holding Id=1 Name=Jim; Id=2 Name=Andy, Dept=Eng; Id=3 Name=Kim, Dept=Ops) is stored on three replicas: Replica 1, Replica 2, Replica 3 — and the table spans Partition 1 … Partition N.
Indexes
Local secondary index (LSI). Alternate range key attribute. Index is local to a hash key (or partition). An LSI keeps the table's hash key (A1) with an alternate range key (e.g., A4 or A5) and always carries the table key (A2); projection options are KEYS_ONLY, INCLUDE (named attributes such as A3), and ALL. 10 GB max per hash key, i.e., LSIs limit the number of range keys!
Global secondary index (GSI). Alternate hash (+range) key. Index spans all table hash keys (partitions). Online indexing. A GSI re-keys the table (e.g., hash A5, range A4), carries the table key (A1, A2), and projects KEYS_ONLY, INCLUDE (named attributes such as A3), or ALL. RCUs/WCUs are provisioned separately for GSIs.
How do GSI updates work? The client sends an update request to the primary table (1) and receives the update response (2); the global secondary index is updated asynchronously in the background. If GSIs don't have enough write capacity, table writes will be throttled!
LSI or GSI? An LSI can be modeled as a GSI. If the data size in an item collection > 10 GB, use a GSI. If eventual consistency is okay for your scenario, use a GSI!
Scaling
Scaling. Throughput • Provision any amount of throughput to a table. Size • Add any number of items to a table • Max item size is 400 KB • LSIs limit the number of range keys due to the 10 GB limit. Scaling is achieved through partitioning.
Throughput is provisioned at the table level • Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second • RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads • Read and write throughput limits are independent
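The capacity-unit arithmetic above is easy to get wrong because sizes round up per request. A small sketch of the rules as stated on the slide (1 KB per WCU, 4 KB per RCU, eventually consistent reads at half price):

```python
import math

def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """RCUs per read: one 4 KB unit, rounded up; halved for eventual consistency."""
    units = math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_bytes):
    """WCUs per write: one 1 KB unit, rounded up."""
    return math.ceil(item_size_bytes / 1024)

print(read_capacity_units(6000))         # 2 (6 KB rounds up to two 4 KB units)
print(read_capacity_units(6000, False))  # 1.0 (eventually consistent)
print(write_capacity_units(3500))        # 4 (3.5 KB rounds up to four 1 KB units)
```

Note the asymmetry: a 4 KB item costs 1 RCU to read but 4 WCUs to write, which is why write-heavy workloads dominate provisioning cost.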
What causes throttling? If sustained throughput goes beyond provisioned throughput per partition Non-uniform workloads • Hot keys/hot partitions • Very large bursts Mixing hot data with cold data • Use a table per time period Overly partitioned tables • If sustained throughput > partition limit, Dynamo. DB may throttle requests • Solution: Increase provisioned throughput or restructure data
What bad NoSQL looks like… (figure: heat map of partition activity over time, showing a few hot partitions)
Getting the most out of DynamoDB throughput. “To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide. Space: access is evenly spread over the key space. Time: requests arrive evenly spaced in time.
Much better picture…
A global leader in retargeting. More than 10,000 active advertisers in >100 countries • Provisioned for over 1M transactions per second • 4 regions in use with live traffic replication • 120B+ key fetches worldwide per day (RTB) • 1.5 TB of data stored per region • 30B+ items stored in each region • <3 ms uniform query latency, <10 ms at 99.95%
• Simple video monitoring & security • Fast growth – “suddenly petabytes” • More inbound video than YouTube. Timeline 2009–2013: cameras (2009) → move to AWS → switch to DynamoDB.
DynamoDB reduces delivery time for video events from 5–10 seconds to 50 milliseconds
Online Gaming. “DynamoDB came along at just the right time, and Halfbrick switched to storing our game data in DynamoDB, which alleviated our scaling problems while also freeing us from the burden of managing all the underlying hardware and software. We love that DynamoDB handles so much of the management for us, freeing us to focus on development.”
Data modeling
1:1 relationships or key-values. Use a table or GSI with a hash key. Use the GetItem or BatchGetItem API. Example: given an SSN or license number, get attributes. Users table (hash key SSN): SSN=123-45-6789 → Email=johndoe@nowhere.com, License=TDL25478134; SSN=987-65-4321 → Email=maryfowler@somewhere.com, License=TDL78309234. Users-Email-GSI (hash key License): License=TDL78309234 → Email=maryfowler@somewhere.com, SSN=987-65-4321; License=TDL25478134 → Email=johndoe@nowhere.com, SSN=123-45-6789.
1:N relationships or parent-children. Use a table or GSI with hash and range key. Use the Query API. Example: • Given a device, find all readings between epoch X and Y. Device-measurements table: hash key DeviceId=1, range key epoch=5513A97C → Temperature=30, Pressure=90; DeviceId=1, epoch=5513A9DB → Temperature=30, Pressure=90.
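The range-key query above (all readings for a device between two epochs) can be modeled locally: within one hash key, items are kept sorted by the range key, so a BETWEEN query is just a binary-searched slice. A sketch, using the slide's hypothetical DeviceId=1 readings:

```python
from bisect import bisect_left, bisect_right

# Item collection for DeviceId=1, kept sorted by the range key (epoch),
# the way DynamoDB stores items within one hash key.
readings = [
    (0x5513A97C, {"temperature": 30, "pressure": 90}),
    (0x5513A9DB, {"temperature": 30, "pressure": 90}),
    (0x5513AA40, {"temperature": 31, "pressure": 91}),  # extra reading for illustration
]

def query_between(items, lo, hi):
    """Range-key BETWEEN query: all readings with lo <= epoch <= hi."""
    keys = [k for k, _ in items]
    return items[bisect_left(keys, lo):bisect_right(keys, hi)]

print(len(query_between(readings, 0x5513A97C, 0x5513A9DB)))  # 2
```

This is why Query is cheap while Scan is not: the sort order lets DynamoDB read only the matching contiguous slice of the item collection.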
N:M relationships. Use a table and a GSI with the hash and range key elements switched. Use the Query API. Example: given a user, find all games; or given a game, find all users. User-Games-Table (hash UserId, range GameId): bob→Game1, fred→Game2, bob→Game3. Game-Users-GSI (hash GameId, range UserId): Game1→bob, Game2→fred, Game3→bob.
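The key-swap trick above amounts to maintaining two lookups over the same pairs, one per query direction. A minimal sketch (plain dictionaries standing in for the table and its GSI):

```python
from collections import defaultdict

# The underlying (UserId, GameId) pairs from the slide.
user_games = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]

# Base table keyed by UserId; GSI with hash/range swapped, keyed by GameId.
by_user, by_game = defaultdict(list), defaultdict(list)
for user, game in user_games:
    by_user[user].append(game)   # Query(UserId=...) on User-Games-Table
    by_game[game].append(user)   # Query(GameId=...) on Game-Users-GSI

print(by_user["bob"])    # ['Game1', 'Game3']
print(by_game["Game1"])  # ['bob']
```

DynamoDB maintains the second structure for you: every write to the table is propagated to the GSI, so both directions stay queryable without a join.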
Documents (JSON). New data types (M, L, BOOL, NULL) were introduced to support JSON. Type mapping from JavaScript to DynamoDB: string → S, number → N, boolean → BOOL, null → NULL, array → L, object → M. Document SDKs • Simple programming model • Conversion to/from JSON • Java, JavaScript, Ruby, .NET. Cannot index (S, N) elements of a JSON object stored in M • Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda.
Rich expressions. Projection expression • Query/Get/Scan: ProductReviews.FiveStar[0]. Filter expression • Query/Scan: #VIEWS > :num. Conditional expression • PutItem/UpdateItem/DeleteItem: attribute_not_exists(#pr.FiveStar). Update expression • UpdateItem: SET Replies = Replies + :num
Scenarios and best practices
Event logging Storing time series data
Time series tables. Current (hot) table: Events_table_2015_April (Event_id hash key, Timestamp range key, Attribute1 … AttributeN) at RCUs=10000, WCUs=10000. Older (cold) tables with the same schema: Events_table_2015_March at RCUs=1000, WCUs=100; Events_table_2015_February at RCUs=100, WCUs=1; Events_table_2015_January at RCUs=10, WCUs=1. Don't mix hot and cold data; archive cold data to Amazon S3.
Dealing with time series data. Use a table per time period • Pre-create daily, weekly, monthly tables • Provision required throughput for the current table • Writes go to the current table • Turn off (or reduce) throughput for older tables
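The table-per-period pattern above is mostly naming and throughput bookkeeping. A sketch of both, assuming monthly tables named like the slide's Events_table_2015_April (the `provision_plan` helper and its decay schedule are illustrative, not an AWS API):

```python
from datetime import datetime, timezone

def table_for(ts):
    """Route a write to the table for its month, e.g. Events_table_2015_April."""
    return f"Events_table_{ts.year}_{ts.strftime('%B')}"

def provision_plan(current, older):
    """Hot current table gets full throughput; older tables are dialed down
    roughly 10x per month of age, matching the slide's 1000/100/10 RCU steps."""
    plan = {current: {"RCU": 10000, "WCU": 10000}}
    for age, name in enumerate(older):
        plan[name] = {"RCU": max(10, 1000 // (10 ** age)), "WCU": 1}
    return plan

now = datetime(2015, 4, 15, tzinfo=timezone.utc)
print(table_for(now))  # Events_table_2015_April
```

In practice a scheduled job pre-creates next month's table and applies the reduced throughput (or archives to S3) for tables that age out.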
Product catalog Popular items (read)
Scaling bottlenecks. SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT" — shoppers drive 70,000 requests/sec at one item. The ProductCatalog table is spread over ~50 partitions at 2,000 RCUs each, so the partition holding popular Product A becomes the bottleneck while the partitions holding Product B and others sit idle.
(Chart: requests per second by item primary key — the DynamoDB request distribution per hash key is dominated by a few hot items.)
Cache popular items. The same query (SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT") is served from a cache between the user and DynamoDB; only misses reach the ProductCatalog table partitions.
(Chart: requests per second by item primary key — with a cache in front, cache hits absorb the traffic to popular items and DynamoDB requests flatten out.)
Messaging app • Large items • Filters vs. indexes • M:N modeling—inbox and outbox
Messages app. David's inbox: SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC. David's outbox: SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC. Both run against the Messages table.
Large and small attributes mixed. Messages table (hash key Recipient, range key Date): David, 2014-10-02, Sender=Bob, Message=… (48 more messages for David) … David, 2014-10-03, Sender=Alice … Alice, 2014-09-28, Sender=Bob … Alice, 2014-10-01, Sender=Carol … (many more messages). The inbox query SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC reads 50 items × 256 KB each: large message bodies and attachments.
Computing inbox query cost: items evaluated by the query (50) × average item size (256 KB) × conversion ratio (1 RCU / 4 KB) × eventually consistent reads (1/2) = 1,600 RCUs.
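That arithmetic generalizes: a Query's cost is the total bytes it evaluates divided into 4 KB read units, halved for eventually consistent reads. A small helper reproducing the slide's 1,600 RCU figure:

```python
def query_cost_rcu(items, avg_item_bytes, eventually_consistent=True):
    """RCUs consumed by a Query: total data evaluated / 4 KB,
    halved for eventually consistent reads."""
    units = (items * avg_item_bytes) / 4096
    return units / 2 if eventually_consistent else units

# 50 messages at 256 KB each, eventually consistent: the slide's 1,600 RCUs.
print(query_cost_rcu(50, 256 * 1024))  # 1600.0
```

The same function shows the payoff of the next slide's refactor: 50 small index entries of 128 bytes evaluate to well under one read unit, with the 256 KB bodies fetched individually only when needed.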
Separate the bulk data. 1. Query Inbox-GSI (50 sequential items at 128 bytes): 1 RCU. 2. BatchGetItem on the Messages table (50 separate items at 256 KB): 1,600 RCUs. This uniformly distributes large-item reads. Inbox-GSI holds Recipient, Date, Sender, Subject, MsgId; the Messages table maps MsgId → Body.
Inbox GSI Define which attributes to copy into the index
Outbox GSI (hash key Sender): SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC
Messaging app: David's inbox and outbox queries are served by the Inbox and Outbox global secondary indexes over the Messages table.
Querying many large items at once: • Reduce one-to-many item sizes • Configure secondary index projections • Use GSIs (Inbox, Outbox) to model the M:N relationship between sender and recipient • Distribute large items
Multiplayer online gaming Query filters vs. composite key indexes
Common game back-end concepts Think in terms of APIs HTTP + JSON Get friends, leaderboard Binary asset data Multiplayer servers High availability Scalability
Core (HA) game back end • Choose a region • >=2 Availability Zones • Amazon EC2 for the app • Elastic Load Balancing • Amazon RDS database • Multi-AZ
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics • ... with Amazon CloudFront! (CloudFront CDN)
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics • ... with CloudFront! • Auto Scaling group • Capacity on demand • Respond to users • Automatic healing
Scale it way out • Amazon S3 for game data • Assets • UGC • Analytics • ... with CloudFront! • Auto Scaling group • Capacity on demand • Respond to users • Automatic healing • Amazon ElastiCache • Memcached • Redis
Writing is painful • Games are write heavy • Caching is of limited use • Key value • Binary structures • Database = bottleneck
Sharding (not fun)
Amazon DynamoDB • Fully managed NoSQL data store • Provisioned throughput • Secondary indexes • PUT/GET keys • Document support!
Example: Leaderboard in DynamoDB. Columns: UserID (hash key), BoardName (range key), TopScore, TopScoreDate. Rows: "101", "Galaxy Invaders", 5842, "2014-09-15T17:24:31" / "101", "Meteor Blasters", 1000, "2014-10-22T23:18:01" / "101", "Starship X", 24, "2014-08-31T13:14:21" / "102", "Alien Adventure", 192, "2014-07-12T11:07:56" / "102", "Galaxy Invaders", 0, "2014-09-18T07:33:42" / "103", "Attack Ships", 3, "2014-10-19T01:13:24" / "103", "Galaxy Invaders", 2317, "2014-09-11T06:53:00" / "103", "Meteor Blasters", 723, "2014-10-19T01:14:24" / "103", "Starship X", 42, "2014-07-11T06:53:03". • Hash key = primary key • Range key = sort key • Other attributes are undefined • So… how to sort based on top score?
Leaderboard with secondary indexes • Create a secondary index! • Set the hash key to BoardName • Set the range key to TopScore • Project extra attributes as needed • Can now query by BoardName, sorted by TopScore • Handles many common gaming use cases. Index rows (BoardName hash, TopScore range, UserID projected): "Alien Adventure" 192 "102" / "Attack Ships" 3 "103" / "Galaxy Invaders" 0 "102", 2317 "103", 5842 "101" / "Meteor Blasters" 723 "103", 1000 "101" / "Starship X" 24 "101", 42 "103".
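The leaderboard query the index enables — "top scores for a board, highest first" — can be sketched locally. This toy model filters by the index's hash key and sorts by its range key; DynamoDB does the same with `ScanIndexForward=False` on a real GSI query:

```python
# Items from the slide's leaderboard table (one board shown).
scores = [
    {"UserID": "101", "BoardName": "Galaxy Invaders", "TopScore": 5842},
    {"UserID": "102", "BoardName": "Galaxy Invaders", "TopScore": 0},
    {"UserID": "103", "BoardName": "Galaxy Invaders", "TopScore": 2317},
    {"UserID": "101", "BoardName": "Meteor Blasters", "TopScore": 1000},
]

def top_scores(items, board, limit=10):
    """GSI-style query: hash key BoardName, range key TopScore, descending."""
    rows = [i for i in items if i["BoardName"] == board]
    return sorted(rows, key=lambda i: i["TopScore"], reverse=True)[:limit]

print([i["UserID"] for i in top_scores(scores, "Galaxy Invaders")])
# ['101', '103', '102']
```

The point of the GSI is that this filter-and-sort never happens at query time: the index already stores items grouped by BoardName and ordered by TopScore.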
Sparse indexes. Scan sparse hash GSIs. Game-scores table (Id hash): 1 Bob G1 1300 2012-12-23 / 2 Bob G1 1450 2012-12-23 / 3 Jay G1 1600 2012-12-24 / 4 Mary G1 2000 2012-10-24 / 5 Ryan G2 123 2012-03-10 / 6 Jones G2 345 2012-03-20. Award-GSI (Award hash): only items carrying an Award attribute appear in the index, e.g., Award=Champ, Id=4, User=Mary, Score=2000.
Real-Time voting Write-heavy items
Requirements for voting Allow each person to vote only once No changing votes Real-time aggregation Voter analytics, demographics
Real-time voting architecture: Voters → Voting App → RawVotes table → AggregateVotes table.
Scaling bottlenecks. Voters generate hundreds of thousands of votes per second for Candidate A and Candidate B. Provisioning 200,000 WCUs on the Votes table spreads capacity over many partitions at 1,000 WCUs each — but all writes for one candidate still land on a single hash key, and therefore a single partition.
Write sharding. The Votes table holds shard keys Candidate A_1 … A_8 and Candidate B_1 … B_7, spreading each candidate's counter across many hash keys.
Write sharding. Each voter's write goes to a random shard: UpdateItem "CandidateA_" + rand(0, 10) ADD 1 to Votes.
Shard aggregation. A periodic process 1. sums the shard counters across the Votes table and 2. stores the total, e.g., Candidate A total: 2.5M.
Shard write-heavy hash keys when your write workload is not horizontally scalable. Trade off read cost for write scalability. Consider throughput per hash key and per partition.
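The sharded write plus periodic aggregation from the last three slides can be sketched end to end. A dictionary stands in for the Votes table; the shard count (10) matches the slide's rand(0, 10):

```python
import random

SHARDS = 10
votes = {}  # stand-in for the Votes table: one counter item per shard key

def cast_vote(candidate, rng=random):
    """UpdateItem "<candidate>_" + rand(0, SHARDS) ADD 1 to Votes.
    Spreads one hot hash key across SHARDS keys so writes scale out."""
    key = f"{candidate}_{rng.randrange(SHARDS)}"
    votes[key] = votes.get(key, 0) + 1

def total(candidate):
    """Periodic shard aggregation: sum the candidate's shard counters."""
    return sum(v for k, v in votes.items() if k.startswith(candidate + "_"))

for _ in range(1000):
    cast_vote("CandidateA")
print(total("CandidateA"))  # 1000
```

This is the read-cost trade-off on the slide: a single counter read becomes SHARDS reads (or a precomputed total written by the aggregation job), in exchange for writes that spread over SHARDS partitions.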
Correctness in voting: 1. record the vote in the RawVotes table and de-dupe; retry. 2. Increment the candidate counter in the AggregateVotes table. RawVotes (UserId, Candidate, Date): Alice A 2013-10-02 / Bob B 2013-10-02 / Eve B 2013-10-02 / Chuck A 2013-10-02. AggregateVotes (Segment, Votes): A_1 23 / B_2 12 / B_1 14 / A_2 25.
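The two-step flow above — de-dupe first, then increment — hinges on a conditional write. A local sketch (dictionaries stand in for the two tables; the `if user_id in raw_votes` check models DynamoDB's attribute_not_exists condition on PutItem):

```python
raw_votes = {}   # stand-in for the RawVotes table, keyed by UserId
aggregate = {}   # stand-in for the AggregateVotes shard counters

class AlreadyVoted(Exception):
    """Raised when the conditional put fails: this user already voted."""

def record_vote(user_id, candidate, segment):
    """Step 1: conditional put de-dupes the vote (no changing votes).
    Step 2: increment the candidate's shard counter, only after step 1 succeeds."""
    if user_id in raw_votes:              # models attribute_not_exists(UserId)
        raise AlreadyVoted(user_id)
    raw_votes[user_id] = candidate        # PutItem with the condition
    aggregate[segment] = aggregate.get(segment, 0) + 1  # UpdateItem ADD 1

record_vote("Alice", "A", "A_1")
try:
    record_vote("Alice", "A", "A_1")      # retry is rejected; counter unchanged
except AlreadyVoted:
    pass
print(aggregate["A_1"])  # 1
```

The ordering matters: because the increment runs only when the conditional put succeeds, a retried or duplicate vote can never inflate the aggregate.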
Correctness in aggregation? The same RawVotes (UserId, Candidate, Date) and AggregateVotes (Segment, Votes) tables must stay consistent: every de-duped raw vote should be counted exactly once in its segment's counter — which is where DynamoDB Streams comes in.
Dynamo. DB Streams
DynamoDB Streams • Stream of updates to a table • Asynchronous • Exactly once • Strictly ordered • Per item • Highly durable • Scale with table • 24-hour lifetime • Sub-second latency
View types. For UpdateItem (Name=John, Destination=Pluto): Keys only → Name=John. Old image—before update → Name=John, Destination=Mars. New image—after update → Name=John, Destination=Pluto. Old and new images → Name=John, Destination=Mars and Name=John, Destination=Pluto.
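The four view types determine which images of the item a stream record carries. A sketch of the record shapes, using the slide's Mars→Pluto update (field names follow the real StreamViewType values; the exact record envelope is simplified):

```python
def stream_record(key, old_image, new_image, view_type):
    """Build the payload a stream record carries for each view type."""
    views = {
        "KEYS_ONLY":          {"Keys": key},
        "OLD_IMAGE":          {"Keys": key, "OldImage": old_image},
        "NEW_IMAGE":          {"Keys": key, "NewImage": new_image},
        "NEW_AND_OLD_IMAGES": {"Keys": key, "OldImage": old_image,
                               "NewImage": new_image},
    }
    return views[view_type]

old = {"Name": "John", "Destination": "Mars"}
new = {"Name": "John", "Destination": "Pluto"}
rec = stream_record({"Name": "John"}, old, new, "NEW_AND_OLD_IMAGES")
print(rec["NewImage"]["Destination"])  # Pluto
```

The choice is a cost/utility trade-off: KEYS_ONLY keeps records small when consumers can re-read the table, while NEW_AND_OLD_IMAGES lets a consumer compute deltas (e.g., for replication or aggregation) without any extra reads.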
DynamoDB Streams and the Amazon Kinesis Client Library. A client application updates the table; the table's partitions (1–5) feed the table stream's shards (1–4); each shard is processed by a KCL worker in the Amazon Kinesis Client Library application.
Cross-region replication. US East (N. Virginia) table → DynamoDB Streams → open-source cross-region replication library → replicas in Asia Pacific (Sydney) and EU (Ireland).
DynamoDB Streams and AWS Lambda
Triggers
Real-time voting architecture (improved): Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, and your Amazon Kinesis-enabled app.
Real-time voting architecture. Handle any scale of election: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Real-time voting architecture. Vote only once, no changing votes: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Real-time voting architecture. Real-time, fault-tolerant, scalable aggregation: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Real-time voting architecture. Voter analytics, statistics: Voters → Voting App → RawVotes table → RawVotes DynamoDB stream → AggregateVotes table, Amazon Redshift, Amazon EMR, your Amazon Kinesis-enabled app.
Performing real-time aggregation and analytics with DynamoDB Streams • Collect and de-dupe data in DynamoDB • Aggregate data in-memory and flush periodically
Architecture
Reference Architecture
Elastic Event Driven Applications
Thank You WWW. AWSEDUCATE. COM