Wheres my lookup table Modeling relational data in

Where’s my lookup table? Modeling relational data in a denormalized world. Rick Houlihan Principal Technologist, No. SQL - AWS

What are we talking about? • A Brief History of Data Processing – Why No. SQL? • The Great Migration – Amazon’s path to database freedom • Making it Work - Normalized versus denormalized data modeling • Document, Wide Column, Key-Value – Why it’s all the same • Why Cloud Native No. SQL is the best choice

History of Data Processing “History repeats itself because nobody was listening the first time”

Timeline of Database Technology

Technology Adoption and the Hype Curve

Amazon’s Great Migration

20 years of code • • 3, 000 Oracle Instances Over 12, 000 services ~25, 000 developers 54. 5 MM TPS Peak (Prime Day 2019)

Why make the move? • RDBMS was breaking at scale • Required rollups and denormalized data to avoid complex queries • Impact of bad deployments • One bad stored procedure would break a dozen services • No. SQL databases are “flexible” • Schemaless design allows the app to evolve over time • Cost of RDBMS was too high • CPU is expensive, storage is cheap

Sizing the Workloads Problems with limited scope are easier to solve “I need a system to track employee vacation time. ” Unbounded problems are harder to solve “I need a root cause analysis engine to correlate transaction level events to buying patterns across global markets. ”

Sizing the Database

Scaling Relational DBs

Sharded Relational DBs? ? A B C D

No. SQL Databases • Denormalize and shard to provide horizontal scale • Near unbounded throughput and storage Collection 1 1 TB Shard A Shard B 500 GB

Partition Keys in No. SQL • • • Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale Id = 1 Name = Jim Hash (1) = 7 B 00 Id = 3 Name = Kim Dept = Ops Id = 2 Name = Andy Dept = Eng Hash (2) = 48 54 55 Key Space Hash (3) = CD A 9 AA FF

Types of Database Workloads Operations • Online Transaction Processing (OLTP) – • Online Analytics Processing (OLAP) – Analytics • Most common type of app BI and ad-hoc data projections Decision Support Systems (DSS) – Long running query aggregations and projections

Why No. SQL? SQL No. SQL Optimized for storage Optimized for compute Normalized/relational Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP or DSS at scale

No. SQL Data Modeling “A ship in port is safe, but that’s not what ships were built for. ”

It’s all about relationships… Social network Document management IT monitoring Process control Data trees

The No. SQL Model Table/Collection Items Attributes Partition Sort Key Mandatory Key-value access pattern Determines data distribution Optional Model 1: N and N: N relationships Enables rich query capabilities All items for key ==, <, >, >=, <= “begins with” “between” sorted results counts top/bottom N values

SQL vs. No. SQL design pattern

Ad hoc “Joins” in SQL

Modeled “Joins” in No. SQL

Document vs. Wide Column Data Modeling

Document vs. Wide Column PK (_id) jsmith@abc. com first. Name John last. Name Smith address 123 A Street city Seattle state WA GSIPK SEA 58 GSISK 07. 650. O 1

Indexing Efficiently in No. SQL Document Wide Column Default index on _id Partition Key defines default index Query planner selects the index User specifies the index Include Shard Key or suffer Partition Key value always required Optimize with Compound Indexes Use Projections to “pre-load” index

No. SQL Data Modeling - KEY CONCEPTS • Selecting a Partition/Shard Key • Large number of distinct values • Items are uniformly requested and randomly distributed • Examples: • Bad: Status, Gender • Good: Customer. Id, Device. Id © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Creating Compound Indexes • Model 1: n and n: n relationships • Efficient/selective patterns • Query multiple entities • Leverage range queries • Examples: • Orders. Order. Item • Hierarchical relationships

TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Nature of the application • OLTP / OLAP / DSS • Define the Entity-Relationship Model • Identify Data Life Cycle • TTL, Backup/Archival, etc.

TENETS OF No. SQL DATA MODELING • Understand the use case • Define the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Identify data sources • Define queries and write patterns • Document all workflows

TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • 1 application service = 1 table • Reduce round trips • Simplify access patterns • Identify Primary Keys • How will items be inserted and read? • Overload items into partitions • Define indexes for secondary access patterns

TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat-> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Complex Queries “Computers are useless. They can only give you answers. ”

Dynamo. DB Streams and AWS Lambda

Triggers Item/table level metrics Amazon Cloud. Search Kinesis Firehose Lambda function Notify change

Advanced Data Modeling “It turns out it's important to build a product and not just a bunch of data models. ” - Hilary Mason

Managing Relational Transactions

Dynamo. DB Transactions API • Transact. Write. Items • • • Synchronous update, put, delete, and check • Atomic • Automated Rollbacks Up to 10 items within a transaction Supports multiple tables Complex conditional checks Good Use Cases • • Commit changes across items Conditional batch inserts/updates • Bad Use Case • Maintaining normalized data

Dynamo. DB Table Schema

Reverse Lookup GSI

Complex Relational Data “Dude, where’s my lookup table? ” - Anonymous Amazon SDE

Modeling Complex Relationships

The Table Schema

The Index Schema (GSI 1)

The Index Schema (GSI 2)

The Index Schema (GSI 3)

The Final Result

Optimize for Common Patterns “If you optimize everything, you will always be unhappy. ” - Donald Knuth

Access Patterns Matter

Optimized for Writes

‘Born in the Cloud’ No. SQL “Cloud computing is the third wave of the digital revolution. ” - Lowell Mc. Adam

Amazon Dynamo. DB Fully Managed No. SQL Wide Column/Document Scales to Any Workload Fast and Consistent Access Control Event Driven Programming

Why Cloud Native is the only choice • “Infinite” scalability • Add capacity in minutes, not weeks/months • Pay for what you actually use • True auto-scaling, JIT provisioning • Incredible performance • Massively distributed backplane • Fully managed • Invest in what matters to your business

Auto Scaling • Throughput automatically adapts to your actual traffic Without Auto Scaling With Auto Scaling

Conclusions

Thank You!