Wheres my lookup table Modeling relational data in
Where’s my lookup table? Modeling relational data in a denormalized world. Rick Houlihan Principal Technologist, No. SQL - AWS
What are we talking about? • A Brief History of Data Processing – Why No. SQL? • The Great Migration – Amazon’s path to database freedom • Making it Work - Normalized versus denormalized data modeling • Document, Wide Column, Key-Value – Why it’s all the same • Why Cloud Native No. SQL is the best choice
History of Data Processing “History repeats itself because nobody was listening the first time”
Timeline of Database Technology
Technology Adoption and the Hype Curve
Amazon’s Great Migration
20 years of code • • 3, 000 Oracle Instances Over 12, 000 services ~25, 000 developers 54. 5 MM TPS Peak (Prime Day 2019)
Why make the move? • RDBMS was breaking at scale • Required rollups and denormalized data to avoid complex queries • Impact of bad deployments • One bad stored procedure would break a dozen services • No. SQL databases are “flexible” • Schemaless design allows the app to evolve over time • Cost of RDBMS was too high • CPU is expensive, storage is cheap
Sizing the Workloads Problems with limited scope are easier to solve “I need a system to track employee vacation time. ” Unbounded problems are harder to solve “I need a root cause analysis engine to correlate transaction level events to buying patterns across global markets. ”
Sizing the Database
Scaling Relational DBs
Sharded Relational DBs? ? A B C D
No. SQL Databases • Denormalize and shard to provide horizontal scale • Near unbounded throughput and storage Collection 1 1 TB Shard A Shard B 500 GB
Partition Keys in No. SQL • • • Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale Id = 1 Name = Jim Hash (1) = 7 B 00 Id = 3 Name = Kim Dept = Ops Id = 2 Name = Andy Dept = Eng Hash (2) = 48 54 55 Key Space Hash (3) = CD A 9 AA FF
Types of Database Workloads Operations • Online Transaction Processing (OLTP) – • Online Analytics Processing (OLAP) – Analytics • Most common type of app BI and ad-hoc data projections Decision Support Systems (DSS) – Long running query aggregations and projections
Why No. SQL? SQL No. SQL Optimized for storage Optimized for compute Normalized/relational Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP or DSS at scale
No. SQL Data Modeling “A ship in port is safe, but that’s not what ships were built for. ”
It’s all about relationships… Social network Document management IT monitoring Process control Data trees
The No. SQL Model Table/Collection Items Attributes Partition Sort Key Mandatory Key-value access pattern Determines data distribution Optional Model 1: N and N: N relationships Enables rich query capabilities All items for key ==, <, >, >=, <= “begins with” “between” sorted results counts top/bottom N values
SQL vs. No. SQL design pattern
Ad hoc “Joins” in SQL
Modeled “Joins” in No. SQL
Modeled “Joins” in No. SQL
Document vs. Wide Column Data Modeling
Document vs. Wide Column PK (_id) jsmith@abc. com first. Name John last. Name Smith address 123 A Street city Seattle state WA GSIPK SEA 58 GSISK 07. 650. O 1
Indexing Efficiently in No. SQL Document Wide Column Default index on _id Partition Key defines default index Query planner selects the index User specifies the index Include Shard Key or suffer Partition Key value always required Optimize with Compound Indexes Use Projections to “pre-load” index
No. SQL Data Modeling - KEY CONCEPTS • Selecting a Partition/Shard Key • Large number of distinct values • Items are uniformly requested and randomly distributed • Examples: • Bad: Status, Gender • Good: Customer. Id, Device. Id © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Creating Compound Indexes • Model 1: n and n: n relationships • Efficient/selective patterns • Query multiple entities • Leverage range queries • Examples: • Orders. Order. Item • Hierarchical relationships
TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Nature of the application • OLTP / OLAP / DSS • Define the Entity-Relationship Model • Identify Data Life Cycle • TTL, Backup/Archival, etc.
TENETS OF No. SQL DATA MODELING • Understand the use case • Define the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Identify data sources • Define queries and write patterns • Document all workflows
TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • 1 application service = 1 table • Reduce round trips • Simplify access patterns • Identify Primary Keys • How will items be inserted and read? • Overload items into partitions • Define indexes for secondary access patterns
TENETS OF No. SQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat-> Review © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Complex Queries “Computers are useless. They can only give you answers. ”
Dynamo. DB Streams and AWS Lambda
Triggers Item/table level metrics Amazon Cloud. Search Kinesis Firehose Lambda function Notify change
Advanced Data Modeling “It turns out it's important to build a product and not just a bunch of data models. ” - Hilary Mason
Managing Relational Transactions
Dynamo. DB Transactions API • Transact. Write. Items • • • Synchronous update, put, delete, and check • Atomic • Automated Rollbacks Up to 10 items within a transaction Supports multiple tables Complex conditional checks Good Use Cases • • Commit changes across items Conditional batch inserts/updates • Bad Use Case • Maintaining normalized data
Dynamo. DB Table Schema
Reverse Lookup GSI
Complex Relational Data “Dude, where’s my lookup table? ” - Anonymous Amazon SDE
Modeling Complex Relationships
The Table Schema
The Index Schema (GSI 1)
The Index Schema (GSI 2)
The Index Schema (GSI 3)
The Final Result
Optimize for Common Patterns “If you optimize everything, you will always be unhappy. ” - Donald Knuth
Access Patterns Matter
Optimized for Writes
‘Born in the Cloud’ No. SQL “Cloud computing is the third wave of the digital revolution. ” - Lowell Mc. Adam
Amazon Dynamo. DB Fully Managed No. SQL Wide Column/Document Scales to Any Workload Fast and Consistent Access Control Event Driven Programming
Why Cloud Native is the only choice • “Infinite” scalability • Add capacity in minutes, not weeks/months • Pay for what you actually use • True auto-scaling, JIT provisioning • Incredible performance • Massively distributed backplane • Fully managed • Invest in what matters to your business
Auto Scaling • Throughput automatically adapts to your actual traffic Without Auto Scaling With Auto Scaling
Conclusions
Thank You!
- Slides: 56