How to choose data model SQL Server or
















































- Slides: 48


How to choose data model: SQL Server or Azure Cosmos DB. Which, When and Where ?

Speaker Info Aleksandar Talev • Director of RD • Microsoft MVP Data Platform 2005 - 2018 • MCSE 2017, MCT, MCSA, MCPD, MCDBA • Semos DOO • Skopje, Macedonia Vlatko Bojkovski • Senior Engineer • MCSA, MCTS, MCSD • Semos DOO • Skopje, Macedonia

Agenda § §


What is NOSQL? § No. SQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. § NOSQL actually means “Not Only” SQL § The data structures used by No. SQL databases (e. g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in No. SQL.

Advantages of No. SQL Databases • Simpler to scale, it takes the same time to insert data in an • • • empty table, or a table with billions of entries. Suitable for distributed systems. Can hold unstructured and semi-structured data. No need to map and maintain complex relationships. Fast to insert, because there’s no need to perform locks or check constraints. Fast to read, if the key is known.

Disadvantages of No. SQL Databases • Consistency needs to be considered as transactional updates across multiple entities are not guaranteed. • Relationships between data need to be maintained externally to the data. • Possible difficulties to filter and sort on non-key data; this might result in full table scans. • There is no familiar SQL

No. SQL Databases Data Structures { } Customer. ID customer: 103248, orderid: 23 dklnm, product: Dynamo, price: 3. 40, currency: USD, discount: 0. 05 Identity Column Family 1 Title: Miss, First. Name: Sam, Last. Name: Coal 2 Title: Mr, First. Name: Simon, Middle. Name: Paul, Last. Name: Tindell 3 Title: Dr, Given. Name: Shreep, Birth. Name: Sandeshreep

Comparing documents and collections with relations

Overview of Cosmos DB

Cosmos DB - Global distribution § Microsoft Azure is available in 54 regions and next will be available § Azure Cosmos DB is available in all of them § It is classified as Ring 0 Azure Service – will be available in any new region by default § Global distribution is comparable concept to what replication is for RDBMS

Read and Write regions

Consistency model § Consistency is couple of rules that defines how distributed data is available to users § Factors that are involved are throughput and latency § Azure Cosmos DB offers comprehensive 99. 99% SLAs which guarantee throughput, consistency, availability, and latency for Cosmos DB database accounts scoped to a single Azure region

Consistency models types § Strong consistency – guarantees that any item read will return the most § § recent version , doesn’t work with several regions Bounded staleness - reads may lag behind most K write operations or t time interval – Administrator define lag - threshold criteria Session – default model – guarantee reading most recent data inside any session Consistent prefix – similar to eventual consistency – good for retweets, likes and non-thread comments (A, B, C) - > (A), (A, B, C) Eventual consistency – guarantees that all of the replicas will eventually converge to reflect the most recent data

Throughput and latency § Cosmos DB Introduced normalized quantity called request unit § Number of request unit per any operation is always deterministic § 1 RU is GET document of 1024 KB § Number of request unit per second is called Throughput § Estimating throughput with RU calculator : • https: //www. documentdb. com/capacityplanner

Monitoring performance • Azure portal, Metrics blade: • Overview • Throughput • Storage • Latency • Availability • Consistency • System • Azure SDK • Less detailed metrics from the response returned by Read. Document. Collection. Async • Use the Azure Monitor SDK for detailed metrics

Security • Encryption at Rest • Primary databases are stored on SSD • Media attachments, replicas and backups are stored in Azure Blob which uses HDD • Firewall support • Policy driven IP – based access control – for inbound connections • Authentication • Master keys – primary, secondary • Resource tokens- more granular access – partition keys, documents, attachments, stored procedures

Hierarchy of a Cosmos DB Azure Cosmos DB account Database Collections Documents {} Users Stored Procedures Triggers Permissions f(n) User-Defined Functions Attachments

Containers and Partitions § Physical partitions are managed by internal resources in Cosmos DB § A logical partition is a partition within a physical partition that stores all the data associated with a single partition key value § Each document must have a partition key and a unique key, which uniquely identify it § Partition key acts as a logical partition for data and provides Azure Cosmos DB boundary for distributing data across physical partitions § Containers are logical resources that group one or more physical partitions § For : • SQL API and Mongo. DB API accounts, a container maps to a Collection. • Cassandra API and Table API accounts, a container maps to a Table. • Gremlin API accounts, a container maps to a Graph.

Partitioning collections in Cosmos DB Partition key = /User. Id er. I d: "y" Us Us Id er a" " : Container Us er. I d: "z" Logical partitions Physical partitions User. Id : "a" User. Id : "b" … User. Id : "y" User. Id : "z" …

Document structure in document databases • Documents are stored as JSON • Geo. JSON supported for geometry • Each document must be 2 MB or less • For binary blobs > 2 MB, use attachments • Pointer to a URL outside the document • Use Cosmos DB blob store (up to 2 GB total per database account) or an external store • System properties added to all documents: • _rid • _etag • _ts • _self • Id is added automatically (as a GUID) if it is not supplied

Cosmos DB – indexes • Consistent: • Cosmos DB collection follow the same consistency level as specified for the point-reads • The index is updated synchronously • Lazy: • The index is updated asynchronously that is, when the collection’s throughput capacity is not fully utilized to serve user requests. • User might get inconsistent results because data is ingested and indexed slowly. The index is generally in catch-up mode with ingested data.

Moving data into a Cosmos DB database Import Data Method Destination API Data Migration tool (dtui. exe or dt. exe) SQL Graph. SON import or the Gremlin Console Gremlin mongoimport or mongorestore Mongo AZCopy or dt. exe Table cqlsh COPY or the Spark connector Cassandra

Programming Cosmos DB

Using SQL API JSON documents {} {} {} } REST API { Client access SQL (like) SDKs API Calls Java. Script logic

Accessing data Connection mode Supported protocol Supported SDKs API/Service port Gateway HTTPS All SDKS SQL(443), Mongo(10250, 10255, 10256), Table(443), Cassandra(10350), Graph(443) Direct HTTPS . NET and Java SDK Ports within 10, 000 -20, 000 range Direct TCP . NET SDK Ports within 10, 000 -20, 000 range

Using SQL to find documents in a collection • • • SELECT clause FROM clause (optional) JOIN clause (optional) WHERE clause (optional) ORDER BY clause (optional)

Using joins • • • Use the JOIN clause Used for joining inside documents Joins between documents are not possible CROSS JOIN only (no ON clause) Useful for projecting/flattening arrays of subdocuments SELECT p. Name, p. Current. Address. City AS Current. City, h. City AS Earlier. City FROM p JOIN h IN p. Address. History

Retrieving data using SQL queries • Connect with Document. Client • Issue SQL queries with Create. Document. Query • Use LINQ by casting query to Ienumerable • Use Read. Document. Async to retrieve a document by Id • Returns a Document. Response (typed) or a Resource. Response<Document> (untyped)

How Cosmos DB supports server-side operations • Procedural logic: Java. Script as a high-level programming language • Atomic transactions: Azure Cosmos DB guarantees that the database operations are atomic • Performance: The JSON data is intrinsically mapped to the Java. Script language type system. Other performance benefits : • Batching: . • Pre-compilation: • Sequencing: . • Encapsulation: Stored procedures can be used to group logic in one place

Stored procedures, triggers, and User Defined Functions (UDF)

Understanding the Cosmos DB change feed • • Stream of JSON documents that record inserts and updates to a collection, ordered by time. Query through the SQL API; uses collection throughput e. Commerce Scenario Cosmos DB Orders Change feed Microservices Tax calculation Order processing

Why Azure Cosmos DB is the recommended database for serverless computing

Using the change feed with Azure Functions App Service—“serverless” functions Possible to trigger from a Cosmos DB change feed Function is fired on each change, and receives the changed document(s) Cosmos DB change feed App Service Cosmos DB trigger Azure Function Output

Patterns for working with Cosmos DB § Vanilla (1: 1 and 1: N) § M: N § Time-series data § Write heavy (Event sourcing) § Possible hot spots

Vanilla 1: 1 Gaming use case with 100 s to billions of active players § Get. Player. By. Id § Add. Player § Remove. Player § Update. Player § Strawman: partition key = id § Only GET, POST, PUT, and DELETE § Provisioned throughput = Sigma (RUi * Ni) § Bonus: Bulk Inserts § Bonus: Bulk Read (use read feed or change feed) for analytics

Vanilla 1: N § Supporting lookup of game state § Get. Game. By. Ids(Player. Id, Game. Id) § Get. Games. By. Player. Id(Player. Id) § Add. Game § Remove. Game § Partition key is player. Id § Get. Game. By. Ids, Add. Game, and Remove. Game are GET, POST, and DELETE § Get. Games. By. Player. Id is a single-partition query: SELECT * FROM c WHERE c. player. Id = ‘p 1’

M: N § Multi-player gaming. Lookup by either game. Id or player. Id § Get. Player. By. Id(Player. Id) § Get. Game. By. Id(Game. Id) § Add. Game § Remove. Game § Partition key = Player. Id is OK if mix is skewed towards Player calls (because of index on game ID) § If mix is 50: 50, then need to store two pivots of the same data by Player Id and Game Id § Double-writes vs. change feed for keeping copies up-to-date

Time-series data § Ingest readings from sensors. Perform lookups by date time range § Add. Sensor. Reading(Sensor. Id) § Get. Readings. For. Time. Range(Start. Time, End. Time) § Get. Sensor. Readings. For. Time. Range(Sensor. Id, Start. Time, End. Time) § No natural partition key. Time is an anti-pattern! § Set partition key to Sensor ID, id to timestamp, set TTL on the time window § Bonus: create collections for per-minute, per-hour, per-day windows based on stream aggregation on time-windows

Event sourcing pattern § Write-heavy workloads. Store each event as an immutable document, instead of updating state in-place § Add. Event. For. Object(Object. Id, Event. Type, Timestamp) § Get. Events. For. Object(Object. Id, Event. Type) § Get. Events. Since. Timestamp(Timestamp) § Why event-driven architectures? § Inserts are more efficient than update at scale § Built-in audit log of events § Decoupled micro-services that act on events § Get. Events. For. Object is a single-partition query to get latest state on read

Hot spot - large partition keys § Use cases § Multi-tenant applications where few tenants are very large § Router publishes telemetry at higher rate than sensors § Celebrity in a social networking app, viral gaming tournament § Patterns to manage large partition keys § Have a surrogate partition key like tenant ID + 0 -100 § Use hybrid partitioning scheme for small tenants, and large tenants = 0 -100 § Move large tenants to their own collections § If the per-document size is large, use the patterns for large documents § Get. Events. Since. Timestamp using change feed § Bonus: create collections per hour (0 -23), and set differential throughput based on the request rates


Comparing Cosmos DB with SQL Server (1)

Comparing Cosmos DB with SQL Server (2)

Summary • Cosmos DB is not a replacement for SQL Server • You would very, very rarely, if ever, migrate your data from an existing SQL Server database to Cosmos DB • Most common scenarios will be : • International retail chains and order processing • Multi Player Gaming • Io. T • Social media apps (Comments, likes and retweets) • Fraud/Anomaly detection • Metering Counting and regulating usage (API calls, transactions/second, minutes used)

