SQLintersection Graph Database Processing with Azure SQL DB
SQLintersection Graph Database Processing with Azure SQL DB and SQL Server 2017 Joe Sack, Principal Program Manager, Microsoft Joe. Sack@Microsoft. com
Session Objectives § Understand graph scenarios and when they may be useful § Learn about what’s being offered in SQL Server 2017 and Azure SQL Database § Learn enough to be able to get quickly started on your own § Questions, feedback or issues we don’t get to? o Email me at joe. sack@Microsoft. com © SQLintersection. All rights reserved. http: //www. SQLintersection. com
Adaptability Adapt based on customer workload characteristics SQL Server 2017 Themes Choice Mindshare Provide customers with a choice Leverage the strength of strong technical communities
Ingredient Recall Scenario
Ingredient Recall Scenario
Ingredient Recall Scenario
Ingredient Recall Scenario
Ingredient Recall Scenario
Understanding the impact
Relational vs. Graph § Graph and relational designs can answer the same questions § But if traversal of relationships define the primary application requirements, Graph can solve this more intuitively and with less code © SQLintersection. All rights reserved. http: //www. SQLintersection. com
Define: Graph § Nodes: Entities – for example, stores, people, products, businesses § Edges: Relationships between nodes, lines that connect nodes to other nodes § Properties or Attributes: information associated with specific nodes and edges § Graph: a collection of nodes and edges (or entities and relationships)
What is a graph database? § Edges or relationships are first class entities in a Graph Database and can have attributes or properties associated with them § A single edge can flexibly connect multiple nodes in a Graph Database § You can express pattern matching and multi-hop navigation queries easily § Supports OLTP and OLAP (analytics) just like SQL databases
Why Graph Databases? A Manages Location Manages O works_for Manages B Manages Leads E C Manages D Leads Manages Leads F Location Lead s collaborates Hierarchical or interconnected data, entities with multiple parents. A Manages Lead s B Leads Manages C D collaborates Complex many-to-many relationships. Organically grow connections as the business evolves. Analyze interconnected data, materialize new information from existing facts. Identify nonobvious connections
Graph Scenarios § Recommendation engines § Social networks § Networks and IT infrastructure topologies § Fraud detection § Product catalog with sales and marketing data § IOT device telemetry
SQL Server 2017 Support for graph data § Native nodes and edge table support § Query language extension provides multihop navigation using join -free pattern matching § Query across regular SQL tables and graph data § Interoperability – for example, support for Columnstore indexes
Why Extend SQL Server to support Graphs? • Trusted by many customers for enterprise and mission critical applications • Mature product, supports many advanced technologies like high availability, disaster recovery etc. Also comes with cutting edge technologies like Columnstore, advanced analytics, ML etc. • One platform with no need to extract, transform and load data into another system to analyze relationships
Recommendation Engine Scenario examples • Product sales data • If a person P bought product A, find friends P who bought A and also bought other products. Make recommendation based on that. • Social Network • Yelp like application, find me friends who like the same restaurant that I like and also recommend other restaurants. Friends location has. Bought is. Looking sells. At Product
Recommending Songs: Approach User 1 Song 1 User 2 Song 2 User 3 Song 3 User 4 Song 4 User 5 Song 5
The Million Song Dataset Data courtesy of the Million Song Dataset (https: //labrosa. ee. columbia. edu/millionsong/) and the Echo Nest Taste Profile Subset (https: //labrosa. ee. columbia. edu/millionsong/tasteprofile ) Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12 th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011. The Echo Nest Taste profile subset, the official user data collection for the Million Song Dataset, available at: http: //labrosa. ee. columbia. edu/millionsong/tasteprofile
Implementation using SQL Graph Unique. User (node table) Unique. Song Likes (node table) (edge table) CREATE TABLE Unique. User (User. Id VARCHAR(80)) AS NODE CREATE TABLE Likes (Listen. Count BIGINT)) AS EDGE CREATE TABLE Unique. Song (Song. Id VARCHAR(50) , Song. Title VARCHAR(500) , Artist. Name VARCHAR(500)) AS NODE
Demo SQL Graph (Special thanks to Arvind Shyamsundar from the SQLCAT team!) © SQLintersection. All rights reserved. http: //www. SQLintersection. com
Storing edge and nodes in tables • Natural choice for us (no need to re-invent) • Storing edge data in a separate table allows us to benefit from the query optimizer, which can pick the optimal join strategy for large queries • Depending on the complexity of query and data statistics, the optimizer can pick a nested loop join, hash join, or other join strategies
$node_id § Node tables have an implicit $node_id column created which uniquely identifies a given node in the database § Combination of object_id of that node table and an internally generated bigint value § When the $node_id column is selected, a computed value in the form of a JSON string is displayed § $node_id is a pseudo column, that maps to an internal name with hex string in it. When you select $node_id from the table, the column name will appear as $node_id_<hex_string>
Edge Columns
Metadata sys. tables • is_node and is_edge sys. columns • graph_type and graph_type_desc • Shows graph column types like “from ID” and “to ID”
Indexing Nodes • We create a default unique, non-clustered index on $node_id by default Edges • We automatically create a unique non-clustered index on $edge_id • For OLTP scenarios, we recommend that users create indexes on these columns ($from_id, $to_id) for faster lookups in the direction of the edge
DDL (beyond creating the table) o. ALTER TABLE o. Supported for user-defined columns, indexes or constraints o. CREATE INDEX o. Supported on user-defined and pseudo-columns for node and edge tables, including CCI and NCCI o. DROP TABLE o. Supported, but we don’t automatically cascade deletion of edges or nodes
Supported table types § Regular tables are supported § Not supported for nodes or edge tables (for this release): o. Memory-optimized o. Temporary tables (global and local) o. Temporal tables o. Stretch tables o. External tables (Poly. Base)
Can I alter an existing table into a node or edge table? § In the first release, ALTER TABLE to convert an existing relational table into a node or edge table is not supported § Instead, users can create a node table and use INSERT INTO … SELECT FROM to populate data into the node table § To populate an edge table from an existing table, proper $from_id and $to_id values must be obtained from the node tables.
INSERTs into Graph tables § Inserting into a node is same as inserting into any relational table; the values for the $node_id column are automatically generated. § For edge table, users must provide values for $from_id and $to_id columns
DELETE, UPDATE, MERGE DELETE • Works the same as regular table • No constraints to check for edges pointing to deleted nodes, or nodes used with edges (no cascade) UPDATE • You can update user-defined columns – but not $from_id, $to_id MERGE • Not supported in this version
How can I ingest unstructured data? • Since we are storing data in tables, users must know the schema at the time of creation • Users can always add new types of nodes or edges to their schema • If you want to modify an existing node or edge table, they can use ALTER TABLE to add or delete attributes • If you expect any unknown attributes in your schema, you could either use sparse columns or create a column to hold JSON strings and use that as a placeholder for unknown attributes
Transitive Closure • How do I find a node connected to me and arbitrary number of hops away, in my graph? • The ability to navigate through a combination of nodes and edges, an arbitrary number of times, is called transitive closure • For example, find all the people connected to me through three levels of indirections or find the employee chain for a given employee in an organization • Transitive closure is not supported in the first release and we’re going to publish alternative working examples in the meantime
Graph Polymorphism • How do I find ANY Node connected to me in my graph? • The ability to find any type of node connected to a given node in a graph is called polymorphism • SQL graph does not support polymorphism in the first release • A possible workaround is to write queries with UNION clause over a known set of node and edge types
MATCH § Specifies the search condition for a graph § Can only be used with graph node and edge tables § Usable in SELECT statement as part of WHERE clause § MATCH (<graph_search_pattern>) o. Uses ASCII art syntax to traverse a path in the graph o. One node to another via an edge in the direction of the arrow provided
MATCH § Node names or aliases appear at the two ends of the arrow o. MATCH(Unique. User-(Likes. This)->My. Song) § Edge names or aliases are provided inside parentheses o. MATCH(Unique. User-(Likes. This)->My. Song) § The arrow can go in either direction in the pattern o. MATCH(Similar. Song<-(Likes. Other)-Unique. User-(Likes. This)->My. Song)
MATCH § The node names inside MATCH can be repeated § An edge name cannot be repeated inside MATCH o. Use an alias for the edge if you want to use to form a new path § An edge can point in either direction, but it must have an explicit direction
MATCH § MATCH can be combined with other expressions using AND in the WHERE clause § OR and NOT operators are not supported in the MATCH pattern o. But you can have a sub-query using EXISTS or NOT EXISTS part of the overall query
SQL Graph Customer Scenarios Fraud detection for online gaming • Look at devices, IPs, logins and connect to gaming activity Sales Lead Management • Traverse a hierarchy of child accounts, and given an opportunity see who responded to specific emails and promotions Product hierarchies, Bill of Materials management • Find the hierarchy for a given product or the hierarchies for a set of products • Find all products that contain a given product according to the Bo. M hierarchy
(Revisit) Ingredient Recall Scenario
(Revisit) Understanding the impact
Questions? Don’t forget to complete an online evaluation! Graph Database Processing with Azure SQL DB and SQL Server 2017 Your evaluation helps organizers build better conferences and helps speakers improve their sessions. Thank you!
Save the Date Oct 30 -Nov 2, 2017 We’re back in Vegas baby! www. SQLintersection. com
- Slides: 43