Graph Databases for SQL Server Professionals Stphane Frchette
Graph Databases for SQL Server Professionals Stéphane Fréchette
Who am I? My name is Stéphane Fréchette Data & Business Intelligence Solutions Architect | Consultant | Speaker | Big Data | No. SQL | Data Science | Microsoft SQL Server MVP. Drums, good food and fine wine. Founder I have a passion for architecting, designing and building solutions that matter. Twitter: @sfrechette Blog: stephanefrechette. com Email: stephanefrechette@ukubu. com 2 | 15 -05 -30 | SQLSaturday New York #380
Session Outline § § § § What is a Graph? What is Neo 4 j? Data Modeling – The Property Graph Cypher Query Language Importing Data… Use Cases Demos Resources 3 | 15 -05 -30 | SQLSaturday New York #380
What is a Graph? 4 | 15 -05 -30 | SQLSaturday New York #380
Are these Graphs? 5 | 15 -05 -30 | SQLSaturday New York #380
This is a Graph Node Relationship A Property Graph 6 | 15 -05 -30 | SQLSaturday New York #380
Organization Project Graph 7 | 15 -05 -30 | SQLSaturday New York #380
Twitter Social Graph 8 | 15 -05 -30 | SQLSaturday New York #380
What is Neo 4 j? An open-source graph database by Neo Technology. Neo 4 j stores data in nodes connected by directed, typed relationships with properties on both, also know as a Property Graph § Fully ACID compliant § Massively scalable, up to several billion nodes/relationships/properties § Highly-available, when distributed across multiple machines § Accessible by a convenient REST interface or an object-oriented Java API 9 | 15 -05 -30 | SQLSaturday New York #380
Data Modeling From SQL Server to Graph Property Graph 10 | 15 -05 -30 | SQLSaturday New York #380
Example: Meetup Data In SQL Server Member ID Member 1 Daniel 2 Stephane 3 John 4 Randy 11 | 15 -05 -30 | Meetup. Organizer Meetup. Member Meetup Member. ID Meetup. ID ID Name 2 1 3 1 1 1 2 3 2 Ottawa SQL Server User Group 3 3 4 2 2 Ottawa Java. Script 2 4 4 4 3 Ottawa Visio User Group 3 5 1 5 4 Ottawa Tableau User Group 5 Dirty Dancing Ottawa SQLSaturday New York #380
Example: Meetup Data In a Graph name: ‘Ottawa Visio User Group’ name: ‘Ottawa SQL Server User Group’ name: ‘Stephane’ ER IS_MEM BER ANIZ IS_ORG Meetup Member name: ‘John’ R ANIZE G IS_OR IS_MEMBER IZER IS_ORGANIZER name: ‘Randy’ name: ‘Dirty Dancing Ottawa’ IS_MEMBE IS IS_ORGANIZER _M IS_MEMBER EM BE R R name: ‘Ottawa Tableau User Group’ name: ‘Ottawa Java. Script’ 12 | 15 -05 -30 | SQLSaturday New York #380 name: ‘Daniel’
Cypher Query Language Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store § § § Pattern-matching Declarative: what to retrieve, not how to retrieve it Inspired from other known Language (SQL, SPARQL, Haskell, Python) Aggregation, Ordering, Limit Update the Graph 13 | 15 -05 -30 | SQLSaturday New York #380
Cypher and T-SQL Cypher also has a number of keywords that have a direct equivalence with SQL which makes it a curiously familiar language § § § § WHERE ORDER BY LIMIT SUM, COUNT, STDEVP, MIN, MAX etc… LTRIM, UPPER, LOWER, REPLACE, LEFT, RIGHT, SUBSTRING DISTINCT CASE (SQL Server Pros) – [: WILL_LOVE] -> (Cypher) 14 | 15 -05 -30 | SQLSaturday New York #380
Cypher - Meetup 15 | 15 -05 -30 | SQLSaturday New York #380
Neo 4 j Browser 16 | 15 -05 -30 | SQLSaturday New York #380
Demo (let’s query some data…) 17 | 15 -05 -30 | SQLSaturday New York #380
Importing Data… Some important considerations… Different import scenarios § Dataset size: 1000 s, 10000000 s § Dataset format (source): Database, File (CSV, Spreadsheet, Graph. ML, Geoff), Service, Other § Import type: Initial Bulk Load, Incremental Load, Initial Bulk Load + Incremental Load Different import tools § § § Spreadsheet based Neo 4 j-shell based: (Cypher, neo 4 j-shell-tools, Cypher LOAD CSV) Command-line based: Batch Importer Neo 4 j Brower based ETL Tools: (Talend, Mulesoft, Pentaho Kettle) Custom software: (Java API, REST API, Spring Data Neo 4 j) 18 | 15 -05 -30 | SQLSaturday New York #380
Many different mappings Import Scenarios Import Tools Not always clear what you should be using Depends on your skillsets, dataset size… (lots of other stuff) Choose wisely! 19 | 15 -05 -30 | SQLSaturday New York #380
Demos (importing data!) 20 | 15 -05 -30 | SQLSaturday New York #380
The Sample Dataset 21 | 15 -05 -30 | SQLSaturday New York #380
Importing using Spreadsheets Very small size datasets < 1000, easy to use Format data in spreadsheet 22 | 15 -05 -30 | SQLSaturday New York #380 Generate Cypher statements with formulas Copy and Execute Cypher in Neo 4 j browser
Importing using Spreadsheets 23 | 15 -05 -30 | SQLSaturday New York #380
Importing using neo 4 j-shell-tools Small to medium size datasets https: //github. com/jexp/neo 4 j-shell-tools Format data in CSV files 24 | 15 -05 -30 | SQLSaturday New York #380 Create importcypher commands for neo 4 j-shell-tools Execute commands from neo 4 j-shell
Importing using neo 4 j-shell-tools 25 | 15 -05 -30 | SQLSaturday New York #380
Importing using LOAD CSV Native Cypher Format data in CSV files 26 | 15 -05 -30 | SQLSaturday New York #380 Create “LOAD CSV” commands Execute command from neo 4 j-shell or browser
Importing using LOAD CSV Sample 27 | 15 -05 -30 | SQLSaturday New York #380
Importing using Batch Importer Non-transactional import, suited for very large datasets Format data in TSV files 28 | 15 -05 -30 | Execute Batch Import command SQLSaturday New York #380 Copy store files to Neo 4 j Server directory Start Neo 4 j Server with generated store files
Adventure. Works http: //stephanefrechette. com/importing-adventureworks-data-into-neo 4 j/ https: //github. com/sfrechette/adventureworks-neo 4 j 29 | 15 -05 -30 | SQLSaturday New York #380
Use Cases Principal uses of Graph Database include: § Network and Data Center Management (Queries: Impact Analysis, Root Cause Analysis, Quality-of-Service Mapping, Asset Management) § Identity and Access Management (Queries : Access Management, Interconnected Group Organization, Provenance) § Social Master Data Management (Queries : Cross Reference Business Objects, Organizational Hierarchies, Data Ownership) http: //neo 4 j. com/use-cases/ 30 | 15 -05 -30 | Real-Time Recommendations (Queries : Product, Social, Service, and Professional Recommendations) § Fraud Detection (Queries : First-Party Bank Fraud, Insurance Fraud, E-commerce Fraud) (Queries : Friend Recommendations, Sharing & Collaboration, Influencer Analysis) § § SQLSaturday New York #380 § Digital Asset Management (Queries : Access Management, Cataloging, Work Flow Processes)
Summary (graphs)-[: ARE]->(everywhere) 31 | 15 -05 -30 | SQLSaturday New York #380
Resources § § § § Neo Technology http: //neo 4 j. com Neo 4 j on Vimeo http: //vimeo. com/neo 4 j Neo 4 j on Slide. Share http: //www. slideshare. net/neo 4 j Neo 4 j on Github https: //github. com/neo 4 j Neo 4 j Cypher Cheat Sheet http: //docs. neo 4 j. org/refcard/2. 1/ Neo 4 j Graph Database as a Service http: //www. graphenedb. com/ Linkurious – The easiest way to explore graph databases http: //linkurio. us/ Key. Lines- Visualize dynamic networks http: //keylines. com/ Experiments with NEO 4 J: Using a graph database as a SQL Server metadata hub http: //bit. ly/V 2 Prx. N Kenny Bastani http: //www. kennybastani. com/ Rik Van Bruggen http: //blog. bruggen. com/ Max de Marzi http: //maxdemarzi. com/ Better Software Development http: //jexp. de/blog/ Graph Databases (Free Book) http: //graphdatabases. com/ 32 | 15 -05 -30 | SQLSaturday New York #380 § § § Neo 4 j Graph. Gist http: //gist. neo 4 j. org/ Graph. Connect Conference http: //graphconnect. com/ Titan – Distributed Graph Database https: //thinkaurelius. github. io/titan/ Infinite. Graph http: //www. infinitegraph. com/ Orient. DB http: //www. orientechnologies. com/ Cayley by Google https: //github. com/google/cayley
What Questions Do You Have? 33 | 15 -05 -30 | SQLSaturday New York #380
Thank You For attending this session 34 | 15 -05 -30 | SQLSaturday New York #380
- Slides: 34