Hybridizing SPARQL Queries and Graph Algorithms
David Mizell
Cray Inc., Austin, TX
Graph Algorithms Building Blocks Workshop, May 2014
What’s Urika?
- An RDF triples database, memory-resident
- Queried with the SPARQL query language
- Aimed at customers who
  - have large datasets
  - want to do graph analytics
What are RDF Triples?
- “Resource Description Framework”
- A data representation intended to be
  - somewhat self-defining
  - built from data items that are unique across the Internet
- Each triple represents one item of information, as subject, predicate, object:

    <http://yarcdata.com/GABBexample/person#JohnGilbert>
    <http://yarcdata.com/GABBexample/drivesCar>
    <http://yarcdata.com/GABBexample/carType#Yugo>

  This triple reads “John Gilbert drives a Yugo”.
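Here is a minimal Python sketch of the idea. It assumes nothing about how Urika actually stores triples. Each triple is a plain (subject, predicate, object) record, and a wildcard lookup answers questions like "which triples have this predicate?". The IRIs come from the slide. The `Triple` class and the `match` function are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional

BASE = "http://yarcdata.com/GABBexample/"


class Triple(NamedTuple):
    """One item of information: subject, predicate, object (all IRIs here)."""
    subject: str
    predicate: str
    obj: str


def match(store: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that match the given positions; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# "John Gilbert drives a Yugo" as a single triple.
fact = Triple(BASE + "person#JohnGilbert", BASE + "drivesCar", BASE + "carType#Yugo")
store = {fact}

print(match(store, p=BASE + "drivesCar"))
```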
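RDF Triples (2)
- They waste space compared with a relational database, which stores the same fact as one row:

    person        | car
    John Gilbert  | Yugo

- BUT they’re graph-oriented: each triple is an edge, labeled with its predicate, from its subject vertex to its object vertex.
  [Figure: vertex <http://yarcdata.com/GABBexample/person#JohnGilbert> linked to vertex <http://yarcdata.com/GABBexample/carType#Yugo>]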
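The following makes "graph-oriented" concrete. A list of triples already is a labeled edge list, with no schema step needed. Reading it as an adjacency map takes a single pass. This Python sketch uses a hypothetical helper name, `to_adjacency`, and says nothing about Urika's internal layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BASE = "http://yarcdata.com/GABBexample/"

Triple = Tuple[str, str, str]


def to_adjacency(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Read each triple (s, p, o) as a directed edge s --p--> o."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return dict(adj)


triples = [(BASE + "person#JohnGilbert", BASE + "drivesCar", BASE + "carType#Yugo")]
adj = to_adjacency(triples)
print(adj[BASE + "person#JohnGilbert"])
```

Only subjects get keys here. A vertex that appears only as an object (the Yugo) has no outgoing edges.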
What’s SPARQL?
- SPARQL Protocol And RDF Query Language
- Similar to SQL:

    PREFIX yd: <http://yarcdata.com/GABBexample/>
    SELECT ?car
    WHERE { yd:person#JohnGilbert yd:drivesCar ?car }

  Result:

    car
    Yugo
Or:

    PREFIX yd: <http://yarcdata.com/GABBexample/>
    SELECT ?driver ?car
    WHERE {
      ?driver yd:drivesCar ?car .
      ?driver a yd:UniversityProf
    }

Result:

    driver           | car
    JohnGilbert      | Yugo
    AndrewLumsdaine  | Studebaker
    DavidBader       | AMC_Matador
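Evaluating a WHERE clause like this one means finding every binding of the variables under which all the triple patterns match. That amounts to a join on the shared variable ?driver. The sketch below is a naive nested-loop version in Python. It keeps prefixed names as plain strings and writes `a` out as `rdf:type`. Real query engines use far better join strategies. The helper names and the toy data are hypothetical, apart from the slide's three result rows. The `yd:someStudent` triple was added so that the join has something to exclude.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None if it cannot."""
    new = dict(binding)
    for pat, val in zip(pattern, triple):
        if is_var(pat):
            if new.setdefault(pat, val) != val:
                return None  # variable already bound to a different term
        elif pat != val:
            return None  # constant term does not match
    return new


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Nested-loop evaluation of a basic graph pattern (a conjunction of triple patterns)."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in triples:
                extended = extend(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


triples = [
    ("yd:JohnGilbert", "yd:drivesCar", "yd:Yugo"),
    ("yd:JohnGilbert", "rdf:type", "yd:UniversityProf"),
    ("yd:AndrewLumsdaine", "yd:drivesCar", "yd:Studebaker"),
    ("yd:AndrewLumsdaine", "rdf:type", "yd:UniversityProf"),
    ("yd:DavidBader", "yd:drivesCar", "yd:AMC_Matador"),
    ("yd:DavidBader", "rdf:type", "yd:UniversityProf"),
    ("yd:someStudent", "yd:drivesCar", "yd:Ferrari"),  # hypothetical: not a professor
]
query = [("?driver", "yd:drivesCar", "?car"), ("?driver", "rdf:type", "yd:UniversityProf")]

for s in evaluate_bgp(query, triples):
    print(s["?driver"], s["?car"])
```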
Like SQL, it has FILTERs:

    PREFIX yd: <http://yarcdata.com/GABBexample/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?driver ?car
    WHERE {
      ?driver yd:drivesCar ?car .
      ?driver a yd:UniversityProf .
      ?car yd:yearBuilt ?modelYear .
      FILTER (?modelYear > "1985-01-01T12:00:00"^^xsd:dateTime)
    }

Result columns: driver | car

Plus other useful features, such as updates.
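Conceptually, a FILTER runs over the solutions produced by the triple patterns and drops every solution whose condition is false. The Python sketch below follows that reading. The drivers, cars, and dates are hypothetical, and xsd:dateTime values are modeled as Python `datetime` objects. The helper names are also illustrative.

```python
from datetime import datetime
from typing import Callable, Dict, List

Binding = Dict[str, str]

# The cutoff from the FILTER above; xsd:dateTime lexical forms are ISO 8601 strings.
CUTOFF = datetime.fromisoformat("1985-01-01T12:00:00")


def apply_filter(solutions: List[Binding], condition: Callable[[Binding], bool]) -> List[Binding]:
    """FILTER keeps only the solutions whose condition evaluates to true."""
    return [s for s in solutions if condition(s)]


def built_after_cutoff(s: Binding) -> bool:
    return datetime.fromisoformat(s["?modelYear"]) > CUTOFF


# Hypothetical solutions of the three triple patterns, before filtering.
solutions = [
    {"?driver": "yd:driverA", "?car": "yd:carA", "?modelYear": "1979-05-01T00:00:00"},
    {"?driver": "yd:driverB", "?car": "yd:carB", "?modelYear": "1990-03-15T00:00:00"},
]

for s in apply_filter(solutions, built_after_cutoff):
    print(s["?driver"], s["?car"])
```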
Unlike SQL, Intense Joinery
LUBM Query 9:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
    SELECT ?X ?Y ?Z
    WHERE {
      ?X rdf:type ub:Student .
      ?Y rdf:type ub:Faculty .
      ?Z rdf:type ub:Course .
      ?X ub:advisor ?Y .
      ?Y ub:teacherOf ?Z .
      ?X ub:takesCourse ?Z
    }

Six triple patterns over three variables, joined in a cycle: X to Y, Y to Z, and Z back to X.
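The cyclic join can be seen in a hand-planned Python evaluation of Query 9. It starts from advisor pairs (X, Y), extends each one to the courses Z that Y teaches, and keeps only the triangles closed by X taking Z. This is a sketch on toy data. It ignores the RDFS/OWL inference that real LUBM runs need, and assumes the types are stated directly. The function name and the entities are hypothetical.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def lubm_query9(triples: List[Triple]) -> List[Tuple[str, str, str]]:
    """Student X, advisor Y, course Z taught by Y and taken by X."""
    def typed(cls: str) -> Set[str]:
        return {s for s, p, o in triples if p == "rdf:type" and o == cls}

    students, faculty, courses = typed("ub:Student"), typed("ub:Faculty"), typed("ub:Course")
    teaches = defaultdict(set)            # Y -> courses Y teaches
    takes: Set[Tuple[str, str]] = set()   # (X, Z) pairs
    advises: List[Tuple[str, str]] = []   # (X, Y) pairs
    for s, p, o in triples:
        if p == "ub:teacherOf":
            teaches[s].add(o)
        elif p == "ub:takesCourse":
            takes.add((s, o))
        elif p == "ub:advisor":
            advises.append((s, o))

    results = []
    for x, y in advises:                               # edge X-Y
        if x in students and y in faculty:
            for z in teaches[y]:                       # edge Y-Z
                if z in courses and (x, z) in takes:   # edge Z-X closes the cycle
                    results.append((x, y, z))
    return sorted(results)


triples = [
    ("ub:Student1", "rdf:type", "ub:Student"),
    ("ub:Student2", "rdf:type", "ub:Student"),
    ("ub:Prof1", "rdf:type", "ub:Faculty"),
    ("ub:Course1", "rdf:type", "ub:Course"),
    ("ub:Course2", "rdf:type", "ub:Course"),
    ("ub:Student1", "ub:advisor", "ub:Prof1"),
    ("ub:Student2", "ub:advisor", "ub:Prof1"),
    ("ub:Prof1", "ub:teacherOf", "ub:Course1"),
    ("ub:Prof1", "ub:teacherOf", "ub:Course2"),
    ("ub:Student1", "ub:takesCourse", "ub:Course1"),
    ("ub:Student2", "ub:takesCourse", "ub:Course3"),  # not taught by the advisor
]
print(lubm_query9(triples))
```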
Typical Customer Reaction to SPARQL
“Cool. Can you also do betweenness centrality on that?”
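Betweenness centrality scores each vertex by how many shortest paths between other pairs of vertices pass through it. One standard way to compute it on an unweighted graph is Brandes' algorithm. It runs one breadth-first search per source vertex, then a backward pass that accumulates path dependencies. The Python sketch below is not Urika's implementation, and it assumes an adjacency dict in which every vertex appears as a key.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Mapping


def betweenness_centrality(adj: Mapping[Hashable, Iterable[Hashable]]) -> Dict[Hashable, float]:
    """Brandes' algorithm for unweighted graphs; every vertex must be a key of `adj`."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path graph a - b - c, with each undirected edge stored in both directions.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness_centrality(path))
```

When an undirected graph is stored in both directions like this, each pair is counted from both ends. Halving the scores gives b a value of 1.0, for the single a-c shortest path.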
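SPARQL Almost Limited to Fixed-Length Query Patterns
Example: Steve “Nailgun” Reinhardt’s breadth-first search.
[Figure: an iterative script on an external server uses the SPARQL API to send the Urika SPARQL query engine “Get neighbors of these vertices”; the engine returns a set of vertices, and the script repeats with the new frontier.]
Each BFS level costs one round trip between the script and the query engine.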
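Below is a Python sketch of that client-side loop. It assumes that "neighbors" means the objects of triples whose subject is in the current frontier. `neighbors_query` only builds illustrative SPARQL 1.1 text and never sends it. `fake_engine` stands in for the Urika endpoint. Neither function is the original script.

```python
from typing import Callable, Dict, Iterable, Set


def neighbors_query(frontier: Iterable[str]) -> str:
    """Hypothetical SPARQL text for one "get neighbors of these vertices" round trip."""
    values = " ".join(f"<{v}>" for v in sorted(frontier))
    return ("SELECT DISTINCT ?n WHERE { VALUES ?v { " + values + " } "
            "?v ?p ?n . FILTER (isIRI(?n)) }")


def bfs_levels(start: str, get_neighbors: Callable[[Set[str]], Set[str]]) -> Dict[str, int]:
    """Level-synchronous BFS run outside the database: one neighbor query per level."""
    level = {start: 0}
    frontier = {start}
    depth = 0
    while frontier:
        depth += 1
        reached = get_neighbors(frontier)  # one round trip to the query engine
        frontier = {v for v in reached if v not in level}
        for v in frontier:
            level[v] = depth
    return level


# Stand-in for the query engine: an in-memory adjacency map, not a SPARQL endpoint.
EDGES = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}


def fake_engine(frontier: Set[str]) -> Set[str]:
    return set().union(*(EDGES.get(v, set()) for v in frontier))


print(neighbors_query({"http://example.org/a"}))
print(bfs_levels("a", fake_engine))
```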
What We’re Doing
- Extending SPARQL with an “INVOKE” operator:

    INVOKE <http://yarcdata.com/graphAlgorithm.vertexBetweenness> ( )

- INVOKE is paired with SPARQL’s existing CONSTRUCT operator. CONSTRUCT builds the graph, and INVOKE runs the named graph algorithm on it:

    CONSTRUCT WHERE { yd:person#JohnGilbert ?p1 ?o1 ?p2 ?o2 ?p3 ?o3 . }
    INVOKE <http://yarcdata.com/graphAlgorithm.st_connectivity> ( yd:person#JohnGilbert, yd:carType#Ferrari )

- We extended SPARQL so that a CONSTRUCT/INVOKE pair can be nested inside another query.
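The division of labor (CONSTRUCT picks the graph, INVOKE runs an algorithm on it) can be modeled in Python. In this sketch, `construct` filters triples with an arbitrary predicate function, and `st_connectivity` is a breadth-first search that treats every triple as an undirected edge. Both functions are hypothetical stand-ins, and undirected edges are an assumption. The `yd:knows` predicate and `yd:person#FriendA` are made-up data.

```python
from collections import defaultdict, deque
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


def construct(triples: Set[Triple], keep: Callable[[Triple], bool]) -> Set[Triple]:
    """Stand-in for CONSTRUCT: pick out the triples that form the algorithm's input graph."""
    return {t for t in triples if keep(t)}


def st_connectivity(graph: Set[Triple], source: str, target: str) -> bool:
    """Stand-in for INVOKE ...st_connectivity: BFS treating each triple as an undirected edge."""
    adj = defaultdict(set)
    for s, _, o in graph:
        adj[s].add(o)
        adj[o].add(s)
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


# Hypothetical data for illustration only.
triples = {
    ("yd:person#JohnGilbert", "yd:knows", "yd:person#FriendA"),
    ("yd:person#FriendA", "yd:drivesCar", "yd:carType#Ferrari"),
    ("yd:person#JohnGilbert", "yd:drivesCar", "yd:carType#Yugo"),
}
graph = construct(triples, keep=lambda t: True)
print(st_connectivity(graph, "yd:person#JohnGilbert", "yd:carType#Ferrari"))
```

If `keep` drops the `yd:knows` triples, the answer becomes False. This shows that the CONSTRUCT pattern determines which graph the algorithm sees.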
Nesting Example: k-point-five neighborhood

    SELECT ?vertexID ?edgeID ?vertex2ID
    WHERE {
      CONSTRUCT {
        ?s1 ?s2 ?s3 .
        ?startVertex a <http://yd.selectedStartingVertex> .
      }
      WHERE {
        { ?s1 ?s2 ?s3 .
          FILTER (!sameTerm(?s2, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>)) }
        UNION
        { VALUES ?startVertex { lub:GraduateStudent30 lub:GraduateStudent102
                                lub:GraduateStudent68 lub:GraduateStudent16
                                lub:GraduateStudent5 } }
      }
      INVOKE yd:graphAlgorithm.kpointfive(1)
      PRODUCING ?vertexID ?edgeID ?vertex2ID
    }
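The slides do not define the k-point-five neighborhood. The sketch below makes two assumptions. First, it reads the term as the usual "k.5-degree" ego network: all vertices within k hops of a start vertex, plus every edge among them (for k = 1, the neighbors and the links between neighbors). Second, it assumes the argument 1 above is k. The sketch ignores edge direction and takes the start vertices directly rather than through the <http://yd.selectedStartingVertex> marker triples. It returns (vertexID, edgeID, vertex2ID) rows to mirror the PRODUCING clause. The function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def k_point_five(triples: List[Triple], starts: Iterable[str], k: int) -> List[Triple]:
    """Edges of the subgraph induced by all vertices within k hops of any start vertex."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    # Multi-source BFS that stops expanding at depth k.
    dist = {v: 0 for v in starts}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Keep every edge whose two endpoints both lie within k hops.
    return sorted(t for t in triples if t[0] in dist and t[2] in dist)


edges = [("s", "e1", "a"), ("a", "e2", "b"), ("s", "e3", "c"), ("a", "e4", "c"), ("b", "e5", "d")]
for vertex_id, edge_id, vertex2_id in k_point_five(edges, ["s"], k=1):
    print(vertex_id, edge_id, vertex2_id)
```

With k = 1, the edge a-c between two neighbors of s is kept, and the edge a-b leading out of the neighborhood is dropped.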
A Peek Under the Hood
[Figure: the query engine passes the S, P, O columns of the query result to a graph-algorithm “wrapper”, which supplies the three-column “IRA” input that the graph algorithm expects; the graph algorithm, taken from a library, returns its results as three columns (vertexID, edgeID, vertex2ID) for the query engine.]
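Beyond reshaping columns, the slide does not say what the wrapper does. One plausible job, sketched here purely as an assumption, is dictionary-encoding the S, P, O strings into dense integer IDs. A library algorithm could then work on three integer columns, and its (vertexID, edgeID, vertex2ID) rows could be decoded back into IRIs afterward. Whether "IRA" names such an encoding is not stated. `encode` and `decode` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def encode(triples: Iterable[Triple]) -> Tuple[Tuple[List[int], List[int], List[int]], Dict[str, int]]:
    """Dictionary-encode terms to dense integer IDs and split triples into three columns."""
    ids: Dict[str, int] = {}

    def intern(term: str) -> int:
        # Assign the next free ID the first time a term is seen.
        return ids.setdefault(term, len(ids))

    subjects: List[int] = []
    predicates: List[int] = []
    objects: List[int] = []
    for s, p, o in triples:
        subjects.append(intern(s))
        predicates.append(intern(p))
        objects.append(intern(o))
    return (subjects, predicates, objects), ids


def decode(rows: Iterable[Tuple[int, int, int]], ids: Dict[str, int]) -> List[Triple]:
    """Turn integer result rows (vertexID, edgeID, vertex2ID) back into terms."""
    terms = {i: term for term, i in ids.items()}
    return [(terms[a], terms[b], terms[c]) for a, b, c in rows]


columns, ids = encode([("yd:a", "yd:p", "yd:b"), ("yd:b", "yd:p", "yd:c")])
print(columns)
print(decode([(0, 1, 2)], ids))
```

Sharing one ID space across vertices and predicates keeps the sketch short. A real wrapper might instead number vertices and edge labels separately.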
Future Directions
- A VHLL (very-high-level language) for graph algorithms
  - maybe extended with some RDF access features
- A new platform for Urika
  - likely to be based on commodity processors
In conclusion…