

Stages: Preprocessing, Compute, Post Proc.
Pipeline: Raw Data (XML) -> ETL -> Initial Graph -> Slice -> Subgraph -> Compute -> PageRank -> Analyze -> Top Users, then Repeat

GraphX


Pipeline: Raw Wikipedia (XML) -> Hyperlinks -> PageRank -> Top 20 Pages
Stages: Preprocess (Spark), HDFS, Compute, HDFS, Post.

Total Runtime (in Seconds)
System             Runtime
Naïve Spark        1492
Giraph + Spark     605
GraphX             342
GraphLab + Spark   375
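The Hyperlinks -> PageRank -> Top Pages step of the pipeline can be illustrated with a minimal single-machine sketch. This is a plain power iteration in Python, not the GraphX, Giraph, or GraphLab implementation measured on the slide. The link data, the damping factor of 0.85, the iteration count, and the helper names `pagerank` and `top_k` are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {dst for dsts in links.values() for dst in dsts}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, dsts in links.items():
            if dsts:
                share = damping * rank[src] / len(dsts)
                for dst in dsts:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def top_k(rank, k):
    """Return the k pages with the highest rank, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Hypothetical toy hyperlink table (not Wikipedia data).
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
toy_rank = pagerank(toy_links)
print(top_k(toy_rank, 2))
```

On the real pipeline the same two steps would run over the hyperlink graph extracted from the Wikipedia XML, with `k = 20`.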


Property Graph

Vertex Table
Id   Property (V)
3    (rxin, student)
7    (jgonzal, postdoc)
5    (franklin, professor)
2    (istoica, professor)

Edge Table
SrcId   DstId   Property (E)
3       7       Collaborator
5       3       Advisor
2       5       Colleague
5       7       PI
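The two tables on this slide are enough to reconstruct the whole property graph. As a rough sketch in plain Python (not the GraphX API), the vertex table can be held as a dictionary keyed by Id and the edge table as a list of (SrcId, DstId, Property) tuples. The helper name `triplets` is an assumption for illustration; it joins each edge with the properties of its two endpoints.

```python
# Vertex table from the slide: Id -> (name, role).
vertices = {
    3: ("rxin", "student"),
    7: ("jgonzal", "postdoc"),
    5: ("franklin", "professor"),
    2: ("istoica", "professor"),
}

# Edge table from the slide: (SrcId, DstId, Property (E)).
edges = [
    (3, 7, "Collaborator"),
    (5, 3, "Advisor"),
    (2, 5, "Colleague"),
    (5, 7, "PI"),
]


def triplets(vertices, edges):
    """Join every edge with the properties of its source and destination."""
    return [(vertices[src], attr, vertices[dst]) for src, dst, attr in edges]


for src_prop, attr, dst_prop in triplets(vertices, edges):
    print(f"{src_prop[0]} --{attr}--> {dst_prop[0]}")
```

Keeping vertices and edges in separate tables means either one can be filtered or updated with ordinary table operations before the join is taken.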

Data-Parallel: Table (Row, Row -> Result)
Graph-Parallel: Property Graph (Pregel)



Raw Wikipedia (XML)
  • Text Table (Title, Body)
  • Hyperlinks -> PageRank -> (Title, PR) -> Top 20 Pages
  • Term-Doc Graph -> Topic Model (LDA) -> Word Topics (Word, Topic)
  • Discussion Table (User, Disc.)
  • Editor Graph -> Community Detection -> User Community (User, Com.)
  • Community Topic (Topic, Com.)


Property Graph (vertices A to F), split across Part. 1 and Part. 2 by a Vertex Cut Heuristic

Vertex Table (RDD)
A, B, C, D, E, F

Routing Table (RDD)
Vertex   Edge partitions
A        1, 2
B        1
C        1
D        1, 2
E        2
F        2

Edge Table (RDD)
Part. 1: (A, B), (A, C), (B, C), (C, D)
Part. 2: (A, E), (A, F), (E, D), (E, F)
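The routing table on this slide records, for each vertex, which edge partitions contain an edge touching it. A minimal sketch of how such a table could be derived is shown here in plain Python, not Spark. The function name `build_routing_table` is hypothetical, and the split of the eight edges into two partitions is the assignment consistent with the slide's routing entries.

```python
from collections import defaultdict

# Edge partitions: partition id -> list of (src, dst) edges.
edge_partitions = {
    1: [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")],
    2: [("A", "E"), ("A", "F"), ("E", "D"), ("E", "F")],
}


def build_routing_table(edge_partitions):
    """Map each vertex to the sorted list of edge partitions that reference it."""
    routing = defaultdict(set)
    for part_id, part_edges in edge_partitions.items():
        for src, dst in part_edges:
            routing[src].add(part_id)
            routing[dst].add(part_id)
    return {vertex: sorted(parts) for vertex, parts in sorted(routing.items())}


routing_table = build_routing_table(edge_partitions)
for vertex, parts in routing_table.items():
    print(vertex, parts)
```

With this table, a vertex property only needs to be shipped to the partitions listed for that vertex rather than broadcast to every partition.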


Edge Cut Vertex Cut
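The difference between the two partitioning strategies can be made concrete with a small count. An edge cut assigns vertices to partitions and pays for every edge whose endpoints land in different partitions. A vertex cut assigns edges to partitions and pays for every vertex that must be replicated. The sketch below uses the six-vertex example graph from the earlier slide. The vertex assignment used for the edge cut is a hypothetical choice, and both helper names are assumptions.

```python
# Eight edges of the example graph.
graph_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
    ("A", "E"), ("A", "F"), ("E", "D"), ("E", "F"),
]

# Edge cut: vertices are assigned to partitions (hypothetical assignment).
vertex_part = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}

# Vertex cut: edges are assigned to partitions (first four to 1, rest to 2).
edge_part = {edge: (1 if i < 4 else 2) for i, edge in enumerate(graph_edges)}


def cut_edges(edges, vertex_part):
    """Edges whose endpoints live in different partitions under an edge cut."""
    return [(s, d) for s, d in edges if vertex_part[s] != vertex_part[d]]


def spanned_vertices(edge_part):
    """Vertices that appear in more than one partition under a vertex cut."""
    placement = {}
    for (src, dst), part in edge_part.items():
        placement.setdefault(src, set()).add(part)
        placement.setdefault(dst, set()).add(part)
    return sorted(v for v, parts in placement.items() if len(parts) > 1)


print("edge cut, edges crossing partitions:", cut_edges(graph_edges, vertex_part))
print("vertex cut, replicated vertices:", spanned_vertices(edge_part))
```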