GraphSDH: A General Graph Sampling Framework with Distribution and Hierarchy
Jingbo Hu, Guohao Dai, Yu Wang, Huazhong Yang
Department of Electronic Engineering, BNRist, Tsinghua University, Beijing, China
2020/9/22

Outline
• Background
• The Workflow of GraphSDH
• Methodology
• Experimental Results
• Conclusion

Background
• Large-scale graphs: social networks, biological networks, recommendation systems
• Processing large-scale graphs is limited by the long processing time
• Graph sampling [1] [2] is an effective way to reduce it, but lacks theoretical analysis related to graph algorithm models
• GraphSDH: a general large-scale graph sampling framework based on the vertex-centric graph model

[1] Gao, R., Xu, H., Hu, P., & Lau, W. C. "Accelerating graph mining algorithms via uniform random edge sampling." 2016 IEEE International Conference on Communications (ICC). IEEE, pp. 1-6, 2016.
[2] Riondato, M., & Kornaropoulos, E. M. "Fast approximation of betweenness centrality through sampling." Data Mining and Knowledge Discovery 30(2), pp. 438-475, 2016.

The Workflow of GraphSDH
• Graph algorithm & metric
• Corresponding sampling probability & optimization strategy
• Hierarchical optimization scheme

Methodology
1. Variance Reduction Sampling Probability
• g: transformation function
• intermediate variable
• I(k)(u): the value of vertex u in the k-th iteration
• A: adjacency matrix
• f(u, v): mapping function related to a specific algorithm
• E(I'(k+1)), Var(I'(k+1)): expectation and variance of I'(k+1); the variance should be minimum
• Sampling techniques: Vertex Sampling (VS), Vertex Sampling with Neighbourhood (VSN), Edge Sampling (ES), Traversal Based Sampling (TBS)
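
As a minimal sketch of why the sampling probability affects both E and Var, the snippet below assumes a vertex-centric aggregate of the form sum over neighbors v of f(u, v) * I(k)(v). That form, the uniform edge-keeping probability p, the toy values, and all function names are illustrative assumptions, not the formula from the slide. Each edge is kept with probability p and its contribution is scaled by 1/p, so the sampled aggregate stays unbiased while its variance depends on p.

```python
import random

def full_sum(neighbors, values, f, u):
    """Exact aggregate for vertex u: sum of f(u, v) * I(k)(v) over neighbors v."""
    return sum(f(u, v) * values[v] for v in neighbors[u])

def sampled_sum(neighbors, values, f, u, p, rng):
    """Edge-sampled aggregate: keep each edge with probability p, scale by 1/p."""
    total = 0.0
    for v in neighbors[u]:
        if rng.random() < p:
            total += f(u, v) * values[v] / p
    return total

def unit_mapping(u, v):
    """Placeholder mapping function f(u, v); a real algorithm defines its own."""
    return 1.0

# Hypothetical toy input: vertex 0 has five neighbors with known values.
neighbors = {0: [1, 2, 3, 4, 5]}
values = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4, 5: 0.5}

rng = random.Random(0)
exact = full_sum(neighbors, values, unit_mapping, 0)
trials = [sampled_sum(neighbors, values, unit_mapping, 0, 0.5, rng)
          for _ in range(20000)]
mean = sum(trials) / len(trials)
variance = sum((t - mean) ** 2 for t in trials) / len(trials)
print("exact:", exact, "sampled mean:", mean, "sampled variance:", variance)
```

Lowering p in this sketch leaves the mean near the exact value but raises the variance, which is the quantity a variance-reduction sampling probability aims to keep small.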

Methodology
2. Hierarchical Optimization Scheme
The sampling strategy is more suitable for two situations:
• the stage of quickly updating the vertex value
• the stage with too much redundant information
[Figure: vertex values I(k), I(k+1), I(k+2) over iterations k and k+1, marked × or √ against an error boundary (error / no error)]

Experimental Results
• A Case Study of PageRank (Sampling Probability)
  • MAP exceeds 95% even at a 10% sampling ratio (sampling 10% of the edges of the original graph).
  • Stratified sampling is better than random sampling, and the accuracy is related to the stratified parameter m.
  • At a sampling fraction of 30%, the normalized PageRank values are very close.
  • MAP: Mean Average Precision, regarding the first 1000 vertices as the important ones.
  • MRE: mean relative error between the sampled graph and the original graph.
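
The two metrics can be sketched as follows. The slide does not give their exact formulas, so this is one plausible reading rather than the authors' evaluation code: MRE averages per-vertex relative errors, and the precision metric is an average precision of the approximate top-k ranking against the exact top-k set (the slides use k = 1000; the toy below uses k = 2). Function names and toy values are hypothetical.

```python
def mean_relative_error(exact, approx):
    """Mean of |approx - exact| / |exact| over vertices with a non-zero exact value."""
    errors = [abs(a - e) / abs(e) for e, a in zip(exact, approx) if e != 0]
    return sum(errors) / len(errors)

def average_precision_at_k(exact, approx, k):
    """Average precision of the approximate top-k ranking against the exact top-k set."""
    # The k vertices with the largest exact values are treated as the important ones.
    important = set(sorted(range(len(exact)), key=lambda v: -exact[v])[:k])
    # Rank vertices by their approximate values and keep the first k.
    ranking = sorted(range(len(approx)), key=lambda v: -approx[v])[:k]
    hits, precision_sum = 0, 0.0
    for position, v in enumerate(ranking, start=1):
        if v in important:
            hits += 1
            precision_sum += hits / position
    return precision_sum / k

# Hypothetical values on a four-vertex graph.
exact = [0.40, 0.30, 0.20, 0.10]
approx = [0.44, 0.27, 0.10, 0.19]
print("MRE:", mean_relative_error(exact, approx))
print("AP@2:", average_precision_at_k(exact, approx, k=2))
```

Note that the two metrics can disagree: in the toy data the top-2 ranking is preserved even though the relative error of the low-valued vertices is large.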

Experimental Results
• Results of the Hierarchical Optimization Scheme
  • RFS: sampling for the first 10 iterations
  • RMS: sampling for the 11th-20th iterations
  • RTS: sampling for the last 10 iterations
  • RFS is the best one.
  • Figure labels: redundant information, Updated Ratio
  • improve the accuracy of the graph algorithm.
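
A toy version of the RFS / RMS / RTS comparison can be sketched like this. Everything here is an assumption made for illustration: the 50-vertex random graph, the uniform edge-keeping probability p = 0.5 with 1/p scaling, the damping factor 0.85, and the L1 error measure. It does not reproduce the experiment on the slide; it only shows how sampling can be confined to a window of iterations and how the window's position changes the final error.

```python
import random

def pagerank(out_edges, iterations, sample_window=None, p=0.5, d=0.85, seed=0):
    """PageRank where edges are sampled only in iterations inside sample_window.

    sample_window is a (start, end) pair of 0-based iteration indices, end
    exclusive; None means every iteration uses the full graph.
    """
    rng = random.Random(seed)
    n = len(out_edges)
    rank = [1.0 / n] * n
    for k in range(iterations):
        sampling = (sample_window is not None
                    and sample_window[0] <= k < sample_window[1])
        new_rank = [(1.0 - d) / n] * n
        for v, targets in enumerate(out_edges):
            share = d * rank[v] / len(targets)
            for u in targets:
                if not sampling:
                    new_rank[u] += share
                elif rng.random() < p:
                    new_rank[u] += share / p  # inverse-probability scaling
        rank = new_rank
    return rank

def l1_error(a, b):
    """Sum of absolute differences between two rank vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical random graph: 50 vertices, 5 distinct out-neighbors each.
graph_rng = random.Random(1)
out_edges = [graph_rng.sample([u for u in range(50) if u != v], 5)
             for v in range(50)]

exact = pagerank(out_edges, 30)
rfs = pagerank(out_edges, 30, sample_window=(0, 10))   # first 10 iterations
rms = pagerank(out_edges, 30, sample_window=(10, 20))  # 11th-20th iterations
rts = pagerank(out_edges, 30, sample_window=(20, 30))  # last 10 iterations
for name, result in (("RFS", rfs), ("RMS", rms), ("RTS", rts)):
    print(name, l1_error(exact, result))
```

In this sketch, errors introduced by early sampling are damped by the full-graph iterations that follow, whereas sampling in the last iterations leaves the noise in the final result.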

Experimental Results
• Generality Evaluation: Different Algorithms
  SS+P: stratified sampling combined with the hierarchical optimization scheme
  • Reduce the MRE of PageRank by about 17%
  • Reduce the MRE of ALS by about 95%
  • Increase the correctly updated values ratio of BFS by about 75%
  • Increase the normalized modularity value of LPA by about 8%

Experimental Results
• Generality Evaluation: Different Datasets
  • SS+P: achieves a 2x speedup, and the MRE of PageRank can be reduced to less than 2% on different datasets.

Conclusion
• We propose sampling approaches for variance reduction. For four common graph sampling techniques, we rigorously derive the optimal sampling probability in theory.
• We propose a stratified sampling strategy to further improve the algorithm accuracy. We classify the vertices based on their degrees, and sample their neighbors at different scales.
• We propose a hierarchical sampling optimization scheme. We apply sampling techniques to the stage of fast vertex-value updates or the stage with a fair amount of redundant information.
• We have carried out extensive experiments to demonstrate the effectiveness and generality of GraphSDH.
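
The stratified strategy (classify vertices by degree, then sample their neighbors at different scales) might look like the sketch below. The strata thresholds, the keep ratios, and the rule that low-degree vertices keep all neighbors are hypothetical choices made for illustration; the slides do not specify them, nor how the stratified parameter m enters.

```python
import math
import random

def stratified_neighbor_sample(adjacency, strata, rng):
    """Sample each vertex's neighbor list with a ratio chosen by its degree stratum.

    strata is a list of (max_degree, keep_ratio) pairs sorted by max_degree;
    the last max_degree should be float("inf") so every vertex falls in a stratum.
    """
    sampled = {}
    for v, nbrs in adjacency.items():
        # Pick the first stratum whose degree bound covers this vertex.
        ratio = next(r for max_deg, r in strata if len(nbrs) <= max_deg)
        # Keep at least one neighbor for any vertex that has neighbors.
        keep = max(1, math.ceil(ratio * len(nbrs))) if nbrs else 0
        sampled[v] = rng.sample(nbrs, keep)
    return sampled

# Hypothetical toy graph: one high-degree vertex and two low-degree vertices.
adjacency = {0: list(range(1, 21)), 1: [0, 2, 3], 2: [0]}
# Hypothetical strata: degree <= 4 keeps everything, higher degrees keep 25%.
strata = [(4, 1.0), (float("inf"), 0.25)]

sampled = stratified_neighbor_sample(adjacency, strata, random.Random(0))
for v in sorted(sampled):
    print(v, len(adjacency[v]), "->", len(sampled[v]))
```

Sampling high-degree vertices more aggressively removes most edges while leaving low-degree vertices, whose few edges each carry more information, untouched.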

Thank you!
