Large-Scale Network Analysis with the Boost Graph Libraries
Douglas Gregor
Open Systems Lab, Indiana University
dgregor@osl.iu.edu

What are the BGLs?
- A collection of libraries for computation on graphs/networks:
  - Graph data structures
  - Graph algorithms
  - Graph input/output
- Common design:
  - Flexibility/customizability throughout
  - Obsessed with performance
  - Common interfaces throughout the collection
- All open source, freely available online

The BGL Family
- The original (sequential) BGL
- BGL-Python
- The Parallel BGL
- Parallel BGL-Python

The Original BGL
- The largest and most mature BGL
  - ~7 years of research and development
  - Many users and contributors outside of the OSL
  - Steadily evolving
- Written in C++
  - Generic
  - Highly customizable
  - Efficient (both storage and execution)

BGL: Graph Data Structures
- Graphs:
  - adjacency_list: highly configurable, with user-specified containers for vertices and edges
  - adjacency_matrix
  - compressed_sparse_row
- Adaptors:
  - Subgraphs, filtered graphs, reverse graphs
  - LEDA and Stanford GraphBase
- Or, use your own…

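To make the compressed_sparse_row idea concrete: a CSR graph keeps every out-edge target in one flat array, plus an offsets array in which vertex v's edges occupy positions offsets[v] up to offsets[v+1]. The Python sketch below builds those two arrays from an edge list, in the spirit of prototyping in Python. build_csr and out_neighbors are hypothetical helpers of ours, not part of the BGL or BGL-Python; the real compressed_sparse_row graph has its own C++ interface.

```python
from typing import List, Tuple


def build_csr(num_vertices: int,
              edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build (offsets, targets) for a directed graph given as (source, target) pairs.

    Hypothetical helper illustrating the CSR layout; not a BGL API.
    """
    # Count the out-degree of every vertex, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for source, _ in edges:
        offsets[source + 1] += 1
    # Prefix sums turn degrees into the starting position of each vertex's edges.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each target into the slot range reserved for its source.
    targets = [0] * len(edges)
    next_slot = list(offsets[:-1])
    for source, target in edges:
        targets[next_slot[source]] = target
        next_slot[source] += 1
    return offsets, targets


def out_neighbors(offsets: List[int], targets: List[int], v: int) -> List[int]:
    """Out-neighbors of v: one contiguous slice of the flat targets array."""
    return targets[offsets[v]:offsets[v + 1]]
```

The payoff of this layout is compactness and locality: a vertex's edges sit next to each other in memory, at the cost of making edge insertion expensive, which is why adjacency_list remains the flexible default.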
Original BGL: Algorithms
- Searches (breadth-first, depth-first, A*)
- Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
- All-pairs shortest paths (Johnson, Floyd-Warshall)
- Minimum spanning tree (Kruskal, Prim)
- Components (connected, strongly connected, biconnected)
- Maximum cardinality matching
- Max-flow (Edmonds-Karp, push-relabel)
- Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
- Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
- Betweenness centrality
- PageRank
- Isomorphism
- Vertex coloring
- Transitive closure
- Dominator tree

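As a rough picture of what one entry in this list computes, here is textbook Dijkstra single-source shortest paths in plain Python using heapq. It is a sketch, not the BGL's implementation (the BGL's dijkstra_shortest_paths is a generic C++ template driven by property maps and visitors); the adjacency-dict input format and the function name are our assumptions.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed input format: vertex -> list of (neighbor, non-negative weight).
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return (distance, predecessor) maps for vertices reachable from source."""
    distance = {source: 0.0}
    # Like the BGL's predecessor map, the source is its own predecessor.
    predecessor = {source: source}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > distance[u]:
            continue  # stale heap entry: u was already settled more cheaply
        for v, weight in graph.get(u, []):
            candidate = d + weight
            if candidate < distance.get(v, float("inf")):
                # Edge relaxation: found a shorter path to v through u.
                distance[v] = candidate
                predecessor[v] = u
                heapq.heappush(heap, (candidate, v))
    return distance, predecessor
```

Usage: following predecessor links backward from any vertex reconstructs its shortest path to the source.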
Task: Biconnected Components
[Figure: input graph and output graph, with each edge of the output labeled by its biconnected component]
Articulation points: B G A

Define a Graph Type
- Determine the vertex and edge properties:

    #include <string>
    #include <boost/graph/adjacency_list.hpp>
    using namespace boost;

    struct Vertex { std::string name; };
    struct Edge { int bicomponent; };

- Determine the graph type:

    typedef adjacency_list</*OutEdgeListS=*/ vecS,
                           /*VertexListS=*/ vecS,
                           /*DirectedS=*/ undirectedS,
                           /*VertexProperty=*/ Vertex,
                           /*EdgeProperty=*/ Edge> Graph;

Read in a GraphViz DOT File
- Build an empty graph:

    Graph g;

- Map vertex properties (each DOT node ID becomes the vertex's name):

    #include <boost/graph/graphviz.hpp>
    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));

- Read in the GraphViz graph:

    #include <fstream>
    std::ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);

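read_graphviz does the real parsing. As a rough picture of what the "node_id" mapping means, the sketch below handles only a tiny subset of DOT (undirected "X -- Y;" edge statements), gives each new node ID the next vertex index, and records the ID as that vertex's name. read_simple_dot is a hypothetical helper, not a BGL function, and it ignores attributes, subgraphs, and most of the DOT grammar.

```python
import re
from typing import Dict, List, Tuple

# Matches simple undirected edge statements such as:  A -- B;  or  "C" -- A
EDGE_RE = re.compile(r'^\s*"?(\w+)"?\s*--\s*"?(\w+)"?\s*;?\s*$')


def read_simple_dot(text: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Parse the edge lines of an undirected DOT graph into (names, edges).

    Hypothetical helper: only 'X -- Y;' lines are understood, everything
    else is ignored. names[i] is the DOT node ID of vertex i, playing the
    role of the "node_id" property that read_graphviz fills in.
    """
    index: Dict[str, int] = {}
    names: List[str] = []
    edges: List[Tuple[int, int]] = []

    def vertex(node_id: str) -> int:
        # Create a vertex the first time a node ID is seen.
        if node_id not in index:
            index[node_id] = len(names)
            names.append(node_id)
        return index[node_id]

    for line in text.splitlines():
        match = EDGE_RE.match(line)
        if match:
            edges.append((vertex(match.group(1)), vertex(match.group(2))))
    return names, edges
```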
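Run Biconnected Components
- Keep track of the articulation points:

    #include <vector>
    std::vector<Graph::vertex_descriptor> art_points;

- Compute biconnected components (each edge's component number is stored in its bicomponent field):

    #include <iterator>
    #include <boost/graph/biconnected_components.hpp>
    biconnected_components(g, get(&Edge::bicomponent, g),
                           std::back_inserter(art_points));
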
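For readers curious what biconnected_components computes, here is a plain-Python sketch of the classic Hopcroft-Tarjan depth-first search with an edge stack. It is our own illustration, not the BGL's code: edges are (u, v) pairs over vertices 0..n-1, every edge receives a component number, and articulation points come back sorted. Self-loops are left unlabeled (-1), and the recursion suits small graphs only.

```python
from typing import List, Tuple


def biconnected_components(num_vertices: int, edges: List[Tuple[int, int]]):
    """Return (num_components, component_of_edge, articulation_points)."""
    adjacency: List[List[Tuple[int, int]]] = [[] for _ in range(num_vertices)]
    for e, (u, v) in enumerate(edges):
        adjacency[u].append((v, e))
        adjacency[v].append((u, e))

    discover = [-1] * num_vertices  # DFS discovery time; -1 means unvisited
    low = [0] * num_vertices        # earliest discovery time reachable from the subtree
    component = [-1] * len(edges)
    articulation = set()
    edge_stack: List[int] = []
    counter = [0, 0]                # [next discovery time, next component number]

    def dfs(u: int, parent_edge: int) -> None:
        discover[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v, e in adjacency[u]:
            if e == parent_edge:
                continue  # do not walk back along the tree edge we arrived by
            if discover[v] == -1:
                # Tree edge: descend into v.
                children += 1
                edge_stack.append(e)
                dfs(v, e)
                low[u] = min(low[u], low[v])
                if low[v] >= discover[u]:
                    # v's subtree cannot reach above u, so u closes a bicomponent.
                    # The DFS root only counts as an articulation point with 2+ children.
                    if parent_edge != -1 or children > 1:
                        articulation.add(u)
                    while True:
                        popped = edge_stack.pop()
                        component[popped] = counter[1]
                        if popped == e:
                            break
                    counter[1] += 1
            elif discover[v] < discover[u]:
                # Back edge to an ancestor.
                edge_stack.append(e)
                low[u] = min(low[u], discover[v])

    for start in range(num_vertices):
        if discover[start] == -1:
            dfs(start, -1)
    return counter[1], component, sorted(articulation)
```

Usage: two triangles sharing vertex 2, plus a pendant edge (4, 5), yield three bicomponents and articulation points 2 and 4.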
Output Results
- Attach the bicomponent number to the "label" property of edges:

    dyn.property("label", get(&Edge::bicomponent, g));

- Write the results to another GraphViz file:

    std::ofstream out("bc_out.dot");
    // write_graphviz_dp in current Boost releases; older releases
    // spelled this overload write_graphviz(out, g, dyn).
    write_graphviz_dp(out, g, dyn);

- Show the articulation points:

    #include <iostream>
    std::cout << "Articulation points: ";
    for (std::size_t i = 0; i < art_points.size(); ++i) {
      std::cout << g[art_points[i]].name << ' ';
    }
    std::cout << std::endl;

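The output step can be pictured with the hypothetical helper below, which writes an undirected DOT graph whose edges carry a label attribute (here, the bicomponent number). It is a sketch only: it does no escaping of quotes inside names, and it is not how write_graphviz is implemented.

```python
from typing import Sequence, Tuple


def write_labeled_dot(names: Sequence[str],
                      edges: Sequence[Tuple[int, int]],
                      labels: Sequence[int]) -> str:
    """Return DOT text for an undirected graph with one label per edge.

    Hypothetical helper: names[i] is vertex i's node ID, and labels[k]
    is the label attached to edges[k].
    """
    lines = ["graph G {"]
    # Declare every vertex, so isolated vertices survive the round trip.
    for name in names:
        lines.append(f'  "{name}";')
    for (u, v), label in zip(edges, labels):
        lines.append(f'  "{names[u]}" -- "{names[v]}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"
```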
Original BGL Summary
- The original BGL is large, stable, and efficient
  - Lots of algorithms and graph types
  - Peer-reviewed code with many users, nightly regression testing, etc.
  - Performance comparable to FORTRAN
- Who should use the BGL?
  - Programmers comfortable with C++
  - Users with graph sizes from tens of vertices to millions of vertices

BGL-Python
- Python is ideal for rapid prototyping:
  - It's a scripting language (no compiler)
  - Dynamically typed means less typing for you
  - Easy to use: you already know Python…
- BGL-Python provides access to the BGL from within Python
  - Similar interfaces to the C++ BGL
  - Easier to learn than C++
  - Great for scripting and GUI applications

    >>> help(bgl.dijkstra_shortest_paths)

Example: Biconnected Components

    import boost.graph as bgl  # Pull in the BGL bindings

    g = bgl.Graph.read_graphviz("biconnected_components.dot")

    # Compute biconnected components and articulation points
    bicomponent = g.edge_property_map('int')
    art_points = bgl.biconnected_components(g, bicomponent)

    # Save results with bicomponent numbers as edge labels
    g.edge_properties['label'] = bicomponent
    g.write_graphviz("biconnected_components_out.dot")

    # Print the DOT node ID of each articulation point
    node_id = g.vertex_properties['node_id']
    print("Articulation points: " + " ".join(str(node_id[v]) for v in art_points))

Wrapping the BGL in Python
- BGL-Python is not a…
  - "port"
  - reimplementation
- BGL-Python wraps the C++ BGL
  - Python calls translate to C++ calls
  - C++ can call back into Python
- Most of the speed of C++
- Most of the flexibility of Python

Performance: Shortest Paths
[Figure: shortest-paths performance chart]

BGL-Python Summary
- BGL-Python is all about tradeoffs:
  - More gradual learning curve
  - Faster time-to-solution
  - Lower performance
- Our typical approach:
  1. Prototype in Python to get your ideas down
  2. Port to C++ when performance matters

The Parallel BGL
- A version of the C++ BGL for computational clusters
  - Distributed memory for huge graphs
  - Parallel processing for improved performance
- An active research project
- Closely related to the original BGL
  - Parallelizing BGL programs should be "easy"

Parallel BGL: Distributed Graphs
[Figure: a simple, directed graph… distributed across 3 processors]

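One simple way to picture the figure: assume a block distribution, where each processor owns a contiguous range of vertex indices, and each directed edge is stored by the processor that owns its source vertex. The sketch below illustrates that assumption only; block_owner and distribute_edges are hypothetical names, and the Parallel BGL's actual distribution machinery may differ.

```python
from typing import Dict, List, Tuple


def block_owner(vertex: int, num_vertices: int, num_procs: int) -> int:
    """Owner of a vertex when vertices are split into contiguous blocks."""
    block_size = -(-num_vertices // num_procs)  # ceiling division
    return vertex // block_size


def distribute_edges(num_vertices: int, edges: List[Tuple[int, int]],
                     num_procs: int) -> Dict[int, List[Tuple[int, int]]]:
    """Assign each directed edge to the processor that owns its source vertex."""
    local: Dict[int, List[Tuple[int, int]]] = {p: [] for p in range(num_procs)}
    for source, target in edges:
        local[block_owner(source, num_vertices, num_procs)].append((source, target))
    return local
```

With 8 vertices on 3 processors, processor 0 owns vertices 0-2, processor 1 owns 3-5, and processor 2 owns 6-7. An edge whose target lives elsewhere, such as (6, 0), is exactly the kind of edge that forces communication in a distributed algorithm.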
Parallel Graph Algorithms
- Breadth-first search
- Eager Dijkstra's single-source shortest paths
- Crauser et al. single-source shortest paths
- Depth-first search
- Minimum spanning tree (Boruvka, Dehne & Götz)
- Connected components
- Strongly connected components
- Biconnected components
- PageRank
- Graph coloring
- Fruchterman-Reingold layout
- Max-flow (Dinic's)

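Breadth-first search is commonly parallelized level by level: each processor expands the frontier vertices it owns, sends every neighbor it finds to that neighbor's owner, and the owners keep only the vertices they have not seen before the next level begins. The sketch below simulates that level-synchronous pattern inside a single Python process. The owner function, per-processor lists, and names are assumptions made for illustration; this is not the Parallel BGL's implementation.

```python
from typing import Callable, Dict, List


def level_synchronous_bfs(adjacency: Dict[int, List[int]], source: int,
                          num_procs: int,
                          owner: Callable[[int], int]) -> Dict[int, int]:
    """Return the BFS level of every vertex reachable from source."""
    level = {source: 0}  # conceptually, each entry lives on owner(vertex)
    frontier: Dict[int, List[int]] = {p: [] for p in range(num_procs)}
    frontier[owner(source)].append(source)
    depth = 0
    while any(frontier.values()):
        # Phase 1: every processor expands its local frontier and buffers
        # "messages" addressed to the owners of the neighbors it finds.
        outbox: Dict[int, List[int]] = {p: [] for p in range(num_procs)}
        for p in range(num_procs):
            for u in frontier[p]:
                for v in adjacency.get(u, []):
                    outbox[owner(v)].append(v)
        # Phase 2 (after the exchange): owners keep only newly discovered
        # vertices, which form the next frontier.
        depth += 1
        next_frontier: Dict[int, List[int]] = {p: [] for p in range(num_procs)}
        for p in range(num_procs):
            for v in outbox[p]:
                if v not in level:
                    level[v] = depth
                    next_frontier[p].append(v)
        frontier = next_frontier
    return level
```

Usage: with owner = lambda v: v % 2 and two simulated processors, the levels match an ordinary sequential BFS; only the bookkeeping of who expands and who receives each vertex changes.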
Performance: Sparse Graphs
[Figure: performance chart]

Scalability (~547k vertices/node)
[Figure: scalability on a small-world graph, up to 70M vertices and 1B edges]

Performance vs. CGMgraph
[Figure: Erdős-Rényi graph with 96k vertices and 10M edges; chart annotated with 17x and 30x]

Parallel BGL Summary
- The Parallel BGL is built for huge graphs
  - Millions to hundreds of millions of nodes
  - Distributed-memory parallel processing on clusters
  - Future work will permit larger graphs…
- Parallel programming has a learning curve
  - Parallel graph algorithms are much harder to write
  - Distributed graph manipulation can be tricky
- The Parallel BGL is an active research library

Distributed Graph Layout
[Figure: distributed graph layout]

Parallel BGL in Python
- Preliminary support for the Parallel BGL in Python
  - Just import boost.graph.distributed
  - Similar interface to sequential BGL-Python
- Several options for usage with MPI:
  - Straight MPI: mpirun -np 2 python script.py
  - pyMPI: allows interactive use of the interpreter
- Initially used to prototype our distributed Fruchterman-Reingold implementation

Porting for Performance

Which BGL is Right for You?
- Is any BGL right for you?
- Depends on how large your networks are:
  - Up to 1/2 million vertices, any BGL will do
  - The C++ BGL can push to a couple million vertices
  - For tens of millions or larger, Parallel BGL only
- Other considerations:
  - You can prototype in Python, then port to C++
  - Algorithm authors might prefer the original BGL
  - Parallelism is very hard to manage

Conclusion
- The Boost Graph Library family is a collection of full-featured graph libraries
  - All are flexible, customizable, and efficient
  - Easy to port from Python to C++
  - Can port from sequential to parallel
  - Always growing and improving
- Is one of the BGLs right for you?
  - A typical "build or buy" decision

For More Information…
- (Original) Boost Graph Library
  http://www.boost.org/libs/graph/doc
- Parallel Boost Graph Library
  http://www.osl.iu.edu/research/pbgl
- Python Bindings for (Parallel) BGL
  http://www.osl.iu.edu/~dgregor/bgl-python
- Contact us!
  - Douglas Gregor <dgregor@osl.iu.edu>
  - Andrew Lumsdaine <lums@osl.iu.edu>

Other BGL Variants
- QuickGraph (C#)
  http://www.codeproject.com/cs/miscctrl/quickgraph.asp
- Ruby Graph Library
  http://rubyforge.org/projects/rgl/
- Rooster Graph (Scheme)
  http://savannah.nongnu.org/projects/rgraph/
- RBGL (an R interface to the C++ BGL)
  http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
- Disclaimer: These are all separate projects. We do not maintain them.

Comparative Performance
[Figure: performance comparison chart]