Galois Performance
Mario Mendez-Lojo, Donald Nguyen
- Slides: 36
Overview
• The Galois system is a test bed for exploring optimizations
– Safe but not fast out of the box
• Important optimizations
– Select the least transactional overhead
– Select the right scheduling
– Select an appropriate data structure
• Quantify optimizations on applications
Algorithms
[Figure: classification of irregular algorithms by topology (general graph, grid, tree), operator (morph, local computation, reader), and ordering (unordered)]
• Applications studied
– 1. Barnes-Hut
– 2. Delaunay Mesh Refinement
– 3. Preflow-push
Methodology
[Figure: time per thread, broken down into Serial, Idle, GC, and Compute]
• Abort ratio: aborted iterations / total iterations
• GC options
– UseParallelGC
– UseParallelOldGC
– NewRatio=1
Terms
• Base
– Default scheduling, default graph
• Serial
– Galois classes replaced by classes without concurrency control
• Speedup
– Measured relative to the best mean performance of a serial variant
• Throughput
– Number of serial iterations / time
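The ratios defined on the Methodology and Terms slides can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical and are not part of Galois. The example input is the pair of times reported on the Barnes-Hut results slide (10271 ms serial, 1553 ms parallel), whose ratio is roughly 6.6.

```java
// Hypothetical helper; class and method names are illustrative, not Galois APIs.
final class RunMetrics {
    /** Abort ratio: aborted iterations divided by total iterations. */
    static double abortRatio(long abortedIterations, long totalIterations) {
        if (totalIterations <= 0) {
            throw new IllegalArgumentException("totalIterations must be positive");
        }
        return (double) abortedIterations / totalIterations;
    }

    /** Speedup: best serial time divided by parallel time (same units). */
    static double speedup(double bestSerialMs, double parallelMs) {
        if (parallelMs <= 0) {
            throw new IllegalArgumentException("parallelMs must be positive");
        }
        return bestSerialMs / parallelMs;
    }

    /** Throughput: serial iterations per unit of time. */
    static double throughput(long serialIterations, double timeMs) {
        if (timeMs <= 0) {
            throw new IllegalArgumentException("timeMs must be positive");
        }
        return serialIterations / timeMs;
    }

    public static void main(String[] args) {
        // Times taken from the Barnes-Hut results slide.
        System.out.println("Barnes-Hut speedup: " + speedup(10271, 1553));
    }
}
```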
Numbers
• Runtime
– Last of 5 runs in the same VM
– Ignores time to read and construct the initial graph
• Other statistics
– Last of 5 runs
Test Environment
• 2 x Xeon X5570 (4 core, 2.93 GHz)
• Java 1.6.0_0-b11
• Linux 2.6.24-27 x86_64
• 20 GB heap size
BARNES-HUT
[Image: Most Distant Galaxy Candidates in the Hubble Ultra Deep Field]
Barnes-Hut
• N-body algorithm
– Oct-tree acceleration structure
– Serial: tree build, center of mass, particle update
– Parallel: force computation
• Structure
– Reader on tree
• Variants
– Splash2, Reader Galois
Reader Optimization
Before:
  child = octree.getNeighbor(nn, 1);
After:
  child = octree.getNeighbor(nn, 1, MethodFlag.NONE);
ParaMeter Profile
Barnes-Hut Results
• Best serial: base
• Serial time: 10271 ms
• Best parallel time: 1553 ms
• Best speedup: 6.6X
• Input: 100,000 points, 1 time step
Barnes-Hut Scalability
DELAUNAY MESH REFINEMENT
Delaunay Mesh Refinement
• Refine "bad" triangles
– Maintained in worklist
• Structure
– Cautious operator on graph
• Variants
– Flag optimized, locallifo
  base: Priority.defaultOrder()
  local lifo: Priority.first(ChunkedFIFO.class).thenLocally(LIFO.class)
Cautious Optimization
• No need to save undo info
• Only check conflicts up to first write
Before:
  mesh.contains(item);
  ...
  mesh.remove(preNodes.get(i));
  ...
  mesh.add(node);
After:
  mesh.contains(item, MethodFlag.CHECK_CONFLICT);
  ...
  mesh.remove(preNodes.get(i), MethodFlag.NONE);
  ...
  mesh.add(node, MethodFlag.NONE);
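The idea behind a cautious operator can be sketched independently of Galois. The toy example below assumes that conflicts are detected with per-node locks, which the slides do not state. All names are hypothetical. The operator acquires its whole neighborhood before the first write, so an abort never has anything to undo, and the write phase needs no further conflict checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a cautious operator; not the Galois implementation.
final class CautiousSketch {
    static final class Node {
        final ReentrantLock lock = new ReentrantLock();
        int value;
    }

    /** Returns true if the operator committed, false if it aborted before any write. */
    static boolean apply(List<Node> neighborhood) {
        List<Node> held = new ArrayList<>();
        try {
            // Read phase: conflict checks happen here, before the first write.
            for (Node n : neighborhood) {
                if (!n.lock.tryLock()) {
                    return false; // conflict: nothing was written, so no undo log is needed
                }
                held.add(n);
            }
            // Write phase: every node is already owned, so no checks are required.
            for (Node n : neighborhood) {
                n.value++;
            }
            return true;
        } finally {
            for (Node n : held) {
                n.lock.unlock();
            }
        }
    }

    /** Demo: another thread owns one node, so the operator aborts and leaves all values untouched. */
    static boolean abortsCleanlyUnderConflict() {
        final Node a = new Node();
        final Node b = new Node();
        final CountDownLatch locked = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            b.lock.lock();
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                b.lock.unlock();
            }
        });
        holder.start();
        try {
            locked.await();
            boolean committed = apply(Arrays.asList(a, b));
            return !committed && a.value == 0 && b.value == 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("aborted without side effects: " + abortsCleanlyUnderConflict());
    }
}
```

In this sketch the early return plays the role of an abort; a non-cautious operator would instead have to record old values so that a conflict found after a write could be rolled back.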
LIFO Optimization
Before:
  GaloisRuntime.foreach(..., Priority.defaultOrder());
After:
  GaloisRuntime.foreach(..., Priority.first(ChunkedFIFO.class).thenLocally(LIFO.class));
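One plausible reading of the chunked-FIFO, locally-LIFO policy is that work is handed out in chunks in FIFO order, while each chunk, including any work created while processing it, is consumed in LIFO order. The single-threaded sketch below illustrates that reading only. The class name, the `run` method, and the chunk size are assumptions made for illustration and do not reflect how Galois implements its worklists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical, single-threaded illustration; not the Galois implementation.
final class ChunkedFifoLocalLifo {
    /**
     * Processes the initial items chunk by chunk. Chunks are taken in FIFO order;
     * within a chunk, items (and any new work the operator creates) are taken LIFO.
     * Returns the order in which items were processed.
     */
    static <T> List<T> run(List<T> initial, int chunkSize, Function<T, List<T>> operator) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Global worklist: a FIFO queue of fixed-size chunks.
        Deque<Deque<T>> chunks = new ArrayDeque<>();
        for (int i = 0; i < initial.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, initial.size());
            chunks.addLast(new ArrayDeque<>(initial.subList(i, end)));
        }
        List<T> order = new ArrayList<>();
        while (!chunks.isEmpty()) {
            Deque<T> local = chunks.pollFirst();      // FIFO across chunks
            while (!local.isEmpty()) {
                T item = local.pollLast();            // LIFO within the local chunk
                order.add(item);
                for (T created : operator.apply(item)) {
                    local.addLast(created);           // new work stays local and runs next
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Item 2 creates one new item (20); every other item creates nothing.
        List<Integer> order = run(Arrays.asList(1, 2, 3, 4, 5, 6), 3,
                x -> x == 2 ? Arrays.asList(20) : Collections.<Integer>emptyList());
        System.out.println(order); // [3, 2, 20, 1, 6, 5, 4]
    }
}
```

Newly created work is processed right after the item that created it, which is one reason a local LIFO order can help locality in refinement-style algorithms.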
ParaMeter Profile
DMR Results
• Best serial: locallifo.flagopt
• Serial time: 17002 ms
• Best parallel time: 3745 ms
• Best speedup: 4.5X
• Input: 0.5M triangles, 0.25M bad triangles
PREFLOW-PUSH
Preflow-push
• Max-flow algorithm
– Nodes push flow downhill
• Structure
– Cautious, local computation
• Variants
– Flag optimized, local computation graph
  base (discharge): Priority.first(Bucketed.class, numHeight+1, false, indexer).then(FIFO.class)
  base (relabel): Priority.first(ChunkedFIFO.class, 8)
Local Computation Optimization
  graph = ...
  b = new LocalComputationGraph.ObjectGraphBuilder();
  graph = b.from(graph).create()
ParaMeter Profile
Preflow-push Results
• C: 11450 ms
• Java: 30234 ms
• Best serial: lc.flagopt
• Serial time: 57121 ms
• Best parallel time: 18242 ms
• Best speedup: 3.1X
• Input: from challenge problem (genmf-wide), 14 linearly connected grids (194 x 194), 526,904 nodes, 2,586,020 edges
• http://avglab.com/andrew/CATS/maxflow_synthetic.htm
Preflow-push Scalability
What performance did we expect?
[Figure: time per thread, split into measured components (parallel compute, serial, GC, idle) and an indirectly measured error term (mis-speculation, synchronization, …)]
What performance did we expect?
• Naïve: r(x) = t_1 / x
• Amdahl: r(x) = t_p / x + t_s
– t_1 = t_p + t_s
– t_s = t_{idle} + t_{gc} + t_{serial}
• Simple: r(x) = (t_p (i_x / i_1)) / x + t_s
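The three models are simple enough to write down directly. The sketch below assumes that x is the thread count and that i_x and i_1 denote the number of iterations executed with x threads and with one thread; the slides do not define these symbols, so treat that reading as a guess. Class and method names are hypothetical, and the inputs in `main` are made-up values chosen only to exercise the formulas.

```java
// Hypothetical helper; names are illustrative and the inputs in main are not measurements.
final class RuntimeModels {
    /** Naive model: r(x) = t1 / x. */
    static double naive(double t1, int x) {
        return t1 / x;
    }

    /** Amdahl model: r(x) = tp / x + ts, where ts = tIdle + tGc + tSerial. */
    static double amdahl(double tp, double ts, int x) {
        return tp / x + ts;
    }

    /** "Simple" model: r(x) = (tp * (ix / i1)) / x + ts, scaling parallel work by the iteration ratio. */
    static double simple(double tp, double ts, double ix, double i1, int x) {
        return tp * (ix / i1) / x + ts;
    }

    public static void main(String[] args) {
        double tp = 8000, ts = 2000;             // made-up split of t1 = 10000 ms
        System.out.println(naive(tp + ts, 4));   // 2500.0
        System.out.println(amdahl(tp, ts, 4));   // 4000.0
        System.out.println(simple(tp, ts, 1500, 1000, 4)); // 5000.0
    }
}
```

With an iteration ratio of 1 the simple model reduces to Amdahl's; a ratio above 1 (extra iterations from aborted work, under the assumed reading) raises the predicted runtime.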
Barnes-Hut
Delaunay Mesh Refinement
Preflow-push
Summary
• Many profitable optimizations
– Selecting among method flags, worklists, graph variants
• Open topics
– Automation
– Static, dynamic and performance analysis
– Efficient ordered algorithms