A New Reachability Algorithm for Symmetric Multiprocessor Architecture
D. Sahoo (Stanford), J. Jain (Fujitsu), S. Iyer (UT-Austin), D. Dill (Stanford)
Formal Equivalence and Assertion-based Verification Workshop 2005
Outline
- Standard Reachability Analysis
- Multithreaded Reachability in SMP machines
- Engineering Issues
- Results
- Conclusion and Future Work
Related Work
- Parallel Reachability Analysis:
  - Stern and Dill [CAV, 97]
  - Stornetta and Brewer [DAC, 96]
  - Yang, Hallaron [97]
  - Heyman, Geist, Grumberg, Schuster [CAV, 00]
  - Garavel, Mateescu, Smarandache [SPIN, 01]
  - Pixley, Havlicek [03]
Reachability using BDD [Burch et al.: 91]
[Figure: from the initial state I, repeated image computation over the partitioned transition relation Tr1, …, Tri, …, Trn yields R1, R2, …, Ri up to the least fixed point.]
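The least-fixed-point iteration on this slide can be sketched with explicit Python sets standing in for BDDs. This is only an illustration: the function name `reachable`, the dictionary encoding of each transition-relation partition, and the toy transitions are assumptions, not taken from the talk.

```python
def reachable(initial, transitions):
    """Least fixed point of R = I ∪ Image(R), using explicit sets instead of BDDs.

    transitions: list of partitions of the transition relation, each a dict
    mapping a state to the set of its successor states (hypothetical encoding).
    """
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        image = set()
        for part in transitions:          # apply every partition Tr_i
            for state in frontier:
                image |= part.get(state, set())
        frontier = image - reached        # keep only newly discovered states
        reached |= frontier
    return reached


# Toy example: state 5 is never reached from state 0.
tr1 = {0: {1}, 1: {2}}
tr2 = {2: {0, 3}, 5: {6}}
print(sorted(reachable({0}, [tr1, tr2])))  # [0, 1, 2, 3]
```

The loop stops as soon as an image computation adds no new state, which is the fixed point.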
Partitioned Reachability using POBDD
POBDD: [Jain: 92]. Reachability: [Narayan et al.: 97].
- Initial states: I
- Compute Local Fixed Point 1, Local Fixed Point 2, Local Fixed Point 3, Local Fixed Point 4
- Communicate from 1 -> 2, from 1 -> 3, from 1 -> 4
- Communicate from 2 -> 1, from 2 -> 3, from 2 -> 4
- Similarly repeat for other partitions
- Compute Local Fixed Point 1, Local Fixed Point 2, Local Fixed Point 3, Local Fixed Point 4
- Improvements: [Iyer et al.: 03], [Sahoo et al.: 04]
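The alternation of local fixed points and communication can be sketched as below. This is an explicit-state illustration, not the POBDD implementation: the name `partitioned_reachable`, the successor function `succ`, and the partitioning function `part_of` (which stands in for the POBDD window functions) are all assumptions.

```python
def partitioned_reachable(initial, succ, part_of, num_parts):
    """Alternate local fixed points with communication between partitions."""
    reached = [set() for _ in range(num_parts)]   # reached set per partition
    inbox = [set() for _ in range(num_parts)]     # states received, not yet explored
    for state in initial:
        inbox[part_of(state)].add(state)

    while any(inbox):
        outgoing = []
        # Phase 1: local fixed point inside every partition.
        for p in range(num_parts):
            frontier = inbox[p] - reached[p]
            inbox[p] = set()
            crossing = set()
            while frontier:
                reached[p] |= frontier
                image = set().union(*(succ(s) for s in frontier))
                crossing |= {t for t in image if part_of(t) != p}
                frontier = {t for t in image if part_of(t) == p} - reached[p]
            outgoing.append(crossing)
        # Phase 2: communicate states that crossed into other partitions.
        for p in range(num_parts):
            for t in outgoing[p]:
                q = part_of(t)
                if t not in reached[q]:
                    inbox[q].add(t)
    return reached


edges = {0: {1}, 1: {2}, 2: {3}, 3: {0}, 7: {8}}
parts = partitioned_reachable({0}, lambda s: edges.get(s, set()),
                              lambda s: s % 2, 2)
print(parts)  # [{0, 2}, {1, 3}]
```

The outer loop ends when no partition has received a state it has not already reached.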
Motivation for Multi-threaded Approach
- Scheduling problem
- Increasing availability of powerful SMP machines
- Multi-threading is a way of achieving real parallelism in SMP machines
Multi-threaded Reachability [DAC 05]: Naïve parallelization
- Advantages:
  - Parallel speedup
  - Catches a bug faster than the sequential version
- Problems:
  - Not much parallelism
[Figure: timeline; axis: Time]
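One plausible reading of the naïve scheme is to run the local fixed points of all partitions in parallel and communicate only after every thread has finished. The sketch below assumes that reading; the names `local_fixed_point` and `parallel_reachable` and the explicit-state encoding are hypothetical. In CPython the global interpreter lock prevents real CPU parallelism for this kind of work, so the code shows the structure only, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def local_fixed_point(seed, reached_p, succ, in_partition):
    """Explore one partition to its local fixed point; return states that leave it."""
    frontier = set(seed) - reached_p
    crossing = set()
    while frontier:
        reached_p |= frontier                      # in-place update of this partition only
        image = set().union(*(succ(s) for s in frontier))
        crossing |= {t for t in image if not in_partition(t)}
        frontier = {t for t in image if in_partition(t)} - reached_p
    return crossing


def parallel_reachable(initial, succ, part_of, num_parts):
    reached = [set() for _ in range(num_parts)]
    inbox = [set() for _ in range(num_parts)]
    for state in initial:
        inbox[part_of(state)].add(state)

    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        while any(inbox):
            seeds, inbox = inbox, [set() for _ in range(num_parts)]
            jobs = [pool.submit(local_fixed_point, seeds[p], reached[p], succ,
                                lambda t, p=p: part_of(t) == p)
                    for p in range(num_parts)]
            results = [job.result() for job in jobs]   # wait for all threads
            for crossing in results:                   # communicate afterwards
                for t in crossing:
                    q = part_of(t)
                    if t not in reached[q]:
                        inbox[q].add(t)
    return reached
```

Each worker writes only to its own partition's reached set, so no locking is needed in this sketch. Waiting for every thread before communicating is also what limits the parallelism: a thread that finishes early sits idle until the slowest one is done.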
Multi-threaded Reachability [DAC 05]: Early Communication
- Advantages:
  - Parallel speedup
  - Finishes the reachability analysis faster
  - Catches a bug faster than the naïve version
- Problems:
  - Parallelism could be better
[Figure: timeline; axis: Time]
Multi-threaded Reachability [DAC 05]: Early Communication and Partial Communication
- Advantages:
  - Parallel speedup
  - Finishes the reachability analysis faster
  - Catches a bug faster than the previous versions
[Figure: timeline; axis: Time]
Reachability in SMP Architecture
- We find the bugs faster!
- Improved parallelism → better parallel speedup
[Figure: timeline; axis: Time]
Engineering Issues
- Thread-safe BDD library
- Deterministic behavior
- Smart thread scheduling
Sources of Non-determinism
- Extensive memory-based optimizations
- Pointer comparisons
- Hashing based on memory address
- Solutions:
  - Deterministic hashing
  - Deterministic comparisons
[Figure: Thread 1, Thread 2; code fragment: p = malloc(…); key = hash(p); if (p > p1) …]
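One way to make hashing and comparison independent of memory addresses is to give every node a creation-order identifier and use that identifier wherever a pointer would otherwise be hashed or compared. The sketch below illustrates the idea only; `Node`, `NodeManager`, and the `uid` field are hypothetical names, and the talk does not say which scheme the authors' BDD library uses.

```python
import itertools


class Node:
    """Decision-diagram node identified by a creation-order uid, not its address."""
    __slots__ = ("uid", "var", "low", "high")

    def __init__(self, uid, var, low, high):
        self.uid, self.var, self.low, self.high = uid, var, low, high

    def __hash__(self):
        return self.uid              # deterministic: does not depend on id(self)

    def __lt__(self, other):
        return self.uid < other.uid  # deterministic replacement for p > p1


class NodeManager:
    def __init__(self):
        self._next_uid = itertools.count()
        self._unique = {}            # (var, low uid, high uid) -> Node

    def make(self, var, low=None, high=None):
        key = (var,
               low.uid if low is not None else -1,
               high.uid if high is not None else -1)
        node = self._unique.get(key)
        if node is None:
            node = Node(next(self._next_uid), var, low, high)
            self._unique[key] = node
        return node


mgr = NodeManager()
a, b = mgr.make("x"), mgr.make("y")
c = mgr.make("z", a, b)
print(c is mgr.make("z", a, b), a < b)  # True True
```

This sketch is single-threaded. With several threads, the identifiers would also have to be handed out in an order that does not depend on thread timing, for example from a separate counter per thread.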
Sources of Non-determinism
- Thread synchronization
- Solutions:
  - Synchronization based on deterministic count
    - Number of ITE operations
    - Number of Sift operations
[Figure: Thread 1, Thread 2; Image #n+1]
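The idea of synchronizing on a deterministic count can be sketched as follows: each thread exchanges data only when its own operation counter reaches a multiple of a fixed interval, so the exchange points do not depend on wall-clock timing. The function `run`, the arithmetic used as a stand-in for an ITE operation, and all parameter values are assumptions made for illustration.

```python
import threading


def run(num_threads=2, total_ops=12, interval=4):
    """Threads exchange values only at fixed operation counts."""
    barrier = threading.Barrier(num_threads)
    posted = [0] * num_threads
    logs = [[] for _ in range(num_threads)]

    def worker(tid):
        acc = tid
        for op in range(1, total_ops + 1):
            acc = (acc * 31 + op) % 1009       # stand-in for one ITE operation
            if op % interval == 0:             # sync point fixed by the count
                posted[tid] = acc
                barrier.wait()                 # every thread has posted
                seen = tuple(posted)
                barrier.wait()                 # every thread has read
                logs[tid].append((op, seen))
                acc = sum(seen) % 1009

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs


first, second = run(), run()
print(first == second)  # True
```

Because every thread reaches the same synchronization points after the same number of operations, repeated runs produce identical logs regardless of how the operating system interleaves the threads.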
Smart Thread Scheduling
- Each processor has its own cache.
- A thread is assigned to a processor.
- The cache fills up with the thread's memory usage.
- The same thread is assigned to a different processor after some time.
- A large number of unnecessary cache misses occur when the thread uses its previously used memory locations.
- Solutions:
  - Bind the thread to a processor
  - Leads to suboptimal throughput
    - If the number of threads exceeds the number of processors
[Figure: CPU 1 with Cache 1 and CPU 2 with Cache 2; a thread's lookup of address 0x07ffd0 results in a cache miss.]
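Binding a thread to a processor can be sketched on Linux with Python's `os.sched_setaffinity`. This is a Linux-specific illustration, not the mechanism used in the talk; the helpers `pin_current_thread` and `run_pinned` are hypothetical names, and the sketch returns False on platforms that do not support the call.

```python
import os
import threading


def pin_current_thread(cpu):
    """Try to bind the calling thread to one CPU; return True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False                            # not available on this platform
    try:
        # On Linux a native thread id can be passed where a pid is expected.
        os.sched_setaffinity(threading.get_native_id(), {cpu})
        return True
    except OSError:
        return False


def run_pinned(cpu, work):
    """Run work() in a new thread bound to the given CPU."""
    result = {}

    def body():
        result["pinned"] = pin_current_thread(cpu)
        result["value"] = work()

    thread = threading.Thread(target=body)
    thread.start()
    thread.join()
    return result


cpu = min(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else 0
print(run_pinned(cpu, lambda: sum(range(1000)))["value"])  # 499500
```

As the slide notes, pinning trades flexibility for cache locality: when there are more threads than processors, fixed bindings can leave some processors idle while others are oversubscribed.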
BDD Performance : CUDD Vs New BDD Statistics after Reachability Analysis (Static Order) Ckts P/ F #i m g #node s bpb F 10 eight P fru 32 CUDD New Mem (MB) Cache hits Cache collision Time Mem Cache hits Cache collision Time 1. 8 M 50 M 41. 0% 90. 4% 18. 6 61 M 41. 0% 88. 2% 26. 3 47 79 K 6. 1 M 42. 9% 26. 2% 0. 8 7. 5 M 42. 9% 26. 2% 1. 5 F 2 8 K 9. 2 M 34. 0% 28. 4% 7. 9 10. 9 M 34. 0% 28. 9% 8. 9 idu 32 F 1 36 K 6. 6 M 28. 8% 5. 0% 4. 2 7. 8 M 28. 7% 7. 7% 4. 5 usbphy P 1 90 K 6. 4 M 37. 7% 16. 6% 0. 7 7. 8 M 37. 7% 17. 1% 0. 7
BDD Performance: CUDD vs New
[Figure: chart; content not recoverable]
Performance: Non-deterministic vs Deterministic
Verification time in seconds
Ckts | Non-deterministic | Deterministic
c1 | T/O | 227
c2 | 962 | 917
c3 | 809 | 62
c4 | 903 | 161
d1 | 13 | 13
d2 | 24 | 30
d3 | 84 | 100
d4 | 30 | 38
d5 | 13 | 37
Performance: Cache or Parallelism
Verification time in seconds
Ckts | Uniprocessor | Sequential in 8-way SMP | Parallel in 8-way SMP
c1 | 1570 | 286 | 227
d1 | 125 | 13 | 13
d2 | 180 | 39 | 30
d3 | 295 | 130 | 100
d4 | 176 | 60 | 38
Results on Industrial Circuits
Header as extracted: Ckt Vis Seq POBDD Parallel Multi-threaded Approaches Parallel 8 CPUs Early Comm + Partial Comm Naïve Early Comm 1 CPU 8 CPUs
c1: 371 T/O T/O 286 227
c2: 3346 1789 1564 93 917
c3: 2540 T/O T/O 228 62
c4: 2236 2084 1174 161 509 161
d1: 6 T/O 13 13 13
d2: 10 11 13 45 39 30
d3: 15 21 23 100 130 100
d4: 11 T/O 39 60 38
d5: 12 16 15 34 37 37
Results on Public Benchmarks
Header as extracted: Ckt Vis Seq POBDD Parallel Multi-threaded Approaches Parallel 8 CPUs Early Comm + Partial Comm Naïve Early Comm 1 CPU 8 CPUs
spprod: 891 61 53 93 510 440
am2910: T/O 281 122 204 386 356
palu: 273 4 9 8 9 9
S1269b-1: 3635 T/O 59 72 60
S1269b-5: 2287 T/O 55 67 55
blackjck: T/O 1213 470 340 98 70
Results: Gantt charts
Real execution traces from our multi-threaded reachability program
Conclusion and Future Work
- Parallelize the reachability
- Multi-threaded reachability
- Better results
- Deterministic behavior
- Future work
  - Improve the parallelism further
  - Study cache behavior