High Level Parallelisation of SUSSIX H Renshall IT

High Level Parallelisation of SUSSIX H. Renshall, IT Dept, Jan 2011 12/25/2021 Parallelisation of SUSSIX 1

Why Parallelise?

Start with simple case from Ewen on lxplus:
  g77 opt=0 in 0.25 secs
  g77 opt=2 in 0.23 secs
  g77 opt=3 in 0.19 secs
All consistent, so always use OPT=3.

g77 opt=3 -pg -g, then gprof on case of 168 bpms, 2048 turns, 12 lines:
  NARM=20 takes 75% of 15.8 secs in CALCR
  NARM=160 takes 85% of 130 secs in CALCR

Change to ifort 64-bit OPT=3 on pcbe13798: the 168 bpm case takes 14.4 secs, and 14.9 secs with CALCR expanded inline (ifort OPT=3 includes the inline option).

Try using the openmp option of ifort. Cannot parallelise CALCR due to its algorithm, which is already very efficient. The obvious higher level parallelisation is over the individual bpm files, which are all independent.

How Parallelised (1)

The only executable code changes are in the main program. The rest are non-executable apart from ordres, where all 'write (30, ' were changed to 'write (3000+n, '. The core code changes in sussix.s are:

!$OMP PARALLEL DO PRIVATE(n,iunit,filename,nturn)
!$OMP& SHARED(isix,ntot,iana,iconv,nt1,nt2,narm,istune,etune,tunex,
!$OMP& tuney,tunez,nsus,idam,ntwix,ir,imeth,nrc,eps,nline,
!$OMP& lr,mr,kr,idamx,ifin,isme,iusme,inv,icf,iicf)
      do n=1,ntot
! First open lines output file per bpm to be written in ORDRES
        filename='lin.'
        write(filename(5:8),'(i4.4)') n
        open(3000+n,file=filename,form='formatted',status='unknown')
        iunit=92+n
        filename='bpm.'
        write(filename(5:8),'(i4.4)') n
        open(iunit,file=filename,form='formatted',status='unknown')
        call datspe(iunit,idam,ir,nt1,nt2,nturn,imeth,narm,iana)
        call ordres(eps,narm,nrc,idam,n,nturn,
     &              -tunex,-tuney,-tunez,istune,etune)
        close(iunit)
        close(3000+n)
      enddo
!$OMP END PARALLEL DO
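The pattern in this loop can be tried in isolation. The following is a minimal free-form sketch, not taken from sussix: the program name, the 'out.' file prefix, and ntot = 8 are illustrative assumptions. It shows only the part that makes the loop safe to parallelise: each iteration builds its own file name and opens it on its own unit number, so no two threads share a file or a unit.

```fortran
! Hypothetical stand-alone sketch of the per-file parallel loop.
! Names and sizes are assumptions for illustration, not sussix code.
program per_file_parallel
  implicit none
  integer, parameter :: ntot = 8        ! assumed number of files
  integer :: n, iunit
  character(len=8) :: filename

  !$omp parallel do private(n, iunit, filename)
  do n = 1, ntot
     iunit = 3000 + n                   ! unit number unique to this iteration
     filename = 'out.'
     write(filename(5:8), '(i4.4)') n   ! out.0001, out.0002, ...
     open(iunit, file=filename, form='formatted', status='unknown')
     write(iunit, '(a,i4)') 'result for file ', n
     close(iunit)
  end do
  !$omp end parallel do
end program per_file_parallel
```

Built with the compiler's OpenMP option (for example gfortran -fopenmp), the iterations may run in any order across threads. Without that option the directives are treated as comments and the loop runs serially, which gives a convenient way to compare the output files.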

How Parallelised (2)

! Now read the lin. files in order and rewrite them to the lines file
      do n=1,ntot
        filename='lin.'
        write(filename(5:8),'(i4.4)') n
        open(3000+n,file=filename,form='formatted',status='unknown')
        do k=1,maxiter
          read(3000+n,'(A)',end=990,eor=980,advance='NO',SIZE=nf) ch
 980      write(30,'(A)') ch(1:nf)
        enddo
 990    close(3000+n)
      enddo
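The merge step uses a non-advancing read so that SIZE= returns the true length of each line, which lets the copy preserve lines without trailing padding. A self-contained sketch of the same idea follows. It is an illustration under assumptions: unit numbers 20 and 30, the output file name 'lines', the 256-character buffer, and the three small part files it creates for itself are all hypothetical, and it swaps the end=/eor= labels of the original for an iostat= test.

```fortran
! Hypothetical sketch: concatenate per-bpm part files in index order.
program merge_parts
  use iso_fortran_env, only: iostat_end
  implicit none
  integer, parameter :: ntot = 3          ! assumed number of part files
  integer :: n, nf, ios
  character(len=8) :: filename
  character(len=256) :: ch                ! assumed maximum line length

  ! Create small part files so the example is self-contained.
  do n = 1, ntot
     filename = 'lin.'
     write(filename(5:8), '(i4.4)') n
     open(20, file=filename, form='formatted', status='replace')
     write(20, '(a,i0)') 'line from part ', n
     close(20)
  end do

  open(30, file='lines', form='formatted', status='replace')
  do n = 1, ntot
     filename = 'lin.'
     write(filename(5:8), '(i4.4)') n
     open(20, file=filename, form='formatted', status='old')
     do
        ! Non-advancing read: nf receives the number of characters read.
        read(20, '(a)', advance='no', size=nf, iostat=ios) ch
        if (ios == iostat_end) exit       ! no more lines in this part
        if (ios > 0) stop 'read error'
        write(30, '(a)') ch(1:nf)         ! copy the line without padding
     end do
     close(20)
  end do
  close(30)
end program merge_parts
```

Because the parts are read back in index order, the merged file has the same ordering as a serial run, whatever order the threads finished in.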

Result

First a 1-thread run to cache the 100 bpms in afs:

             user       system    wall      %cpu over the wall time
 1 thread    189.799u    6.090s   3:49.47     85.3%

then:

 1 thread    190.414u    5.616s   3:24.14     96.0%
10 thread    195.021u   12.699s   0:28.74    722.7%
16 thread    198.530u   18.412s   0:24.30    892.7%
20 thread    200.419u   16.578s   0:19.65   1104.2%
30 thread    209.400u   25.780s   0:24.92    943.7%
40 thread    220.526u   27.898s   0:24.72   1004.8%
48 thread    forrtl: severe (41): insufficient virtual memory

So diminishing returns with more threads, slowing down from 20 on, probably due to there being only 20 afs daemons available.
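As a reading aid for these timings, speed-up and per-thread efficiency can be worked out from the wall times. The sketch below is a hypothetical helper program: the wall times are copied from the runs above and converted to seconds, and the second 1-thread run (3:24.14, i.e. 204.14 s) is assumed as the baseline since the first one also paid for filling the afs cache.

```fortran
! Hypothetical helper: speed-up and efficiency from the measured wall times.
program speedup
  implicit none
  integer, parameter :: nrun = 6
  integer :: i
  integer :: threads(nrun) = (/ 1, 10, 16, 20, 30, 40 /)
  ! Wall times in seconds, taken from the table of results.
  real :: wall(nrun) = (/ 204.14, 28.74, 24.30, 19.65, 24.92, 24.72 /)
  real :: s

  write(*, '(a)') ' threads speed-up efficiency'
  do i = 1, nrun
     s = wall(1) / wall(i)                ! speed-up relative to 1 thread
     write(*, '(i8, f9.2, f11.2)') threads(i), s, s / real(threads(i))
  end do
end program speedup
```

On this basis the best run, at 20 threads, is roughly ten times faster than the single-thread baseline, and efficiency falls as threads are added beyond that point.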