A task-based implementation for GeantV
Joel Fuentes, Andrei Gheata
Introduction
- GeantV is a project that aims to develop a high-performance detector simulation system integrating fast and full simulation.
- This work is an implementation of a task-based approach for GeantV using Intel Threading Building Blocks (TBB).
- Intel TBB is a well-known library that provides tools to manage tasks, concurrent data structures, and parallel algorithms such as parallel for, pipeline, etc.
- A previous implementation with TBB served as a starting point.
Intel Threading Building Blocks (TBB)
• Each thread has its own ready pool, which is a list of tasks.
• A task goes into a pool when it is allocated.
• Each thread steals tasks from other pools when necessary.
TBB Task Scheduler
Problem: Solution
- Oversubscription: one TBB thread per hardware thread
- Fair scheduling: non-preemptive unfair scheduling
- High overhead: programmer specifies tasks, not threads
- Load imbalance: work-stealing balances load
- Scalability: specify tasks and how to create them, rather than threads
Task model of GeantV
- Feeder task: reads a number of events from file and invokes the concurrent basketizer service.
- Basketizer(s): concurrent service that injects full baskets.
- Garbage collector: forces partially filled baskets into the basket queue to boost concurrency.
- Basket queue: concurrent service.
- Transport task: transports one basket for one step and reuses tracks, keeping locality. May be further split into subtasks.
- Flow control task.
- Scoring: a user task that reads track info and creates "hits".
- Digitizer task: a user task working on "hits" data.
- I/O task: writes data (hits, digits, kinematics) to disk.
- Diagram labels: spawn, inject particle, enqueue basket, dequeue basket (input), output transported tracks, inspect, command "dump all your baskets"; decision points "event finished?" and "queue empty?".
Implementation
Tasks implemented:
- InitialTask
- FlowControllerTask
- FeederTask
- TransportTask
• Tasks are described using C++ classes that have the class tbb::task as their base class.
• Task operations are implemented in the virtual method execute().
• New parallel tasks are launched using spawn(task *t).
• Once a task is scheduled for execution by the TBB runtime library, the execute() method of the task is called in a non-preemptive manner, running the task to completion.
Additional classes implemented:
- ThreadData
- TaskMgrTBB
All the new classes that represent the task-based approach were packaged in a shared library called Geant_tbb.
How to run in TBB mode?
- Install TBB. See https://www.threadingbuildingblocks.org for details.
- Build the GeantV project for TBB mode by setting the parameters:
  $ -DUSE_TBB=ON -DTBBROOT=/your/path/to/TBB
- Execute runApp in TBB mode by setting the flag:
  $ ./runApp -i 1
Experimental Results
Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31

Static Threads mode
runApp -e 10 -u 4 -t 16
=== Transported: 4870 primaries/4035173 tracks, total steps: 16358977, snext calls: 16358977, phys steps: 16208410, mag. field steps: 0, small steps: 939 bdr. crossings: 150567 RT=2.03824s, CP=31.08s

TBB mode
runApp -e 10 -u 4 -i 1 -t 16
=== Transported: 4870 primaries/4028541 tracks, total steps: 16331333, snext calls: 16331333, phys steps: 16179636, mag. field steps: 0, small steps: 916 bdr. crossings: 151697 RT=2.64854s, CP=54.26s
nthreads=16 speed-up=20.486797 efficiency=1.280425
Conclusions
- A first implementation of a task-based approach for GeantV using TBB was deployed.
- The new approach provides more flexibility and connectivity to other task-based multithreaded frameworks.
- Some overhead of the TBB mode compared to the static thread approach still has to be fully understood and addressed.
  - Possible reason: task initialization.
- There are still more tasks to implement: Scoring Task, I/O Task, Digitizer Task, and so on.
- Directory with the task-based implementation: https://gitlab.cern.ch/GeantV/geant/tree/master/vecprot_v2_tbb
- Report about this work submitted to GSoC 16: http://www.face.ubiobio.cl/~jfuentes/blog/geantv
Thank you