GeantV – Adapting simulation to modern hardware

GeantV simulation vs. classical simulation

Classical simulation: flexible, but limited adaptability towards the full potential of current & future hardware
• One track at a time
• Three stacks (leptons, γ, other)
• Single event transport
• Event-level parallelism (threading)
• Cache coherency – low
• Vectorization – low (scalar auto-vectorization)

GeantV simulation: engineered to profit from all processing pipelines
• Lots of tracks in flight
• Many baskets – 10s to 1000s
• Multi event transport
• Fine-grain parallelism + threads
• Cache coherency – high
• Vectorization – high (explicit multi-particle interfaces)
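To make the "multi-particle interface" idea concrete, here is a minimal sketch, not GeantV code: the names `TrackBasket` and `PropagateLinear` are hypothetical. It assumes a basket stores track data as a structure of arrays, so one call advances every track in the basket with contiguous memory access. The plain loop is only something a compiler can typically auto-vectorize; the slide refers to explicit vectorization, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays basket (illustrative, not the GeantV layout).
struct TrackBasket {
    std::vector<double> x, y, z;     // positions
    std::vector<double> dx, dy, dz;  // unit direction components

    std::size_t size() const { return x.size(); }

    void add(double px, double py, double pz, double ux, double uy, double uz) {
        x.push_back(px);  y.push_back(py);  z.push_back(pz);
        dx.push_back(ux); dy.push_back(uy); dz.push_back(uz);
    }
};

// Multi-particle interface: one call moves every track in the basket
// along its direction by its own step length.
void PropagateLinear(TrackBasket& basket, const std::vector<double>& step) {
    assert(step.size() == basket.size());
    const std::size_t n = basket.size();
    for (std::size_t i = 0; i < n; ++i) {
        basket.x[i] += step[i] * basket.dx[i];
        basket.y[i] += step[i] * basket.dy[i];
        basket.z[i] += step[i] * basket.dz[i];
    }
}
```

A classical one-track-at-a-time interface would instead take a single track object per call, which leaves the compiler little to vectorize across particles.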

A future task approach of GeantV

• Feeder task: reads a number of events from file. Invokes the concurrent basketizer service.
• Basketizer(s): concurrent service; injects full baskets into the basket queue.
• Garbage collector: forces partially filled baskets into the basket queue to boost concurrency.
• Basket queue: concurrent service.
• Transport task: transports one basket for one step. Input: a dequeued basket; output: transported tracks. Reuses tracks, keeping locality. May be further split into subtasks.
• Scoring: a user task reading track info and creating "hits".
• Digitizer task: a user task working on "hits" data.
• I/O task: writes data (hits, digits, kinematics) to disk.
• Flow control task: inspects the system and issues commands.

Diagram edge labels: spawn, inject particle, dequeue basket, enqueue basket, inspect, command "dump all your baskets", event finished?, queue empty?
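The basketizer, garbage collector and basket queue roles can be sketched as follows. This is an illustration under stated assumptions, not the GeantV implementation: `Track`, `Basket`, `Basketizer` and its methods are hypothetical names, and a single mutex stands in for the concurrent services named on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

struct Track { int event; int id; };  // hypothetical minimal track
using Basket = std::vector<Track>;

class Basketizer {
public:
    explicit Basketizer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
        current_.reserve(capacity_);
    }

    // Feeder/transport side: add a track; a full basket goes to the queue.
    void Inject(const Track& t) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_.push_back(t);
        if (current_.size() == capacity_) PushCurrent();
    }

    // Garbage-collector role: force a partially filled basket into the queue.
    bool Flush() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (current_.empty()) return false;
        PushCurrent();
        return true;
    }

    // Transport side: take the next basket, if any.
    bool Dequeue(Basket& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t QueuedBaskets() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    // Caller must hold mutex_.
    void PushCurrent() {
        queue_.push_back(std::move(current_));
        current_.clear();  // put the moved-from vector back in a known state
        current_.reserve(capacity_);
    }

    const std::size_t capacity_;
    mutable std::mutex mutex_;
    Basket current_;
    std::deque<Basket> queue_;
};
```

In this sketch a flow control task would call `Flush()` when `QueuedBaskets()` drops to zero while events are still unfinished, which corresponds to the "dump all your baskets" command in the diagram.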

Some issues for migrating to tasks

• Flow control
  • Proof of principle that it works
• Thread ID integration
  • We now have static threads with unique IDs; how do we deal with this in task mode?
  • Thread/task data ownership is driven by the ID system
• Avoiding task bloat
  • Spawning one transport task per basket can create large overhead
  • Keep transport tasks "living" for a longer period?
• Locality
  • Preventing the task system from migrating threads arbitrarily
  • Especially when we move to NUMA awareness

Scalability on many-core

• Fine-grained MT prevents scaling to a high number of threads
• Issue for many-core architectures
• Lightweight interaction
• Implement new MP approach with common events queue as feeder

Plot: scalability on 2 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, comparing a lock-free algorithm (memory polling) with an algorithm using spinlocks.

Diagram labels: Events queue; Process, Transport, Scheduler 1, CPU 1, RAM_1; Process, Transport, Scheduler 2, CPU 2, RAM_2; Rebasketizing.
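As a rough illustration of a common events queue acting as a lock-free feeder, the sketch below hands out event indices with a single atomic counter. This is a swapped-in, simpler technique than the memory-polling algorithm named on the slide, and it uses an in-process atomic, not the multi-process setup the slide proposes. `EventFeeder` and `Claim` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical feeder: workers claim event indices without taking a lock.
class EventFeeder {
public:
    explicit EventFeeder(std::size_t nEvents) : nEvents_(nEvents), next_(0) {}

    // Atomically reserves the next unprocessed event index.
    // Returns false, leaving `event` untouched, once all events are claimed.
    bool Claim(std::size_t& event) {
        const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        if (i >= nEvents_) return false;
        event = i;
        return true;
    }

private:
    const std::size_t nEvents_;
    std::atomic<std::size_t> next_;
};
```

Each scheduler would loop on `Claim` and transport the event it receives. Because every index is returned by exactly one `fetch_add`, no two workers get the same event, and no worker ever waits on another as it would with a spinlock.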

Future: NUMA aware GeantV

• Replicate schedulers on NUMA clusters
• One basketizer per quadrant
• Loose communication between NUMA nodes at basketizing step
• 2 supported modes
  • Single process spawning one scheduler per quadrant
  • MPI/shared memory dispatch running one GeantV process per quadrant (previous slide)

Diagram labels: Scheduler 0 to Scheduler 3, each with its own basketizer (Basketizer 0 to Basketizer 3), Transport and Tracks; Global basketizer.