Data Collection and Processing
Thorsten Kollegger for the ALICE Collaboration
ALICE | LHCC Detector Upgrade Review | 25.09.2012

Requirements
Focus of ALICE upgrade on physics probes requiring high statistics: sample 10 nb⁻¹
Online System Requirements
Sample full 50 kHz Pb-Pb interaction rate (current limit at ~500 Hz, factor 100 increase)
~1.1 TByte/s detector readout
However:
• storage bandwidth limited to ~20 GByte/s
• many physics probes have low S/B: classical trigger/event filter approach not efficient
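
The numbers on this slide already fix the minimum data reduction the online system has to deliver. A minimal sketch of that arithmetic follows; the variable names are illustrative, and 1 TByte is taken as 1000 GByte, which is an assumption rather than something the slides state.

```python
# Figures quoted on the Requirements slide.
target_rate_hz = 50_000     # full Pb-Pb interaction rate to be sampled
current_rate_hz = 500       # approximate current limit
readout_gbyte_s = 1100.0    # ~1.1 TByte/s detector readout (1 TByte = 1000 GByte assumed)
storage_gbyte_s = 20.0      # ~20 GByte/s storage bandwidth

# Increase in sampled interaction rate relative to today.
rate_increase = target_rate_hz / current_rate_hz

# Smallest overall reduction factor that fits the readout into the storage bandwidth.
min_reduction = readout_gbyte_s / storage_gbyte_s

print(f"rate increase: x{rate_increase:.0f}")
print(f"minimum data reduction: x{min_reduction:.0f}")
```

Under these assumptions the readout must shrink by at least a factor of about 55 before storage, which is what motivates online reconstruction and compression rather than event rejection.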

Strategy
Data reduction by (partial) online reconstruction and compression
Store only reconstruction results, discard raw data
• Demonstrated with TPC clustering since Pb-Pb 2011
• Optimized data structures for lossless compression
• Algorithms designed to allow for offline reconstruction passes with improved calibrations
Implies much tighter coupling between online and offline reconstruction software

Event Size
Event size per detector (MByte)
Detector    After Zero Suppression    After Data Compression
TPC         20.0                      1.0
TRD         1.6                       0.2
ITS         0.8                       0.2
Others      0.5                       0.25
Total       22.9                      1.65
Expected data sizes for minimum bias Pb-Pb collisions at full LHC energy
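
As a cross-check, the per-detector event sizes can be summed and multiplied by the 50 kHz interaction rate from the Requirements slide to obtain data rates. This is a sketch only: the helper `rate_gbyte_s` is a hypothetical name, and 1 GByte = 1000 MByte is assumed.

```python
RATE_HZ = 50_000  # Pb-Pb interaction rate quoted on the Requirements slide

# Event sizes in MByte: (after zero suppression, after data compression)
EVENT_SIZE_MBYTE = {
    "TPC": (20.0, 1.0),
    "TRD": (1.6, 0.2),
    "ITS": (0.8, 0.2),
    "Others": (0.5, 0.25),
}

total_zs = sum(zs for zs, _ in EVENT_SIZE_MBYTE.values())
total_comp = sum(comp for _, comp in EVENT_SIZE_MBYTE.values())


def rate_gbyte_s(size_mbyte, rate_hz=RATE_HZ):
    """Data rate in GByte/s for a given event size (1 GByte = 1000 MByte assumed)."""
    return size_mbyte * rate_hz / 1000.0


print(f"after zero suppression: {total_zs:.2f} MByte/event, {rate_gbyte_s(total_zs):.1f} GByte/s")
print(f"after data compression: {total_comp:.2f} MByte/event, {rate_gbyte_s(total_comp):.1f} GByte/s")
```

The zero-suppressed total reproduces the roughly 1.1 TByte/s readout figure, and the compressed total corresponds to about 82.5 GByte/s.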

TPC Data Reduction
Data Format                                          Data Reduction Factor    Event Size (MByte)
Raw Data                                             1                        700
Zero Suppression                                     35                       20
Clustering & Compression                             5-7                      ~3
Remove clusters not associated to relevant tracks    2                        1.5
Data format optimization                             2-3                      <1
(Rows in the original table are grouped under the labels FEE and HLT.)
First steps up to clustering on FEE/FPNs (RORC FPGA)
Further steps require full event reconstruction on EPNs; pattern recognition requires only coarse online calibration
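
The table reads as a chain in which each step divides the event size left by the previous one. The sketch below applies the quoted factors in sequence, carrying the ranges (5-7, 2-3) through as a smallest and largest possible size; `apply_chain` is a hypothetical helper, not part of any ALICE software.

```python
RAW_MBYTE = 700.0

# (step, (smallest factor, largest factor)) as listed in the table
STEPS = [
    ("Zero Suppression", (35, 35)),
    ("Clustering & Compression", (5, 7)),
    ("Remove clusters not associated to relevant tracks", (2, 2)),
    ("Data format optimization", (2, 3)),
]


def apply_chain(size_mbyte, steps):
    """Return [(step, smallest size, largest size)] after each reduction step."""
    smallest = largest = size_mbyte
    result = []
    for name, (f_min, f_max) in steps:
        smallest /= f_max   # best case: largest reduction factor
        largest /= f_min    # worst case: smallest reduction factor
        result.append((name, smallest, largest))
    return result


chain = apply_chain(RAW_MBYTE, STEPS)
for name, smallest, largest in chain:
    print(f"{name}: {smallest:.2f} to {largest:.2f} MByte")
```

The resulting ranges bracket the rounded event sizes in the table (20, ~3, 1.5), and the last step ends at 1 MByte or below.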

TPC Data Reduction
First compression steps used in production starting with the 2011 Pb+Pb run
[Figure: HLT, Pb+Pb 2011]
Currently R&D towards moving selected calibration tasks and TPC seed finding to the HLT

Data Bandwidth
Detector    Input to Online System    Peak Output to Local Data Storage    Avg. Output to Computing Center
            (GByte/s)                 (GByte/s)                            (GByte/s)
TPC         1000                      50.0                                 8.0
TRD         81.5                      10.0                                 1.6
ITS         40                        10.0                                 1.6
Others      25                        12.5                                 2.0
Total       1146.5                    82.5                                 13.2
LHC luminosity variation during fill and efficiency taken into account for average output to computing center.
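
A short check of the table: the column totals can be recomputed, and the ratio of average to peak output turns out to be the same for every detector. That common factor (0.16) is inferred here from the numbers; the slide only says that luminosity variation and efficiency are taken into account, without quoting a value.

```python
# (input, peak output, average output) in GByte/s, per detector
BANDWIDTH_GBYTE_S = {
    "TPC": (1000.0, 50.0, 8.0),
    "TRD": (81.5, 10.0, 1.6),
    "ITS": (40.0, 10.0, 1.6),
    "Others": (25.0, 12.5, 2.0),
}

# Column-wise totals: [input, peak output, average output]
totals = [sum(column) for column in zip(*BANDWIDTH_GBYTE_S.values())]

# Fraction of the peak output that survives as average output, per detector
avg_over_peak = {
    det: avg / peak for det, (_, peak, avg) in BANDWIDTH_GBYTE_S.items()
}

print("totals (GByte/s):", [round(t, 1) for t in totals])
for det, ratio in avg_over_peak.items():
    print(f"{det}: average/peak = {ratio:.2f}")
```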

Combined DAQ/HLT System
[Diagram, flattened in extraction; recoverable labels listed below]
Detectors: ITS, TPC, TRD, EMCal, PHOS, TOF, Muon, Trigger Detectors; FTP (L0, L1)
Data path: detector -> RORC3 -> FLP -> Farm Network -> EPN -> Storage Network -> Data Storage
Counts: ~2500 DDL3s, ~250 FLPs, ~1250 EPNs
Link labels: 10 Gb/s; 2 x 10 or 40 Gb/s; 10 or 40 Gb/s

Detector Readout
Combination of continuous and triggered readout
Continuous readout for TPC and ITS
• At 50 kHz, ~5 events in TPC during drift time of 92 µs
• Continuous readout minimizes needed bandwidth
• Implies event building only after partial reconstruction
Fast Trigger Processor (FTP) complementing CTP
• Provides clock/L0/L1 to triggered detectors and TPC/ITS for data tagging and test purposes
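
The "~5 events during drift time" figure is the interaction rate multiplied by the drift time. The sketch below also estimates how rarely the TPC would contain no overlapping collision at all, assuming Poisson-distributed collisions; that statistical model is an assumption added here, not something the slide states.

```python
import math

rate_hz = 50_000        # Pb-Pb interaction rate
drift_time_s = 92e-6    # TPC drift time of 92 µs

# Mean number of collisions falling within one drift-time window.
mean_overlap = rate_hz * drift_time_s

# Probability that a drift-time window contains no collision (Poisson assumption).
p_empty = math.exp(-mean_overlap)

print(f"mean collisions per drift time: {mean_overlap:.1f}")
print(f"probability of an empty window: {p_empty:.3f}")
```

With a mean of 4.6 overlapping collisions, isolated events are the exception, which is consistent with the slide's point that event building can only follow partial reconstruction.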

DDL/RORC Development
Data Link
- DDL1 (now): 2 Gbit/s
- DDL2 (LS1): 6 Gbit/s
- DDL3 (LS2): 10 Gbit/s
Receiver Card (FPGA)
- RORC1 (now): 2 DDL1, PCI-X & PCIe Gen1 x4
- RORC2 (LS1 HLT): 12 DDL2, PCIe Gen2 x8
- RORC3 (LS2): 10-12 DDL3, PCIe Gen3

Network Requirements
Total number of nodes: ~1500
FLP Node Output: up to 12 Gbit/s
EPN Node Input: up to 7.2 Gbit/s
EPN Output: up to 0.5 Gbit/s
Two technologies available
- 10/100 Gbit Ethernet (currently used in DAQ)
- QDR/FDR InfiniBand (40/52 Gbit, used in HLT)
Both would allow constructing a network that satisfies the requirements even today

Network & Data Transport
Different topologies under study to
- minimize cost
- optimize failure tolerance
- cabling
[Figure: topology diagrams labeled Spine & Leaf, Fat Tree, Director Switch]
Efficient use requires further R&D of data transport frameworks towards high rate (DAQ/HLT)

Processing Power
Estimate for online systems based on current HLT processing power
- ~2500 cores distributed over 200 nodes
- 108 FPGAs on H-RORCs for cluster finding
- 1 FPGA equivalent to ~80 CPU cores
- 64 GPGPUs for tracking (NVIDIA GTX 480 + GTX 580)
Scaling to 50 kHz rate to estimate requirements
- ~250,000 cores
- additional processing power by FPGAs + GPGPUs
1250-1500 nodes in 2018 with multicores
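
The ~250,000-core estimate can be reproduced by scaling the current HLT core count linearly with the event rate. The sketch assumes the ~500 Hz current limit from the Requirements slide as the baseline and strictly linear scaling; the cores-per-node figure it derives is an illustration and does not appear on the slides.

```python
current_cores = 2500        # current HLT farm
current_rate_hz = 500       # current limit, taken from the Requirements slide
target_rate_hz = 50_000     # rate after the upgrade

# Linear scaling of CPU cores with event rate (assumption).
scale = target_rate_hz / current_rate_hz
required_cores = current_cores * scale

# CPU-core equivalent of the FPGA cluster finders in the current system.
fpga_core_equivalent = 108 * 80

# Cores each node would need for the quoted 1250-1500 node farm.
cores_per_node = {nodes: required_cores / nodes for nodes in (1250, 1500)}

print(f"required cores: {required_cores:,.0f}")
print(f"FPGA core equivalent today: {fpga_core_equivalent}")
for nodes, cores in cores_per_node.items():
    print(f"{nodes} nodes -> {cores:.0f} cores per node")
```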

Processing Power
Estimate for offline processing power
- Today: 2 months with 10⁴ cores for 1 month Pb-Pb run
- 1 month Pb-Pb run after upgrade: ~2 × 10¹⁰ events, two orders of magnitude more than today
⇒ 10⁶ cores required after upgrade
Expected performance increase per node until 2018: factor 16
- Additional gain by code optimization, use of online reconstruction results and farm
Offline raw storage requirement increases by factor ~10
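
A sketch of the offline estimate: scaling today's core count by the increase in event count gives the 10⁶ cores, and dividing by the expected per-node gain shows how much the farm would still have to grow in node count. It assumes processing cost scales linearly with the number of events and that the processing time stays at 2 months; both are simplifications made here.

```python
cores_today = 10**4         # cores used today for a 1 month Pb-Pb run
event_factor = 100          # "two orders of magnitude" more events after the upgrade
per_node_gain = 16          # expected performance increase per node until 2018

# Cores of today's performance needed for the upgraded data volume.
cores_needed = cores_today * event_factor

# Remaining growth in farm size once each node is 16x faster.
node_growth = event_factor / per_node_gain

print(f"cores needed (today's performance): {cores_needed:.0e}")
print(f"farm must still grow by a factor of {node_growth:.2f} in nodes")
```

The remaining factor of roughly 6 is what the slide expects to recover partly through code optimization and reuse of online reconstruction results.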

Parallel Reconstruction
Change in computing paradigm: single core clock speed stagnating, instead multi/many-core
- several R&D projects underway, some already in production use
HLT CPU/GPU tracking
Parallelized transport in simulation

Summary
ALICE physics program requires handling of 50 kHz minimum-bias Pb-Pb collisions (1 TByte/s) by the online and offline systems
Strategy to handle the load is partial online reconstruction & discarding of raw data
R&D projects on both hardware and software started
- results are promising
- incremental move into production
- several new groups expressed interest in joining effort

Organization
Expression of interest to the upgrade of the Online Systems, subject to funding
• CERN, Geneva: European Organization for Nuclear Research
• Croatia, Split: University of Split
• Croatia, Zagreb: Rudjer Bošković Institute
• Croatia, Zagreb: University
• Germany, BMBF, Frankfurt: Institute for Advanced Studies, Goethe-Universität
• Germany, BMBF, Frankfurt: Institut für Informatik, Goethe-Universität
• Hungary, Budapest: KFKI Wigner Research Center for Physics, Inst. for Particle and Nuclear Physics
• India, Jammu: Physics Department, Jammu University
• India, Mumbai: Indian Institute of Technology (IIT) Bombay
• Poland, Warsaw: University of Technology
• Slovakia, Košice: Technical University
• Turkey: Karatay University
Total cost estimate: 9.3 MCHF

Backup - Processing Power
Estimate of processing power based on scaling by Moore’s law
However: no increase in single core clock speed, instead multi/many-core
⇒ Reconstruction software needs to adapt to make full use of resources
Picture from Herb Sutter, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software", Dr. Dobb's Journal, 30(3), March 2005 (updated)
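
The factor 16 per node quoted for 2018 on the offline processing slide is what a Moore's-law scaling gives over 2012 to 2018 if performance doubles every 18 months. The doubling period is an assumption made for this sketch; the slides do not state which period was used, and `moore_factor` is a hypothetical helper name.

```python
def moore_factor(years, doubling_period_years):
    """Performance growth factor after `years` for a given doubling period."""
    return 2 ** (years / doubling_period_years)


years = 2018 - 2012

# 18-month doubling (assumed) reproduces the factor quoted in the talk.
factor_18_months = moore_factor(years, 1.5)

# A slower 24-month doubling, for comparison.
factor_24_months = moore_factor(years, 2.0)

print(f"18-month doubling: x{factor_18_months:.0f}")
print(f"24-month doubling: x{factor_24_months:.0f}")
```

Either way the gain arrives as more cores per node rather than faster cores, which is why the reconstruction software has to be parallelized to benefit from it.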