The D0 NIKHEF Farm
Ton Damen, Willem van Leeuwen, Kors Bos
Fermilab, May 23 2001

Layout of this talk
• D0 Monte Carlo needs
• The NIKHEF D0 farm
• The data we produce
• The SAM database
• A Grid intermezzo
• The network
• The next steps

D0 Monte Carlo needs
• D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr
• We want 10% of that to be simulated → 10^8 events/yr
• To simulate 1 QCD event takes ~3 minutes (size ~2 MByte)
  – on an 800 MHz PIII
• So 1 CPU can produce ~10^5 events/yr (~200 GByte)
  – assuming a 60% overall efficiency
• So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte)
  – and this is only 10% of the goal we set ourselves
  – not counting the Nijmegen D0 farm yet
• So we need another 900 CPUs
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25), …
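
A quick back-of-the-envelope check of these production numbers, using only the figures quoted on this slide (a sketch, not a measurement):

    # Monte Carlo production rates, taken from the slide above.
    SECONDS_PER_YEAR = 1e7            # the canonical 10^7 accelerator seconds per year
    trigger_rate_hz = 100
    events_per_year = trigger_rate_hz * SECONDS_PER_YEAR      # ~1e9 events/yr
    mc_target = 0.10 * events_per_year                        # simulate 10% -> ~1e8 events/yr

    minutes_per_event = 3             # one QCD event on an 800 MHz PIII
    event_size_mb = 2
    efficiency = 0.60                 # assumed overall farm efficiency

    events_per_cpu = efficiency * 365 * 24 * 60 / minutes_per_event   # ~1e5 events/yr
    gb_per_cpu = events_per_cpu * event_size_mb / 1000                 # ~200 GB/yr
    farm_cpus = 100

    print(f"per CPU : {events_per_cpu:.2e} events/yr, ~{gb_per_cpu:.0f} GB/yr")
    print(f"farm    : {farm_cpus * events_per_cpu:.2e} events/yr, "
          f"~{farm_cpus * gb_per_cpu / 1000:.0f} TB/yr")
    print(f"extra CPUs needed for the full 10%: "
          f"~{mc_target / events_per_cpu - farm_cpus:.0f}")   # ~900 in round numbers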

How it looks

The NIKHEF D0 Farm
[Diagram: farm layout — SAM stations with metadata links to the tape robot @SARA over 1 Gbit/s SURFnet and to @Fermilab over 155 Mbit/s; inside the NIKHEF network the farm server, a file server with 1.5 TB disk cache and the farm nodes hang off a switch at 100 Mbit/s and 1 Gbit/s]

50 Farm nodes (100 CPUs): Dell Precision Workstation 220
• Dual Pentium III processors, 800 MHz / 256 kB cache each
• 512 MB PC800 ECC RDRAM
• 40 GB (7200 rpm) ATA-66 disk drive
• no screen, no keyboard, no mouse
• wake-on-LAN functionality

The File Server: Elonex EIDE Server
• Dual Pentium III 700 MHz
• 512 MB SDRAM
• 20 GByte EIDE disk
• 1.2 TByte: 75 GB EIDE disks
• 2 x Gigabit Netgear GA 620 network card

The Farm Server: Dell Precision 620 workstation
• Dual Pentium III Xeon 1 GHz
• 512 MB RDRAM
• 72.8 GByte SCSI disk
• Will also serve as D0 software server for the NIKHEF/D0 people

Software on the farm
• Boot via the network
• Standard RedHat Linux 6.2
• ups/upd on the server
• D0 software on the server
• FBSNG on the server, daemon on the nodes
• SAM on the file server
• Used to test new machines …

What we run on the farm
• Particle generator: Pythia or Isajet
• Geant detector simulation: d0gstar
• Digitization, adding min. bias: psim
• Check the data: mc_analyze
• Reconstruction: preco
• Analysis: reco_analyze
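
Tying this chain to the output files shown a few slides later, a small sketch of which stage produces which file family (the prefix-to-stage mapping follows the "Output data" slides; the executables' actual command lines are not spelled out here):

    # The MC chain and the file families each stage produces
    # (prefixes as they appear in the "Output data" slides).
    chain = [
        ("Pythia / Isajet", "event generation",          "gen_*"),
        ("d0gstar",         "Geant detector simulation", "d0g_*  (hits)"),
        ("psim",            "digitization + min. bias",  "sim_*  (digis)"),
        ("mc_analyze",      "data check",                None),
        ("preco",           "reconstruction",            None),
        ("reco_analyze",    "analysis",                  None),
    ]
    for exe, stage, output in chain:
        print(f"{exe:14s} {stage:28s} -> {output or '-'}")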

Example: Min. bias
• Did a run with 1000 events on all CPUs
  – Took ~2 min./event
  – So ~1.5 days for the whole run
  – Output file size ~575 MByte
• We left those files on the nodes
  – one reason for having enough local disk space
• Intend to repeat that "sometimes"
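
The timing and the role of the local disks can be checked directly from these numbers (nothing here beyond the slide's own figures):

    # Min.bias run: 1000 events at ~2 min/event on each CPU
    events, minutes_per_event = 1000, 2
    run_days = events * minutes_per_event / 60 / 24
    print(f"wall time per CPU: ~{run_days:.1f} days")      # ~1.4 days, i.e. "~1.5 days"

    # The ~575 MB output stays on the node's 40 GB local disk,
    # which is why the nodes were bought with that much local space.
    file_mb, local_disk_gb = 575, 40
    print(f"a node's disk holds ~{local_disk_gb * 1000 // file_mb} such files")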

Output data
-rw-r--r--  1 a03  computer        298  Nov 5 19:25  RunJob_farm_qcd.Job308161443.params
-rw-r--r--  1 a03  computer 1583995325  Nov 5 10:35  d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl.PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r--  1 a03  computer        791  Nov 5 19:25  d0gstar_qcd.Job308161443.params
-rw-r--r--  1 a03  computer        809  Nov 5 19:25  d0sim_qcd.Job308161443.params
-rw-r--r--  1 a03  computer   47505408  Nov 3 16:15  gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl.PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r--  1 a03  computer       1003  Nov 5 19:25  import_d0g_qcd.Job308161443.py
-rw-r--r--  1 a03  computer        912  Nov 5 19:25  import_gen_qcd.Job308161443.py
-rw-r--r--  1 a03  computer       1054  Nov 5 19:26  import_sim_qcd.Job308161443.py
-rw-r--r--  1 a03  computer        752  Nov 5 19:25  isajet_qcd.Job308161443.params
-rw-r--r--  1 a03  computer        636  Nov 5 19:25  samglobal_qcd.Job308161443.params
-rw-r--r--  1 a03  computer  777098777  Nov 5 19:24  sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl.PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r--  1 a03  computer       2132  Nov 5 19:26  summary.conf

Output data translated
• gen_* : 0.047 GByte
• d0g_* : 1.5 GByte
• sim_* : 0.7 GByte
• import_gen_*.py, import_d0g_*.py, import_sim_*.py
• isajet_*.params, RunJob_Farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params
• summary.conf
12 files for generator + d0gstar + psim, but of course only 3 big ones
Total ~2 GByte

Data management
[Diagram: on the NIKHEF D0 farm, import_gen.py, import_d0g.py, import_sim.py and import_reco.py declare the generator data, Geant data (hits), sim data (digis), reconstructed data and parameters to SAM, which stores them at Fermilab (d0mino) and at SARA (TERAS)]

Automation
• mc_runjob (modified)
  – Prepares MC jobs (gen+sim+reco+anal)
    • e.g. 300 events per job/CPU
    • repeated e.g. 500 times
  – Submits them into the batch (FBS)
    • run on the nodes
  – Copy to fileserver after completion
    • a separate batch job on the fileserver
  – Submits them into SAM
    • SAM does the file transfers to Fermilab and SARA
• Runs for a week …
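
A minimal sketch of this production loop, just to make the sequence of steps concrete. The functions below are hypothetical stand-ins; the real mc_runjob, FBS and SAM command lines are not given on the slide:

    # Hypothetical sketch of the automated MC production loop described above.
    EVENTS_PER_JOB = 300      # e.g. 300 events per job/CPU
    N_JOBS = 500              # e.g. repeated 500 times

    def prepare_job(events, seed):
        """mc_runjob step: assemble the gen+sim(+reco+anal) chain for one job."""
        return {"events": events, "seed": seed}

    def submit_to_fbs(job):
        """FBS step: queue the prepared job; it runs on one of the farm nodes."""
        print(f"FBS: submitted job {job['seed']} ({job['events']} events)")

    def copy_and_declare(job):
        """Fileserver step (a separate FBS job): copy the node's output to the
        fileserver and declare it to SAM, which ships it on to Fermilab and SARA."""
        print(f"job {job['seed']}: copied to fileserver, declared to SAM")

    for seed in range(N_JOBS):
        job = prepare_job(EVENTS_PER_JOB, seed)
        submit_to_fbs(job)
        copy_and_declare(job)   # in reality triggered after the farm job completes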

[Diagram: an FBS production job runs in three steps — 1 mcc (MC production on a farm node, 40 GB local disk), 2 rcp (copy of the output to the 1.2 TB file server), 3 sam (store to the datastores at FNAL and SARA, metadata to the SAM DB) — with the farm server handling the mcc requests and job control]

This is a grid!
[Diagram: SAM links the D0 farms at NIKHEF, KUN (Nijmegen) and IN2P3 with the data stores at Fermilab (d0mino) and SARA (TERAS)]

The Grid
• Not just D0, but for the LHC experiments
• Not just SAM, but for any database
• Not just farms, but any CPU resource
• Not just SARA, but any mass storage
• Not just FBS, but any batch system
• Not just HEP, but any science, EO, …

European DataGrid Project
• 3-yr project for 10 M€
• Manpower to develop grid tools
• CERN, IN2P3, INFN, PPARC, ESA, FOM
• NIKHEF + SARA + KNMI
  – Farm management
  – Mass storage management
  – Network management
  – Testbed
  – HEP & EO applications

LHC - Regional Centres
[Diagram: CERN as Tier 0; Tier 1 centres at FNAL, BNL, KEK, INFN, IN2P3, RAL and NIKHEF/SARA; Tier 2 sites such as Utrecht, Vrije Univ., Nijmegen, Amsterdam and possibly Brussel and Leuven, connected over SURFnet, down to department and desktop level; experiments: Atlas, LHCb, Alice]

DataGrid: test bed sites
[Map: HEP and ESA test bed sites — CERN, NIKHEF, KNMI, Estec, ESRIN, RAL, IPSL, Dubna, Moscow, Edinburgh, Manchester, Oxford, QMW, Bristol, Lund, Berlin, Prague, Brno, Paris, Lyon, Grenoble, Marseille, Santander, Madrid, Valencia, Barcelona, Lisboa, Milano, Torino, PD-LNL, BO-CNAF, Pisa, Roma, Catania]

The NL-Datagrid Project

NL-Datagrid Goals
• National test bed for middleware development
  – WP4, WP5, WP6, WP7, WP8, WP9
• To become an LHC Tier-1 center
  – ATLAS, LHCb, Alice
• To use it for the existing program
  – D0, Antares
• To use it for other sciences
  – EO, Astronomy, Biology
• For tests with other (transatlantic) grids
  – D0, PPDG, GriPhyN

NL-Datagrid Testbed Sites
[Diagram: Univ. Amsterdam (Atlas), Vrije Univ. (LHCb), Nijmegen Univ. (Atlas) and Univ. Utrecht (Alice), with links to CERN, RAL, FNAL and ESA]

Dutch Grid topology
[Diagram: SURFnet links NIKHEF, SARA, KNMI, the Free Univ. (LHCb), Utrecht Univ. (Alice) and Nijmegen Univ. (Atlas), with external connections to CERN Geneva, FNAL and ESA D-PAF München; traffic for D0, Atlas, LHCb and Alice]

End of the Grid intermezzo
Back to the NIKHEF D0 farm and Fermilab: the network

Network bandwidth
• NIKHEF – SURFnet: 1 Gbit
• SURFnet: Amsterdam – Chicago: 622 Mbit
• ESnet: Chicago – Fermilab: 155 Mbit ATM
• But ftp gives us ~4 Mbit/s, bbftp ~25 Mbit/s, bbftp processes in parallel ~45 Mbit/s
For 2002:
• NIKHEF – SURFnet: 2.5 Gbit
• SURFnet: Amsterdam – Chicago: 2.5 Gbit optical
• Chicago – Fermilab: 622 Mbit? but more…

ftp++
• ftp gives you 4 Mbit/s to Fermilab
• bbftp: increased buffer, # streams → ~20 Mbit/s
• gsiftp: with security layer, increased buffer, …
• grid_ftp: increased buffer, # streams, # sockets, fail-over protection, security → ~25 Mbit/s
• Multiple ftp in parallel: factor 2 seen
• Should get to > 100 Mbit/s, or ~1 GByte/minute
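
To put these rates in perspective for the ~2 GByte a single MC job produces (size from the "Output data translated" slide), a rough conversion; the parallel and target figures are the ones quoted above:

    # Transfer time for one ~2 GB job output at the throughputs quoted on this slide.
    job_output_gb = 2.0
    rates_mbit_per_s = {
        "plain ftp":            4,
        "bbftp":               20,
        "grid_ftp":            25,
        "parallel, factor ~2": 50,
        "target":             100,
    }
    bits = job_output_gb * 8e9                     # decimal GB -> bits
    for tool, mbit in rates_mbit_per_s.items():
        minutes = bits / (mbit * 1e6) / 60
        print(f"{tool:20s} ~{mbit:3d} Mbit/s -> ~{minutes:5.1f} min per job")
    # at >100 Mbit/s this is indeed of the order of a GByte per minute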

SURFnet 5 access capacity
[Chart: access capacity per year, 1999-2002 — growing from 100-155 Mbit/s on SURFnet 4 through 1.0 and 2.5 Gbit/s to 10-20 Gbit/s on SURFnet 5]

TA access capacity
[Diagram: transatlantic access — NL SURFnet, UK SuperJANET4, Fr Renater, It GARR-B and GEANT (Geneva) connect via New York and STAR-LIGHT/STAR-TAP to Abilene, ESnet and MREN; capacities shown include 2.5 Gb and 622 Mb]

Network load last week
• Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day)
• Available to Chicago: 622 Mbit/s
• Available to FNAL: 155 Mbit/s
• Needed next year (double capacity): ~25 Mbit/s
• Available to Chicago: 2.5 Gbit/s: a factor 100 more!!
• Available to FNAL: ??

New nodes for D0
• In a 2U 19" mounting
  – Dual 1 GHz PIII
  – 1 GByte RAM
  – 40 GByte disk
  – 100 Mbit ethernet
• Cost ~k$2
  – Dell machines were ~k$4 (tax incl.): a FACTOR 2 cheaper!!
  – assembly time 1/hour
• 1 switch: k$2.5 (24 ports)
• 1 rack: k$2 (46U high)
• Requested for 2001: k$60
  – 22 dual CPUs
  – 1 switch
  – 1 19" rack

The End
Kors Bos, Fermilab, May 23 2001