Experiment Operations
Simone Campana
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
- Slides: 16
Outline
• This talk tries to answer the following questions:
  – How are experiment operations organized?
  – Which communication channels are used?
  – What are the commonalities?
  – What are the differences?
• Thanks to Patricia Mendez Lorenzo, Roberto Santinelli and Andrea Sciaba, plus many others from the experiments
CMS Computing Operations
• Computing Shift Person (CSP) at the CMS centre at CERN or FNAL
  – Monitors the computing infrastructure and services by going through a checklist
  – Identifies problems, triggers actions and calls
  – Creates eLog reports and support tickets
  – Reacts to unexpected events
• Computing Run Coordinator (CRC) at CERN
  – Overview of offline computing plans and status, operational link with online, keeps track of open computing issues
  – Is a computing expert
• Expert On Call (EOC), physically located anywhere in the world
  – Expert in one or more aspects of the computing system (there can be more than one)
  – Must be on call
CMS Computing Operations
• Data Operations expert on call:
  – Runs the T0 workflows and the T1 transfers
  – Monitors the above workflows
• Time Coverage
  – During global runs:
    • Computing Shift Person: 8-hour shifts, 16/7 coverage
    • DataOps expert: 16/7 mandatory, 24/7 voluntary
  – Otherwise (local runs):
    • CSP: 8/5 coverage
    • DataOps expert: just on call
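The coverage labels above can be turned into weekly hours with a little arithmetic. The sketch below assumes the usual reading of "H/D" as hours per day over days per week; the function name and the dictionary keys are illustrative and do not come from CMS tooling.

```python
def weekly_hours(coverage: str) -> int:
    """Convert an 'H/D' label (hours per day / days per week) to hours per week."""
    hours, days = coverage.split("/")
    return int(hours) * int(days)


# Coverage labels quoted on the slide; the role descriptions are paraphrased.
cms_coverage = {
    "CSP, global runs": "16/7",
    "DataOps expert, global runs (mandatory)": "16/7",
    "DataOps expert, global runs (voluntary)": "24/7",
    "CSP, local runs": "8/5",
}

for role, label in cms_coverage.items():
    print(f"{role}: {label} = {weekly_hours(label)} h/week")
```

Under this reading, 16/7 coverage with 8-hour shifts corresponds to two CSP shifts per day.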
LHCb Computing Operations
• Grid Shifters (a.k.a. production shifters)
  – Running production and data handling activities
  – Identifying and escalating problems
  – Some not-so-basic knowledge of Grid services and the LHCb framework
  – See the tick list for more information: https://twiki.cern.ch/twiki/pub/LHCb/ProductionOperations/GridShifter170808.pdf
• Grid Expert on call
  – Addressing problems
  – Defining/improving operational procedures
• Production Manager (based at CERN)
  – Organizes the overall production
• DIRAC developers and experts
  – Fraction of time dedicated to running Grid Operations
• All Grid Operations are run from CERN
  – With the exception of some contact persons at T1s whose role also fits in one of the above
LHCb Time Coverage
• LHC down: decided to move to 1 shifter during working hours
• For more information, see the production operations web page: https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperations
ALICE Computing Operations
• ALICE Computing Operations is a joint effort between:
  – ALICE Core offline team running ALICE operations
    • Centralized at CERN
  – WLCG ALICE experiment support, i.e. people offering Grid expertise to ALICE
• Production manager organizing the overall activity
  – with workflow and component experts behind
    • data expert, workload expert, AliEn expert, etc.
• Offline shifts in the ALICE control room (P2)
  – Support the central GRID services and management tasks
    • RAW data registration (T0) and replication to T1s
    • Conditions data gathering, storage and replication
    • Quasi-online first pass reconstruction at T0
      – and asynchronous second pass at T1s
    • ALICE Central Services status
    • ALICE Site Services (VO-box/WMS/storage) status
ALICE Time Coverage
• Offline shifts 24/7 during data taking
• First line support at CERN provided by IT/GS
• Site support is tiered and assured by regional experts
  – one per country/region, in contact with site experts
  – supported by the Core Offline team and/or by the WLCG experts for high-level or complex Grid issues
  – support at T2 sites is also very important
ATLAS Computing Operations
• ATLAS Computing Shift at P1: 24(16)/7 during data taking
  – T0 shifter
    • Monitors data collection and recording from P1 to T0
    • Monitors first processing at T0
  – Distributed Computing Shifter
    • Monitors T0-T1 and T1-T1 data distribution
  – Database shifter
• ATLAS Distributed Computing Shifts (ADCoS)
  – Several levels of expertise: Trainee, Senior, Expert, Coordinator
  – Monitor Monte Carlo production and T2 transfer activities
• ATLAS Expert On-Call: 24/7
  – Offers expertise for data distribution activities
• Developers and single-component experts: best effort
  – offering third level support
ADCoS Time Coverage
• America: 2 experts + 5 seniors + 3 trainees
• Europe: 5 experts + 10 seniors + 5 trainees
• Asia: 4 seniors + 1 trainee
• Covering 24 h/day and 6 days/week, with people in three time zones (no need for night shifts)
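A minimal sketch of why three time zones can remove the need for night shifts. The staffing numbers are taken from the slide; the UTC offsets and the 08:00 to 16:00 local shift window are purely illustrative assumptions, chosen so that the regions sit 8 hours apart. The real ADCoS schedule is not given in the source.

```python
def utc_hours(local_start: int, length: int, utc_offset: int) -> set:
    """Return the set of UTC hours covered by a local shift."""
    return {(local_start - utc_offset + h) % 24 for h in range(length)}


# Hypothetical UTC offsets per region (assumption, not from the slide).
regions = {"Asia": 9, "Europe": 1, "America": -7}

covered = set()
for name, offset in regions.items():
    # Assumed day shift: 8 hours starting at 08:00 local time.
    covered |= utc_hours(local_start=8, length=8, utc_offset=offset)

print("Full 24 h coverage:", covered == set(range(24)))

# Staffing as quoted on the slide: (experts, seniors, trainees).
staff = {"America": (2, 5, 3), "Europe": (5, 10, 5), "Asia": (0, 4, 1)}
total_shifters = sum(sum(counts) for counts in staff.values())
print("Total ADCoS shifters:", total_shifters)
```

With these assumed offsets each region works a local daytime shift and the three windows together cover every UTC hour.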
CMS Communication Channels
• eLog (using DAQ eLog + FNAL eLog, will have a dedicated CERN box)
• “Computing plan of the day” (by the CRC)
• AIM accounts for shifters
• Savannah
  – + GGUS for EGEE sites
• Sites → Operations: Savannah + HN
• Operations → Sites: Savannah, GGUS (+HN)
• Users → Operations: CMS user support (Savannah + email)
LHCb Communication Channels
• Internally to LHCb:
  – Elog book: http://lblogbook.cern.ch/Operations/
  – 14x7: Expert cell-phone number: 16-1914
  – Daily meeting (14:30 – 15:??)
  – Mailing lists: lhcb-grid@cern.ch (for ops matters), lhcb-dirac@cern.ch (for dev matters), and a mailing list for each contact person
• Reaching out to services and sites:
  – GGUS and/or Remedy
    • ALARM tickets used just for tests, TEAM tickets not extensively used yet
  – WLCG daily and weekly meetings
  – IT/LHCb coordination meeting, SCM meeting
  – Higher level meetings (GDB/MB)
  – Local contact person and central grid coordinator, useful for speeding up the resolution of problems
• Being reached by users and sites:
  – Support unit defined in GGUS
  – Mailing lists
  – Contact persons acting as liaison/reference for many site admins and service providers
ALICE Communication Channels
• Internal ALICE communication
  – Mailing list
  – ALICE-LCG-EGEE Task Force
• Communication with users and User Support
  – Mailing list for operational problems and Savannah tracker for bugs
  – Monthly User Forums (EVO) for dissemination of new Grid-related information and analysis news
    • And monthly Grid training for new users
• Communication with sites and Grid operation support
  – Task Force mailing list for operational problems
  – GGUS
  – Daily WLCG ops meetings
  – Weekly ALICE-LCG task force meetings
  – Dedicated contacts with many sites
ATLAS Communication Channels
• Internal Communication
  – ADCoS ELOG + T0 ELOG + ADCS@P1 ELOG
  – Savannah for DDM problem tracking
• Communication with sites
  – Mainly GGUS
    • TEAM tickets for all shifts + ALARM tickets for a restricted list of experts
  – Support mailing lists
    • mostly for CERN (CASTOR, FTS, LFC)
  – Cloud mailing lists
    • Informational only
  – Many sites read ELOG
  – No clear site-to-ATLAS channel
    • ATLAS operations mailing list, but something better should be devised
• Communication with Users
  – Mostly HN for Operations-to-Users
  – GGUS + Savannah for Users-to-Operations
• … and meetings: daily WLCG meeting, weekly ATLAS ops
Conclusions (I)
• Experiment Operations rely on a multilevel operation model
  – First line: shift crew
  – Second line: Experts On-Call
  – Third line: developers
    • not necessarily on-call
• Experiment Operations are strongly integrated with WLCG operations and Grid Service Support
  – Expert support
  – Escalation procedures
    • Especially for critical or long-standing issues
    • Incident post-mortems
  – Communications and notifications
    • I personally like the daily 15:00 meeting
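The three support lines above can be pictured as a simple escalation chain. This is only an illustrative sketch: the `SupportLine` type, the `best_effort` flag and the `escalate` helper are hypothetical names introduced here, not part of any experiment's tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SupportLine:
    name: str
    best_effort: bool  # hypothetical flag: True if the line is not necessarily on call


# Ordered from first to third line, as listed on the slide.
LINES = [
    SupportLine("first line: shift crew", best_effort=False),
    SupportLine("second line: expert on call", best_effort=False),
    SupportLine("third line: developers", best_effort=True),
]


def escalate(current_level: int) -> Optional[SupportLine]:
    """Return the next support line, or None if already at the last one."""
    next_level = current_level + 1
    return LINES[next_level] if next_level < len(LINES) else None


# An issue the shift crew (level 0) cannot solve goes to the expert on call.
print(escalate(0).name)
```

An issue that reaches the third line has no further level inside the experiment, which is where the integration with WLCG operations and Grid Service Support comes in.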
Conclusions (II)
• ATLAS and CMS rely on a more distributed operation model
  – Worldwide shifts and experts on call
    • Central coordination always at CERN
  – Possibly due to the geographical distribution of partner sites
    • Especially for the US and Asia regions
• All experiments recognize the importance of experiment-dedicated support at sites
  – CMS can rely on contacts at every T1 and T2
  – ATLAS and ALICE can rely on contacts per region/cloud
    • Contact at all T1s, usually dedicated
    • Some dedicated contacts also at some T2s
  – LHCb can rely on contacts at some T1s