Status of Tier-1 @ GSDC, KISTI
14th CERN-Korea Committee, Geneva, October 28, 2013
Gungwon Kang, Hangjin Jang & Sang-Un Ahn, for the GSDC Tier-1 Team
sahn@kisti.re.kr
KISTI GSDC Tier-1 Team (~9 people)
Role: Name
- Representative: Haeng-Jin Jang
- System Management: Hee-Jun Yoon
- System Administration, Storage (Disk & Tape), Network, Site Operation & Administration, KIAF Operation & User Support: Seung-Hee Lee, Jeong-Heon Kim, Hee-Jun Yoon, Sang-Oh Park, Hyoung-Woo Park, KISTI support (Dr. Bu-Seung Cho), Il-Yeon Yeo, Sang-Un Ahn
Updates – 1/3
- VOBOX upgrade (May 2013)
  - From gLite-VOBOX to WLCG-VOBOX (EMI)
  - Latest version of AliEn & SL6 (x64 architecture) supported
  - ALICE packages deployed
- Scientific Linux 6 kernel security patch (May 2013)
- GLExec deployment (July 2013)
  - New worker node probe supporting MUPJ (Multi-User Pilot Jobs) deployed: GLExec
  - ARGUS server was configured at CREAM-CE at the same time
  - MAUI-Torque scheduling optimization for new pilot jobs
- SHA-2 support (July 2013)
  - To prepare for new certificates using SHA-2 (256 bits), to be issued & in production by the end of this year
  - CREAM-CE needed to be updated (update #10, v1.14.4)
- CernVM-FS deployment (July 2013)
  - ALICE decided to migrate from torrent-type package distribution to CVMFS
  - Squid proxy servers must be set up for caching
  - 2 proxies were installed for high availability
Updates – 2/3
- RAW replication:
  - Data transfer of p-Pb & p-p collision data taken in 2013 was completed on 16th August
  - Total size of 310 TB with 400k files
  - Average transfer rate: 22 MB/s (from 7th March to 16th August; peak ~400 MB/s)
- 10 Gbps link upgrade:
  - 10 Gbps link upgrade plan presented to WLCG MB on 17th September
  - By the end of this year, a 2 Gbps link will be in production
  - Initiative for 10 Gbps configuration will start in the next fiscal year (usually starting 1st March)
  - Becoming a full Tier-1 to be discussed in the coming spring (probably at the next WLCG OB in March 2014)
To be explained in more detail later
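As a rough cross-check of the quoted average rate, the sketch below divides the total volume by the wall-clock time between the two dates on this slide. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a transfer spread evenly over the whole window, neither of which the slide states; the function name is illustrative only.

```python
from datetime import date

def average_rate_mb_per_s(total_tb: float, start: date, end: date) -> float:
    """Average rate in MB/s, assuming decimal units and a continuous transfer."""
    elapsed_s = (end - start).days * 86_400      # whole days converted to seconds
    return total_tb * 1e12 / elapsed_s / 1e6     # bytes per second, then MB/s

# Figures taken from the slide: 310 TB between 7 March and 16 August 2013.
rate = average_rate_mb_per_s(310, date(2013, 3, 7), date(2013, 8, 16))
print(f"{rate:.1f} MB/s")
```

Under these assumptions the result comes out close to the 22 MB/s quoted above.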
Computing Resource Status
- 2013 pledges (CPU): 25,000 HEP-SPEC06
  - Current HEP-SPEC06: 15,840 (1,800 slots, including 4 reserved pilot slots)
  - 1,800 job slots available (4 reserved for pilot jobs) with Hyper-Threading enabled
  - New servers have been delivered in the last week of this CKC
  - 2,000 physical cores will be allocated to Tier-1 to meet the pledges

                                       Without Hyper-Threading   With Hyper-Threading
  HEP-SPEC06 score (per core)          14.3                      8.8
  # of cores required to meet pledges  1,750                     2,860

  ※ Benchmark environment: Intel Xeon X5650 @ 2.67 GHz; 6 cores * 2 CPUs; Scientific Linux 6 (x64); gcc 4.4.6, g++ 4.4.6, gfortran 4.4.6
- 2013 pledges (tape storage): 1,500 TB
  - Current tape capacity: 1,000 TB
  - Pledges will be met in November
- 2013 pledges (disk storage): 1,000 TB
  - Current disk capacity: 966 TB (1,000 TB allocated, but usable space is slightly below that)
  - One additional XROOTD server will be added
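The core counts in the table follow from dividing the pledge by the per-core benchmark score. The sketch below reproduces that arithmetic as a lower bound; the constant and function names are illustrative, and the slide's figures (1,750 and 2,860) sit slightly above the bare minimum, presumably through rounding or headroom that the slide does not explain.

```python
import math

PLEDGE_HS06 = 25_000   # 2013 CPU pledge, from the slide
SCORE_NO_HT = 14.3     # HEP-SPEC06 per core, Hyper-Threading off (from the slide)
SCORE_HT = 8.8         # HEP-SPEC06 per core, Hyper-Threading on (from the slide)

def min_cores(pledge: float, score_per_core: float) -> int:
    """Smallest whole number of cores whose combined score reaches the pledge."""
    return math.ceil(pledge / score_per_core)

def capacity(slots: int, score_per_core: float) -> float:
    """Total HEP-SPEC06 delivered by a given number of job slots."""
    return slots * score_per_core

print(min_cores(PLEDGE_HS06, SCORE_NO_HT))   # lower bound without Hyper-Threading
print(min_cores(PLEDGE_HS06, SCORE_HT))      # lower bound with Hyper-Threading
print(round(capacity(1800, SCORE_HT)))       # current capacity with 1,800 slots
```

With 1,800 Hyper-Threaded slots at 8.8 per core, the capacity matches the 15,840 HEP-SPEC06 reported as the current value.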
Operation Status
- Concurrent job capacity: 1,796 (for ALICE only; 4 slots reserved for pilots)
- Since RAW replication finished, reconstruction jobs are the majority
- No critical system malfunction, but a few interventions
  - VOBOX upgrade, SL6 security update
  - Trivial missing-library issues after kernel compilation
  - Short network interventions: scheduled & unscheduled
- Mostly transparent: GLExec, CVMFS deployment & CREAM-CE update
[Figure: timeline with axis ticks Jan, Jul, Oct and levels marked at ~1,400 and ~1,800. Annotations: VOBOX upgrade, SL6 security update, trivial libs issues; (un-)scheduled network intervention; scheduled downtime ~19th October]
2013 p-Pb RAW replication
- p-Pb data transfer started on 7th March and finished on 16th October
- Total data size: 309.9 TB (400k files), 177 transfer runs
- No critical incident during the replication
- Transfer speed: 22.74 MB/s on average, 402.5 MB/s at peak
  - Lower than expected (~60 MB/s) because the 1 Gbps link is shared with other Tier-1 services
- First reconstruction jobs started on 11 June, and KISTI-GSDC performed well (~50% contribution of 3 RAW reconstruction cycles)
[Figure: transfer timeline, 7th Mar 2013 to 16th Oct 2013, with levels marked at ~35 MB/s and ~20 MB/s. Annotations: heavy traffic on WNs; incident on firewall; SL6 security update]
KISTI-Asian ALICE Tier-2 Seminar, 5 August 2013
Site Availability/Reliability
[Figure: site availability/reliability]
Network – 10G Network to Join OPN
- Establishing a 2 Gbps link between CERN and KISTI is in the administrative process
  - Contract with NLR (US) and SURFnet (NL) will be started soon
- Budget to upgrade the network to 10 Gbps is secured, and the upgrade will be performed in Mar. 2014
  - Including a 1 Gbps backup link
- Joining OPN has been discussed with CERN network experts (in early Oct.)
  - The 10 Gbps upgrade requires more than 6 months (connection/test is foreseen in Aug. 2014)
- Plan revised (Sept. 17 at WLCG MB):

  Year                  2012       2013                2014                          2015
  Bandwidth (proposed)  1 Gbps     2 Gbps (Oct. 2013)  2 Gbps, 10 Gbps (Aug. 2014)   10 Gbps
                        (1 Gbps)   (2 Gbps)            (3 Gbps)                      -

WLCG MB, 2013-09-17
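To put the bandwidth steps in perspective, the sketch below estimates how long a data set the size of the roughly 310 TB RAW replication reported earlier in this deck would take at each planned link speed. It is an idealised lower bound: it assumes a dedicated link running at full line rate with no protocol overhead and decimal units, none of which is claimed in the slides, and the function name is illustrative.

```python
def ideal_transfer_days(volume_tb: float, link_gbps: float) -> float:
    """Days needed to move volume_tb over a fully used link of link_gbps."""
    bits = volume_tb * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)       # gigabits per second to bits per second
    return seconds / 86_400

# Link speeds from the revised plan: 1, 2 and 10 Gbps.
for gbps in (1, 2, 10):
    print(f"{gbps:>2} Gbps: {ideal_transfer_days(310, gbps):.1f} days")
```

Real transfers over a link shared with other services would take longer than these figures.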
Conclusion
- The full set of p-Pb collision data has been replicated to KISTI tape storage
  - Heavy reconstruction activity on these data is ongoing
- By the end of this year, a dedicated 2 Gbps link will be established between CERN and KISTI
  - Joining OPN should start once the 2 Gbps link is established
- 10 Gbps network upgrade plan submitted to WLCG MB
  - Becoming a full Tier-1 could be discussed in Mar. 2014, depending on the network status
Milestones
Objective (target)
- Nominate KISTI/GSDC representatives in the WLCG Management Board and the GDB (Jun. 2012)
- Establishment of 1 Gbps connectivity to CERN (Apr. 2012)
- Installation of tape system (Dec. 2012)
- High speed transfer of data from CERN to KISTI at the speed required to receive and archive 10% of the ALICE AA raw data foreseen for 2012 over a continuous period of 2 weeks (Apr. 2013)
- Provide a precise plan for 3 Gbps (or higher) connectivity to CERN (Sep. 2013)
- Present a plan for providing on-call services/support according to the T1 specifications as laid out in the WLCG MoU (Sep. 2013)
- 85% of the job capacity running for at least 2 months (Apr. 2013)
- 90% Storage Element (DPM and/or XROOTD) availability (functional tests) for at least 2 months (Apr. 2013)
- Running of the reliability tests (both OPS and ALICE-specific) and publishing those to the new SAM infrastructure (Feb. 2013)
- Integration with the APEL accounting system and publishing accounting data (Jan. 2013)
- 90% of the WLCG T1 service targets for at least 2 months (-)
- Integration in the WLCG OPN (with 2 Gbps) (Discussed)
- Functional tests of the OPN (with 2 Gbps) (Discussed)
Issue
- Slower performance than expected: perfSONAR test scheduled for when the 2 Gbps link is established
To be done
- OPN integration process has been discussed; the timeline should be fixed after the 2 Gbps link is established
- OPN functional tests to follow
T1 service target
- 90% availability (98.5% reliability) from April to September
- ALICE confirmed that KISTI Tier-1 has shown itself to be a reliable site
Legend: ■ Done ■ To be done ■ Issue
WLCG MB, 2013-09-17