Panda Grid Status
Kilian Schwarz, GSI, on behalf of the PANDA GRID Group (slides to a large extent from Radoslaw Karabowicz)
Central services, LDAP, DB and ML transfers
Phone meeting on 1st Feb 2012
By the end of February the GRID management center has to be moved out of Glasgow, including:
• Lightweight Directory Access Protocol (LDAP)
• MySQL Databases (DB) -> GSI, Torino
• AliEn2 Central Services (CS) -> GSI
• PANDA GRID MonaLisa (ML) -> Jülich

Panda GRID @ GSI
Central Services installation status after the May Panda GRID meeting:
• Lightweight Directory Access Protocol (LDAP) -> GSI
• MySQL Databases (DB) -> GSI, Torino
• AliEn2 Central Services (CS) -> GSI
• PANDA GRID MonaLisa (ML) -> Jülich / Torino
Recent changes in AliEn required direct interventions by the CERN people in our MySQL and our machine settings. We are still working to bring the Panda GRID back.
Panda GRID Map ~12 sites ~1400 CPUs SC, LDAP, DB in GSI
Jobs share

+------------+--------+
| status     | jobs   |
+------------+--------+
| DONE       | 204271 |
| DONE_WARN  |   4833 |
| ERROR_E    |  11026 |
| ERROR_IB   |   1931 |
| ERROR_RE   |  14766 |
| ERROR_SV   |  14273 |
| ERROR_V    |     59 |
| EXPIRED    |   6338 |
| INTERRUPTE |     31 |
| OVER_WAITI |   1408 |
| SAVED      |    338 |
+------------+--------+

+---------------------+-------+-------+---------+------+-------+
| site | jobs | DONE | ERROR | WAIT | STARTED | RUNNING | SAVE | ZOMBIE | OTHER |
+---------------------+-------+-------+---------+------+-------+
| | 1573 | 0 | 0 | 0 | 165 | 1408 |
| PANDA::Bucharest::panda01 | 31141 | 25978 | 4892 | 0 | 0 | 271 | 0 |
| PANDA::Dubna::pbs | 9570 | 8212 | 251 | 0 | 0 | 69 | 1038 | 0 |
| PANDA::GSI::lxgrid8 | 88322 | 74471 | 12005 | 0 | 0 | 1815 | 31 |
| PANDA::Juelich::ce642 | 1382 | 1201 | 169 | 0 | 0 | 12 | 0 |
| PANDA::KVI::PBS | 36445 | 32052 | 3784 | 0 | 0 | 242 | 367 | 0 |
| PANDA::Mainz::himster | 64449 | 47635 | 14444 | 0 | 0 | 2370 |
| PANDA::Torino::CREAM | 9414 | 8502 | 758 | 0 | 0 | 154 | 0 |
| PANDA::Torino::PBS | 3963 | 2686 | 1276 | 0 | 0 | 1 | 0 |
| PANDA::Vienna::smigrid02 | 9123 | 8367 | 584 | 0 | 0 | 27 | 145 | 0 |
+---------------------+-------+-------+---------+------+-------+

TOTAL NUMBER OF JOBS IN THE LAST 6 MONTHS:
+--------+
| 259274 |
+--------+

Because of the database changes, the information about old jobs is accessible only from MySQL and is not available from MonaLisa. Also, the job counter started from 0 again.
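As a cross-check of the numbers above, the per-status counts can be summed and compared with the quoted six-month total. The following is a minimal sketch in Python, not part of the original slides: the names `status_counts` and `summarize` are illustrative, and treating every `ERROR_*` status as one error class is an assumption made here only for the summary.

```python
# Job counts per status, copied from the "Jobs share" status table.
status_counts = {
    "DONE": 204271,
    "DONE_WARN": 4833,
    "ERROR_E": 11026,
    "ERROR_IB": 1931,
    "ERROR_RE": 14766,
    "ERROR_SV": 14273,
    "ERROR_V": 59,
    "EXPIRED": 6338,
    "INTERRUPTE": 31,   # status names are truncated in the source table
    "OVER_WAITI": 1408,
    "SAVED": 338,
}


def summarize(counts):
    """Return (total jobs, jobs in any ERROR_* status, fraction of DONE jobs)."""
    total = sum(counts.values())
    # Assumption: all statuses prefixed with ERROR_ are grouped as errors.
    errors = sum(n for status, n in counts.items() if status.startswith("ERROR_"))
    done_fraction = counts["DONE"] / total
    return total, errors, done_fraction


total, errors, done_fraction = summarize(status_counts)
print(f"total jobs:    {total}")
print(f"ERROR_* jobs:  {errors} ({errors / total:.1%})")
print(f"DONE fraction: {done_fraction:.1%}")
```

The summed total agrees with the 259274 jobs quoted for the last six months, so the status table accounts for every job in that period.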
PandaRoot @ GRID
Installed:
• panda_extern: apr08, jul09, may11, jan12
• pandaroot: may11, july11, august11, nov11, stable, trunk (updated every Tuesday, with results published in the pandaroot CDash)
GRID Disk Usage
needed
• more GRID users, and we have to regain the users' trust after a longer period of only partial functionality
  http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaGridAliEn2ClientInstall
• more sites
  http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaGridAliEn2SiteInstall
• GRID developers
ALICE & PANDA
The PANDA-ALICE relationship:
• we use middleware written by ALICE
• we have our own requirements and requests
• we are supposed to give back:
  • allocate dedicated manpower for middleware development and user support
  • manpower will also come via LSDMA
  • develop in-house expertise with this middleware, and not only as users
  • debug and develop AliEn: Oracle interface, Slurm interface, PoD interface, VOVO interface
• PANDA already uses AliEn v2-20 and is debugging it for ALICE
Issues
• masterjob –printsite does not work
• fquota does not work properly for many users
• “services” command not working
• packman install –everywhere does not work
• job triggered installation is not sufficient for PANDA since we compile on site
• AliEn installer installation works only with manual fixes (Gnu.so...)
• masterSE replicate
Issues #2
• some sites still do not take jobs
• deletion of files
• inter site data transfer/mirror
• ROOT API
• packages list in ML
• activation of backup DB
wish list
• JAliEn
• To be able to install a specific revision number via the AliEn installer
conclusion
• ALICE/FAIR collaboration, also in the context of Grid computing, works quite well
• Still, there is room for improvement
• PANDA cannot be a beta tester within its production environment
• common testbed maintained by ALICE and PANDA?
• information flow needs to be improved. We cannot always be taken by surprise when there is some major change in the AliEn DB
• how to solve all the existing issues? Currently we put them all in the GSI ticketing system. Who is responsible for what?