Performance and Exception Monitoring Integrated Management Tools for the Future

Overview
• Motivate project
  – What we have
  – What are the problems
• How might machines be run in the future
  – Project aims
• Available tools
• The project structure
99/08/20 Tim Smith après-C 5

A Selection of Current Tools
System exceptions stand alone (SURE) Interactive server load Accounting; process, LSF AFS, web NFS, weekly paper GUI application AFS, web CDR statistics Web Stage statistics Command on demand Remote performance Tape statistics Point-to-point network LSF monitoring

A Selection of Current Problems
• AFS volumes filling causing data loss
• Numerous independent elements
  – No correlations; Wasted resources rechecking
• Alarm and corrective actions not linked
• Maintenance
  – Many languages/authors/distributed source
  – Not complete, uniform or connected web access
• Independent growth of independent systems
• Scaled to a handful of machines in tens of clusters
• Very hard to implement Global Metrics

Service Definitions

Global Metrics
• Honour Service Definitions
• “Availability of usable 3000 CUs batch”
  – Machines up + FATMEN + LSF
• “Availability of an interactive facility”
  – ASIS available + low trivial response time
• “Job turnaround time expectations”
• “Time to service tape request”
+ Disk/Network bandwidths
+ CPU/Memory utilisations
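The batch metric above combines machine state with the health of FATMEN and LSF. A minimal sketch of one way such a composite check might be evaluated is given here. The `Machine` type, its `cpu_units` rating and the function name are hypothetical, and treating the components as a strict AND is an assumption; the slides do not say how the parts are combined.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    up: bool
    cpu_units: float  # hypothetical per-machine CU rating


def batch_capacity_available(machines, fatmen_ok, lsf_ok, required_cus=3000.0):
    """Return True when FATMEN and LSF are healthy and the machines
    that are up together supply at least required_cus CUs.

    Assumption: all components must hold at once (logical AND).
    """
    if not (fatmen_ok and lsf_ok):
        return False
    usable = sum(m.cpu_units for m in machines if m.up)
    return usable >= required_cus
```

A single down machine changes the result only if it pushes the usable total below the threshold, which is the point of measuring the service and not the individual hosts.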

Visions of the Future (I)
• 1000’s of PCs per cluster
  – Living with failures + scalable solutions!
• Assure a service; Quorum of machines NOT full complement
• Quality of Service measures
  – reflected in the monitoring
  – Global Metrics
• High level correlations
  – to assess impact on a service
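The quorum idea above can be sketched as a simple threshold test: the service counts as assured while enough machines are up, even if some have failed. The function name and the 80% default are illustrative assumptions, not values from the slides.

```python
def service_assured(machine_up, quorum_percent=80):
    """Return True when at least quorum_percent of the machines are up.

    machine_up maps a machine name to True (up) or False (down).
    The 80% default is a made-up example threshold.
    """
    total = len(machine_up)
    if total == 0:
        return False
    # Integer ceiling of quorum_percent * total / 100, avoiding float rounding.
    needed = (quorum_percent * total + 99) // 100
    return sum(1 for up in machine_up.values() if up) >= needed
```

With 1000 PCs and this threshold, up to 200 machines can be down before the service is reported as degraded.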

Visions of the Future (II)
• Automated installations
  – Bootstrap and checklist
  – Like CERN new arrivals!
• Distributed control
  Software / Installation procedure: Monitoring, Pcinst01:/install/mon; Monitoring V2.3, 13:00 @ 2000/04/02
  – Pull new versions
• Dynamic assignment to experiment
• Configuration management and Monitoring intertwined
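One possible reading of "pull new versions" is that each machine compares its installed version with a published one and fetches it once a scheduled activation time has passed. The sketch below illustrates that reading only. The function names, the dotted version format and the activation-time rule are all assumptions; the slides do not specify the mechanism.

```python
from datetime import datetime


def parse_version(text):
    """Turn a dotted version string such as '2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def should_pull(installed, published, activate_at, now):
    """Return True when the published version is newer than the installed
    one and its activation time has been reached (assumed rule)."""
    if now < activate_at:
        return False
    return parse_version(published) > parse_version(installed)
```

Comparing tuples of integers keeps "2.10" newer than "2.9", which a plain string comparison would get wrong.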

Selected Project Goals
• Common daemons
  – low CPU utilisation
  – integrated corrective actions
• Common store + common format
  – Central Data Base
• Easy access for multiple views
• Averaging/Archiving procedures for histories
• Online access to measures currently per week
• Ready for future integrated management tools
  – Understand what we want not what we have
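The averaging/archiving goal above could be met in many ways. As a rough illustration, the sketch below collapses raw samples into fixed-width time buckets and keeps one mean per bucket. The function name, the `(timestamp, value)` sample layout and the bucket width are assumptions for the example, not part of the project design.

```python
from collections import defaultdict


def average_history(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples per fixed-width bucket.

    Returns a list of (bucket_start, mean_value) sorted by bucket_start.
    """
    buckets = defaultdict(lambda: [0.0, 0])  # bucket_start -> [sum, count]
    for timestamp, value in samples:
        start = int(timestamp // bucket_seconds) * bucket_seconds
        acc = buckets[start]
        acc[0] += value
        acc[1] += 1
    return [(start, total / count)
            for start, (total, count) in sorted(buckets.items())]
```

Running the same routine again with a wider bucket over already averaged data gives progressively coarser histories for long-term archiving, at the cost of weighting each earlier bucket equally.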

Skeletal Illustration
To be decided by project

Build or Buy
• Last time: Take over current functions? Lights out operations?
  – Unicenter TNG, Tivoli, HP OpenView, Patrol
  – Ranger (SLAC), Scout (DESY)
• This time: Address future requirements?
  – Open DB format?
  – Scripting access for correlation tracking?
  – Install on demand?
  – Uniform web access

The Project (I)
• Design team
  – Pete, Chris, Tim, Vincent D., Vincent R., Bernd, Fernando, Alessandro, Lionel, Nilo, Eric G., Iosif
• Key consultants
  – Fabio, Dave, Olof, JPB, Catherine, Tony, SERCo
  – Other IT Groups (ASD, IA, CE)
  – Users!

The Project (II)

Conclusions
• The need for a revamp is clear
• The time is ripe
• Exciting and hopeful techniques/tools available
• Great interest in the prospective project
• Challenges of the future crystallising
• Position ourselves ready