IBM Software Group
ITM - Monitoring Resources Using Remote Agentless Technology
Scott Wallace
October 20, 2009
© 2009 IBM Corporation

IBM Software Group | Tivoli software

Agenda
- Overview
- Planning
- Installation / Configuration
- Usage Tips
- Troubleshooting
- Wrap Up

Overview
- Agentless monitoring allows you to oversee the IT environment from a set of remote servers.
- Some agents already had remote capabilities
  - VMware VI and SAP
- Introduced initially as an offering on OPAL
- Agentless Operating System monitoring added as a product in ITM 6.2.1
  - Provides operating system monitoring for AIX, HP-UX, Linux, Solaris and Windows
  - Monitors using multiple mechanisms: CIM, SNMP, WMI

Agentless Monitoring Architecture
[Figure: agentless monitoring architecture diagram]

Agent or Agentless
- Agent-based technology resides directly on a managed server and collects data based on policy set locally or by the management server
- Agentless technology resides primarily on a management server and gets its data via a remote application programming interface (API)
  - "Agentless" doesn't mean nothing is present or running. Some basic operating system function or base application function is running to provide the information as requested over the network
  - Resources are still being used and services need to be running on the server
  - Examples include SNMP, CIM, WMI

Agentless OS Monitoring Metric Overview
- Key Operating System Metrics Returned
  - Logical and Physical Disk Utilization
  - Network Utilization
  - Virtual and Physical Memory
  - System Level Information
  - Aggregate Processor Utilization
  - Process Availability
- Default Situations for
  - Disk Utilization
  - Memory Utilization
  - CPU Utilization
  - Network Utilization

Remote Node Capabilities
- One agent can represent more than one monitored entity
  - Multiple remote systems in one agent
  - Each remote node has a unique "Managed System Name", so remote nodes can be placed in different managed system lists for situations
- One agent can represent different types of entities
  - Windows and Solaris agents can monitor different sets of data on different systems
- Multiple instances of an agent may co-reside on the same agent server

Planning

Agentless Monitoring for Operating Systems
- KR2: Agentless Monitoring for Windows Operating Systems
- KR3: Agentless Monitoring for AIX Operating Systems
- KR4: Agentless Monitoring for Linux Operating Systems
- KR5: Agentless Monitoring for HP-UX Operating Systems
- KR6: Agentless Monitoring for Solaris Operating Systems
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.1/welcome.htm

Agentless Monitoring Data Collection and Platforms
- Agentless monitors can run from most ITM supported platforms
  - Windows (x86 and x64, not IA64)
  - x/p/z Linux
  - Solaris
  - AIX
  - HP-UX
- Agentless monitors may remotely monitor older versions of listed operating systems and other Linux distributions, depending on capabilities
- If you want to use the Windows API data collectors, the Agentless monitor must run on a Windows platform

You may configure different data providers for the Agentless monitors:
- Agentless Monitoring for AIX OS
  - SNMP v1, v2c, v3
- Agentless Monitoring for HP-UX OS
  - SNMP v1, v2c, v3
- Agentless Monitoring for Linux OS
  - SNMP v1, v2c, v3
- Agentless Monitoring for Solaris OS
  - CIM-XML
  - SNMP v1, v2c, v3
- Agentless Monitoring for Windows OS
  - Windows APIs
    - Windows Management Instrumentation (WMI)
    - Performance Monitor (Perfmon)
    - Event Log
  - SNMP v1, v2c, v3

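The provider matrix above can be captured as a small lookup when planning which host runs which monitor. This is only a planning sketch in Python: the dictionary layout and the `can_collect` helper are illustrative assumptions, not part of ITM, and the product codes (KR2 to KR6) are paired with the provider lists as given on the slides.

```python
# Data providers per agentless monitor, transcribed from the slides.
# The structure and helper below are hypothetical planning aids, not an ITM API.
PROVIDERS = {
    "KR2": {"target": "Windows", "providers": ["WMI", "Perfmon", "Event Log", "SNMP"]},
    "KR3": {"target": "AIX", "providers": ["SNMP"]},
    "KR4": {"target": "Linux", "providers": ["SNMP"]},
    "KR5": {"target": "HP-UX", "providers": ["SNMP"]},
    "KR6": {"target": "Solaris", "providers": ["CIM-XML", "SNMP"]},
}

# The Windows API collectors only work when the monitor itself runs on Windows.
WINDOWS_API_PROVIDERS = {"WMI", "Perfmon", "Event Log"}


def can_collect(product_code, provider, monitor_host_os):
    """Return True if `provider` is offered by the monitor and usable from `monitor_host_os`."""
    entry = PROVIDERS[product_code]
    if provider not in entry["providers"]:
        return False
    if provider in WINDOWS_API_PROVIDERS and monitor_host_os != "Windows":
        return False
    return True
```

For example, `can_collect("KR2", "WMI", "Linux")` is false because the Windows API collectors need a Windows host, while `can_collect("KR2", "SNMP", "Linux")` is true.
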
Deployment Considerations
- With Agentless Monitoring, a percentage of preparation time needs to be devoted to verifying the native data emitter configurations
  - Ensuring SNMP daemons are installed, configured and started (community strings and user/password information verified)
  - Exposing MIB branches in SNMP configuration files
  - Verifying Windows passwords and user account rights for Windows API collection
  - Verifying patch levels for endpoint systems against the User's Guides
  - If possible, use tools like snmpwalk, WMIExplorer, and perfmon to verify the metrics are exposed before pointing ITM to the environments
- Decide how many remote systems need to be monitored, and then identify the systems that will run the agentless agents

Comparing Agent and Agentless Technologies
Comparison criteria:
- Service Provider
  - No ability to put agents in a customer's environment
- Speed of Implementation
- Agent maintenance
  - Distribution of updates
- Impact to Testing
  - "Locked down" server environments
- Command control capabilities
  - Take Actions easily
- Granularity and coverage of monitoring metrics
- Data Availability
  - Real time responsiveness
- Security
Table cell values, in slide order (recovered column header: Agentless): Unsuitable; Suitable; Varies; High; Time consuming depending on the environment; Fewer points to deploy; High; Low; Greater access; Dependent on standards; Polling Lag; High; Network delay; Secure communication; Standards dependent

Installation / Configuration

Install
- Ensure that the prerequisites are met for the system that you are using for the agent.
  - See the Agentless Agent User Guides for this information

Windows Installer
- Select the remote system types that you want to monitor from this Windows host.
- Next, select the agents you want to install in the depot for remote deploy.

Linux Installer
- Select the operating system type or take the default
- Select the remote system types that you want to monitor from this Linux host

Install Application Support
- TEMS
  - HUB
  - Remotes
- TEPS
- TEP Desktop clients
- Warehouse Proxy Agent
- Warehouse Summarization Agent

Configuring the Agent - Linux
SNMP v3 example:
    [root@rc2test4 /]# itmcmd config -A r4
    Agent configuration started...
    Enter instance name (default is: ): SLESv3
    Edit "Monitoring Agent for Agentless Linux OS" settings? [ 1=Yes, 2=No ] (default is: 1):
    Edit 'SNMP connection' settings? [ 1=Yes, 2=No ] (default is: 1):
    Port Number (default is: 161):
    SNMP Version [ 1=SNMP Version 1, 2=SNMP Version 2c, 3=SNMP Version 3 ] (default is: 1): 3
    Edit 'SNMP Version 3' settings? [ 1=Yes, 2=No ] (default is: 1):
    Security Level [ 1=noAuthNoPriv, 2=authNoPriv, 3=authPriv ] (default is: ): 2
    User Name (default is: ): snmpuser
    Auth Protocol [ 1=MD5, 2=SHA ] (default is: ): 1
    Enter Auth Password (default is: ):
    Re-type : Auth Password (default is: ):

Configuring the Agent - Linux
    Priv Protocol [ 1=DES, 2=CBC DES ] (default is: ): 1
    Enter Priv Password (default is: ):
    Re-type : Priv Password (default is: ):
    Edit 'Remote System Details' settings? [ 1=Yes, 2=No ] (default is: 1): 1
    No 'Remote System Details' settings available?
    (Easy to overlook)
    Edit 'Remote System Details' settings, [1=Add, 2=Edit, 3=Del, 4=Next, 5=Exit] (default is: 4): 1
    Managed System Name (default is: ): rc2SLES
    SNMP host (default is: ): 172.17.4.219
    'Remote System Details' settings: Managed System Name=rc2SLES
    Edit 'Remote System Details' settings, [1=Add, 2=Edit, 3=Del, 4=Next, 5=Exit] (default is: 4): 5

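The SNMP v3 prompts depend on the chosen security level: noAuthNoPriv needs only a user name, authNoPriv adds the authentication protocol and password, and authPriv adds the privacy protocol and password on top of that. A minimal Python sketch for checking a planned configuration before running the prompts could look like the following; the field names (`user_name`, `auth_protocol`, and so on) are hypothetical labels chosen for this example, not ITM configuration keys.

```python
# Fields each SNMPv3 security level needs, following the SNMPv3 security model.
# The field names are illustrative labels, not ITM parameter names.
REQUIRED_FIELDS = {
    "noAuthNoPriv": ["user_name"],
    "authNoPriv": ["user_name", "auth_protocol", "auth_password"],
    "authPriv": [
        "user_name",
        "auth_protocol",
        "auth_password",
        "priv_protocol",
        "priv_password",
    ],
}


def missing_fields(security_level, settings):
    """Return the required fields that are absent or empty in `settings`."""
    required = REQUIRED_FIELDS[security_level]
    return [name for name in required if not settings.get(name)]
```

With the values from the transcript (user `snmpuser`, MD5 authentication), `missing_fields("authNoPriv", ...)` returns an empty list, while the same settings checked against `authPriv` report the two privacy fields as missing.
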
Configuring the Agent - Windows
- Open the Manage Tivoli Enterprise Monitoring Services (MTEMS)
- Select the template for the agent type
- Fill in the requested information

Tips for Using

Considerations for Using
- Agentless monitors return the data from the last background collection interval when a real-time query to the endpoint system times out because of network load or latency
- With Historical Collection enabled, the collected data for all the remote endpoints is stored on the Agentless Monitoring Server when storage "at the Agent" is selected.
  - Ensure the physical system has sufficient disk space and network bandwidth to the Warehouse Proxy Agent when monitoring large numbers of remote systems
- With the Agentless Monitoring Server now maintaining connections to hundreds of servers, it becomes a more critical component in the infrastructure than a single agent instance

Agentless Health
- Each remote monitor has self-monitoring attribute tables that can be used to monitor the collection process:
- Performance Object Status attributes:
  - Last collection errors encountered
  - Last collection start/finish times
  - Last/average collection duration
  - Refresh interval
  - Number of collections
  - Cache hit/miss/hit percent
  - Intervals skipped (most useful)
- Thread Pool attributes:
  - Current/max thread pool size
  - Current/average/min/max active threads
  - Current/min/max queue length
  - Average wait time
  - Total jobs
- Situations may be created against these attribute groups to notify of collection failures
- It is recommended that an Operating System agent be co-deployed to the Agentless Monitoring Server to watch CPU, Memory, and Network utilization of the monitors

Performance Tuning Environment Variables
Variable name (default value): description
- CDP_DP_CACHE_TTL (60): Time in seconds before a query will trigger a new data collection; effectively the polling interval.
- CDP_DP_THREAD_POOL_SIZE (60): The number of threads created to perform background data collections. The thread pool is shared among all attribute groups in all remote nodes in an agent. Recommendation: set this to the number of managed subnodes.
- CDP_DP_REFRESH_INTERVAL (60): The interval in seconds at which each attribute group cache is updated in the background. Recommendation: set this to the same value as the polling rate (CDP_DP_CACHE_TTL).
- CDP_DP_IMPATIENT_COLLECTOR_TIMEOUT (2): The number of seconds to wait for a data collection to happen before timing out and returning cached data.
- CDP_SNMP_RESPONSE_TIMEOUT (2): The number of seconds to wait for each request to time out. Each row in an attribute group is a separate request.
- CDP_SNMP_MAX_RETRIES (2): The number of times to retry sending the SNMP request after a response timeout.
- CDP_NT_EVENT_LOG_GET_ALL_ENTRIES_FIRST_TIME (NO): Configures whether the Windows Event Log data provider reports old log entries on startup, or only new ones.
- CDP_NT_EVENT_LOG_CACHE_TIMEOUT (3600): Cache lifetime in seconds of an event from the Windows Event Log.
- CDP_PURE_EVENT_CACHE_SIZE (100): Number of pure events held in cache at any one time. A query reports all events in the cache at that time. When the cache is full, the oldest events are removed to make room for new ones.

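The two recommendations in the table (thread pool size equal to the number of managed subnodes, refresh interval equal to the polling rate) can be turned into a small calculation. The Python sketch below is an illustration only: the helper names are made up for this example, and the worst-case wait estimate assumes that the requests for an attribute group are issued one after another and that every attempt runs to its full timeout, which the slides do not state.

```python
def recommended_settings(subnode_count, polling_interval_s=60):
    """Apply the two recommendations from the tuning table."""
    return {
        "CDP_DP_CACHE_TTL": polling_interval_s,
        # Recommendation: refresh interval equals the polling rate.
        "CDP_DP_REFRESH_INTERVAL": polling_interval_s,
        # Recommendation: one thread per managed subnode.
        "CDP_DP_THREAD_POOL_SIZE": subnode_count,
    }


def worst_case_snmp_wait_s(rows, response_timeout_s=2, max_retries=2):
    """Upper bound on the wait for one attribute group against an unresponsive host.

    Assumes sequential requests (one per row) that each time out on the first
    attempt and on every retry. This is an assumption made for the estimate.
    """
    return rows * response_timeout_s * (1 + max_retries)


def as_env_lines(settings):
    """Render settings as NAME=value lines, sorted by name."""
    return [f"{name}={value}" for name, value in sorted(settings.items())]
```

Under those assumptions, a 10-row attribute group with the default timeout and retry values could block for up to 60 seconds, which is the same order as the default polling interval and one reason to watch the "Intervals skipped" attribute.
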
Troubleshooting

Troubleshooting Overview
- General Diagnosis
  - Fault Determination
    - Is the data coming through?
    - Is the data incorrect?
- Specific Diagnosis
  - Agent issues
    - Remote system setup
    - Connectivity
    - Review logs on the agent
  - TEP issues
    - Application support - workspaces / data
  - TEMS issues
    - Application support - situation issues

Agent Log Files and Trace Settings
- Default location:
  - %CANDLE_HOME%\TMAITM6\logs\<hostname>_<pc>_k<pc>agent_<instance>_<timestamp>-01.log (Windows)
  - $CANDLE_HOME/logs/<hostname>_<pc>_<instance>_<timestamp>-01.log (UNIX/Linux)
- Increase unit traces to isolate the issues

Problem area: KBB_RAS1 setting
- General Startup/Initialization:
  - ERROR (UNIT:query ALL) running on Windows
  - ERROR (UNIT:ct_main ALL) running on UNIX/Linux
- WMI Data Provider: ERROR (UNIT:WMI ALL)
- Perfmon Data Provider: ERROR (UNIT:QueryClass ALL)
- SNMP Data Provider: ERROR (UNIT:SNMP ALL)
- Windows Event Log Data Provider: ERROR (UNIT:EventLog ALL) (UNIT:WinLog ALL)
- CIM-XML Data Provider: ERROR (UNIT:CIM ALL)

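When more than one data provider is under suspicion, the unit clauses from the table can be combined into a single KBB_RAS1 value, in the same way the Windows Event Log row already lists two units. The Python sketch below only assembles the string; the short area keys (`snmp`, `event_log`, and so on) are hypothetical labels, and the exact spacing inside `(UNIT:name ALL)` is an assumption based on the table entries.

```python
# Trace units per problem area, transcribed from the table.
# The dictionary keys are illustrative labels, not ITM identifiers.
TRACE_UNITS = {
    "startup_windows": ["query"],
    "startup_unix": ["ct_main"],
    "wmi": ["WMI"],
    "perfmon": ["QueryClass"],
    "snmp": ["SNMP"],
    "event_log": ["EventLog", "WinLog"],
    "cim_xml": ["CIM"],
}


def kbb_ras1(*problem_areas):
    """Build a KBB_RAS1 value: ERROR plus one (UNIT:x ALL) clause per unit."""
    parts = ["ERROR"]
    for area in problem_areas:
        for unit in TRACE_UNITS[area]:
            clause = f"(UNIT:{unit} ALL)"
            if clause not in parts:  # skip duplicates if an area is repeated
                parts.append(clause)
    return " ".join(parts)
```

For example, `kbb_ras1("snmp", "wmi")` yields `ERROR (UNIT:SNMP ALL) (UNIT:WMI ALL)`. Unit tracing is verbose, so it is usually worth reverting to plain `ERROR` once the problem is isolated.
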
How can I tell if the endpoint is the problem?
- Typical endpoint issues:
  - Connectivity
    - Firewall
      - SNMP needs ports 161 and 162 open.
      - CIM needs ports 5988 and 5989 open.
    - TCP Stack
      - Verify TCP connectivity to the remote system using ping, telnet, etc.
    - DNS
      - Use nslookup and/or route to verify that the remote system is known to your domain.
  - SNMP or CIM
    - Daemons not running
    - Incorrect version of SNMPD or CIM
    - SNMPD not configured correctly (check with snmpget, snmpnext, snmpwalk)
      - snmpget -v 1 -c public rc2testSLES sysUpTime.0
      - snmpget -v 3 -u snmpuser -l authNoPriv -a MD5 -A password rc2testSLES sysUpTime.0

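The two snmpget checks can be scripted so that the same probe runs against every planned endpoint before the agent is configured. The Python sketch below builds the same argument lists as the commands on the slide, using the Net-SNMP options shown there (`-v`, `-c`, `-u`, `-l`, `-a`, `-A`). The function names are illustrative, and `probe` assumes the Net-SNMP command-line tools are installed on the machine running the script.

```python
import subprocess


def snmpget_v1(host, community, oid="sysUpTime.0"):
    """Argument list for an SNMP v1 GET, mirroring the first command on the slide."""
    return ["snmpget", "-v", "1", "-c", community, host, oid]


def snmpget_v3_auth(host, user, auth_protocol, auth_password, oid="sysUpTime.0"):
    """Argument list for an SNMP v3 authNoPriv GET, mirroring the second command."""
    return [
        "snmpget", "-v", "3",
        "-u", user,
        "-l", "authNoPriv",
        "-a", auth_protocol,
        "-A", auth_password,
        host, oid,
    ]


def probe(cmd, timeout_s=10):
    """Run the command and report success; a missing tool or a hang counts as failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A call such as `probe(snmpget_v1("rc2testSLES", "public"))` returns a simple pass or fail; passing the arguments as a list avoids shell quoting problems with passwords.
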
How can I tell if the endpoint is the problem?
- SNMPD daemon is not running
  - Check the ITM logs for the following lines:
    (2009/10/13,20:37:35.0001-A:snmpqueryclass.cpp,1164,"handle_snmp_response_async") ERROR: decoded PDU is null -- this is a timeout scenario
    (2009/10/13,20:37:35.0003-29:snmpqueryclass.cpp,1782,"internalCollectData") Timeout occurred. No response from agent 172.17.4.219.
- Password error - SNMP v3
    (2009/10/14,05:58:23.0067-6:snmpqueryclass.cpp,1158,"handle_snmp_response_async") Entry
    (2009/10/14,05:58:23.0068-6:snmpqueryclass.cpp,1164,"handle_snmp_response_async") ERROR: decoded PDU is null -- this is a timeout scenario
- Password working - SNMP v3
    (2009/10/14,05:40:01.0017-7:snmpqueryclass.cpp,688,"completeInit") Host: 172.17.4.219, Port: 161, User: snmpuser, Sec Level 1
    (2009/10/14,05:40:01.0018-7:snmpqueryclass.cpp,689,"completeInit") Auth password: xxxx, proto: 1, key:
    (2009/10/14,05:40:01.0019-7:snmpqueryclass.cpp,690,"completeInit") Priv password: xxxx, proto: 1, key:
    (2009/10/14,05:40:01.001A-7:snmpquerymetric.cpp,89,"getOID") Entry
    (2009/10/14,05:40:01.001B-7:snmpquerymetric.cpp,91,"getOID") OID=1. 3. 6. 1. 25. 2. 3. 1. 1

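Both a stopped SNMP daemon and a wrong SNMP v3 password surface as the same "decoded PDU is null" timeout message, so the log alone cannot tell them apart; what it can do is show which hosts are affected. The Python sketch below scans agent log lines for the two messages quoted above. The message texts come from the slide; the function name and the returned dictionary layout are illustrative assumptions.

```python
import re

# Matches the "no response" message and captures the host, without the
# sentence-ending period that follows it in the log.
TIMEOUT_RE = re.compile(r"Timeout occurred\.\s*No response from agent\s+(?P<host>\S+?)\.?\s*$")
NULL_PDU_TEXT = "decoded PDU is null"


def summarize_snmp_timeouts(log_lines):
    """Count null-PDU timeouts and list the hosts that did not respond."""
    null_pdu_count = 0
    hosts = []
    for line in log_lines:
        if NULL_PDU_TEXT in line:
            null_pdu_count += 1
        match = TIMEOUT_RE.search(line)
        if match and match.group("host") not in hosts:
            hosts.append(match.group("host"))
    return {"null_pdu_count": null_pdu_count, "timed_out_hosts": hosts}
```

For each host reported, rerunning an snmpget by hand with the configured credentials helps separate a daemon that is down from a credential mismatch.
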
Windows Agentless Monitor fails to collect perfmon data
- When using the Windows Agentless Monitor (r2), the following errors appear in the log:
    (4891C694.0066-1558:queryclass.cpp,1006,"start") Error adding query for class PhysicalDisk.
    (4891C694.0067-1558:queryclass.cpp,1007,"start") \\rc2test3.tivlab.raleigh.ibm.com\PhysicalDisk(*)\% Disk Write Time - add returned C0000BB8
- Potential problems:
  - The counter may simply not exist. Run the typeperf command (or the perfmon GUI) locally on the server where you are trying to collect metrics, and verify that the command comes back cleanly without error.
  - The Remote Registry service may not be enabled. A remote collector must have registry access to look up the indexes. Verify the service is enabled, then run the typeperf command (or the perfmon GUI) remotely and verify that the command comes back cleanly without error.
  - The indexes of counters are corrupt. When a request is made, the string name of the counter is requested. That name is in turn matched to an index on the target computer. All the perfmon index dictionary name-to-number maps are stored in the registry here: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009. On failing systems with this problem, the "counter" entry there either has no value or contains garbage (shown as empty rectangles).

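The counter path in the log follows the standard perfmon form `\\host\Object(instance)\Counter`, and the returned value C0000BB8 is the PDH status PDH_CSTATUS_NO_OBJECT (the performance object was not found on the target). The Python sketch below rebuilds the path, prepares a one-sample typeperf command for the local or remote check described above, and decodes the two most relevant PDH codes. The helper names are illustrative, and the status table is deliberately limited to two entries.

```python
# Two PDH status codes relevant to a failed "add counter" call.
PDH_STATUS = {
    0xC0000BB8: "PDH_CSTATUS_NO_OBJECT",   # performance object not found
    0xC0000BB9: "PDH_CSTATUS_NO_COUNTER",  # counter not found on the object
}


def counter_path(host, perf_object, counter, instance="*"):
    """Build a perfmon counter path of the form \\\\host\\Object(instance)\\Counter."""
    return "\\\\" + host + "\\" + perf_object + "(" + instance + ")\\" + counter


def typeperf_command(path, samples=1):
    """Argument list for typeperf that collects `samples` samples of one counter."""
    return ["typeperf", path, "-sc", str(samples)]


def describe_pdh(hex_code):
    """Translate a hex status such as 'C0000BB8' into its PDH name, if known."""
    return PDH_STATUS.get(int(hex_code, 16), "unknown PDH status")
```

Running the resulting typeperf command first on the target itself and then from the monitoring server separates "the counter does not exist" from "remote access to the counter is blocked".
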
Windows Agentless Monitor fails to collect data
- When running the Windows Agentless Monitor (r2) against a remote machine, the following errors appear in the log:
    (48BF57E9.0006-EF4:wmiqueryclass.cpp,728,"internalCollectData") ::collectData==>Could not connect. Error code = 0x80070005
    (48BF57E9.0007-AD4:queryclass.cpp,790,"internalCollectData") Authentication failed against host testSys1 as user itoperations, return code = 1326
- Potential problems:
  - The user name was not properly specified in the format Domain\User.

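The two codes in these log lines are standard Windows values: 0x80070005 is E_ACCESSDENIED and 1326 is ERROR_LOGON_FAILURE (unknown user name or bad password). A small Python sketch can decode them and check that a configured user name is domain-qualified before the agent is started. The helper names are illustrative, and `TIVLAB` in the usage example is a made-up domain name.

```python
# Standard Windows codes that appear in the log lines above.
KNOWN_CODES = {
    0x80070005: "E_ACCESSDENIED: access is denied",
    1326: "ERROR_LOGON_FAILURE: unknown user name or bad password",
}


def explain(code):
    """Return a short description for a known Windows error code."""
    return KNOWN_CODES.get(code, "unrecognized code")


def is_domain_qualified(user):
    """True if `user` has the form Domain\\User with both parts present."""
    domain, separator, name = user.partition("\\")
    return bool(separator and domain and name and "\\" not in name)
```

With the user from the log, `is_domain_qualified("itoperations")` is false, while `is_domain_qualified("TIVLAB\\itoperations")` is true.
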
Some workspaces have blank views for Linux
- On TEP, the Linux Agentless Monitor (r4) only shows data for the "Network" and "System" navigator items.
  - Potential problems:
    - By default, Red Hat Linux allows connection with the Host Resources MIB and the UCD MIB only through SNMPv3 connections.
    - Verify that the following lines are modified or added in the Access Control portion of /etc/snmpd.conf:
        view systemview included .1.3.6.1.4.1.2021
        view systemview included .1.3.6.1.2.1.25
    - Verify the SNMP daemon is running by using the ps -ef command.

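Checking the snmpd configuration for these view lines can be automated across many endpoints. The Python sketch below parses the text of an snmpd.conf and reports which of the two subtrees are not explicitly included in the view. It takes the UCD MIB to be `.1.3.6.1.4.1.2021` and the Host Resources MIB to be `.1.3.6.1.2.1.25` (its registered location under mib-2). The function names are illustrative, and the check is an exact match only: a configuration that includes a broader parent subtree would expose the data but still be reported here.

```python
# Subtrees the Linux agentless monitor needs: UCD MIB and Host Resources MIB.
REQUIRED_SUBTREES = [".1.3.6.1.4.1.2021", ".1.3.6.1.2.1.25"]


def exposed_subtrees(conf_text, view_name="systemview"):
    """Collect OIDs from 'view <name> included <oid>' lines, ignoring comments."""
    found = set()
    for raw_line in conf_text.splitlines():
        tokens = raw_line.split("#", 1)[0].split()
        if (
            len(tokens) >= 4
            and tokens[0] == "view"
            and tokens[1] == view_name
            and tokens[2] == "included"
        ):
            found.add(tokens[3])
    return found


def missing_subtrees(conf_text, view_name="systemview"):
    """Return the required subtrees that the view does not explicitly include."""
    found = exposed_subtrees(conf_text, view_name)
    return [oid for oid in REQUIRED_SUBTREES if oid not in found]
```

An empty result means both view lines are present; the daemon still has to be restarted or reloaded for edits to take effect.
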
Ignore These Errors
- You can ignore these:
    (48E297DC.0095-17E4:configdata.cpp,65,"getConfigurationProperty") KR2_WMI_WIN_PASSWORD_1 not found in the hash map
    (48E297DC.0097-17E4:configdata.cpp,65,"getConfigurationProperty") KR2_WMI_WIN_PASSWORD_DEFAULT not found in the hash map
- The configuration does a fall-back lookup for its required parameters: Subnode Configuration -> Default Configuration
- These errors indicate that the parameters were not overridden in the subnode

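The fall-back lookup can be pictured as two hash maps consulted in order: the subnode's own settings first, then the defaults. The Python sketch below illustrates that pattern only; it is not the agent's actual implementation, and the function signature and the property name `WIN_PASSWORD` used in the example are hypothetical.

```python
def get_configuration_property(name, subnode_config, default_config):
    """Look up `name` in the subnode settings, then fall back to the defaults.

    Returns (value, source), where source is "subnode", "default", or
    "not found". A miss at the subnode level is normal when the subnode does
    not override the parameter.
    """
    if name in subnode_config:
        return subnode_config[name], "subnode"
    if name in default_config:
        return default_config[name], "default"
    return None, "not found"
```

For instance, with an empty subnode map and a default map holding `WIN_PASSWORD`, the lookup misses at the subnode level and then succeeds from the defaults, which is the kind of miss the log lines above record.
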
Wrap Up
The Agentless Agent technology gives you relatively quick startup and value with limited intrusion on the monitored system!