IBM Software Group
IBM Tivoli Workload Scheduler for Host 8.2
PK06227 - z/OS V1R7 COMPATIBILITY FOR TWS
PK01415 - TO IMPROVE SERVICEABILITY AND ERROR HANDLING IN AN END-TO-END ENVIRONMENT
Steve Viola - Level 2 support
© 2005 IBM Corporation

PK06227 - Compatibility for z/OS 1.7
§ Modification for EXIT7 (OPCAXIT7)
§ New EXIT51 (TWSXIT51)
§ Minor enhancements
  - Access Register Initialization
  - Console Initialization Change for EQQINIT
12/16/2021

PK06227 - Compatibility for z/OS 1.7
§ PTFs for 8.1:
  - UK05574 and UK05582
§ PTFs for 8.2:
  - UK05575 and UK05583

PK06227 - Compatibility for z/OS 1.7
§ For z/OS 1.7, JES2 EXIT7 for TWS will not assemble without PK06227 being applied
§ For z/OS 1.7, JES2 EXIT7 is not called for input phase processing. EXIT51 is used instead
§ EXIT51 is needed for tracking of STCs and handling of JES input errors (example: bad JECL)

PK06227 - Compatibility for z/OS 1.7
§ Installation Steps:
  - Apply PTFs
  - Run EQQJOBS for new SEQQSAMP members
  - Assemble and link EXIT7 (all z/OS) and EXIT51 (z/OS 1.7 only)
    • SMPE: EQQJES2U and EQQJES2V
    • Non-SMPE: EQQJES2 and EQQJES21
  - Define EXIT51 to JES2 (z/OS 1.7 only)
  - Ensure subsystem modules are loaded at IPL

PK06227 - Compatibility for z/OS 1.7
§ Define EXIT51 to JES2 (load module TWSXIT51)
    • LOAD(TWSXIT51)
    • EXIT(51) ROUTINES=TWSENT51,STATUS=ENABLED
  - Ensure that TWSXIT51 is in LNKLST or LPALIB so that JES2 can load it
§ Subsystem modules modified by PK06227 must be loaded after IPL:
  - 8.1: EQQSSCME and EQQINITE
  - 8.2: EQQSSCMF and EQQINITF

PK06227 - Compatibility for z/OS 1.7
§ WARNING: The PTFs for PK06227 may be applied at ANY level of z/OS; however, EXIT51 can ONLY be defined at z/OS 1.7 level or higher
§ If PK06227 is installed on z/OS 1.7 and EXIT7 is reassembled but EXIT51 is NOT defined, most TWS functions will work except for tracking of STCs and handling of invalid JECL statements. Example:
  - /*ROUTE PRINX TSO

PK06227 - Compatibility for z/OS 1.7
§ Error messages if EXIT51 used on pre-z/OS 1.7:
  - $HASP466 PARMLIB STMT 2314 LOAD(TWSXIT51)
  - $HASP003 RC=(31), LOAD(TWSXIT51) - MODULE COULD NOT BE LOADED
  - *01 $HASP469 REPLY PARAMETER STATEMENT, CANCEL, OR END
  - REPLY: END

PK06227 - Compatibility for z/OS 1.7
§ Error messages if EXIT51 used on pre-z/OS 1.7:
  - *$HASP451 ERROR ON JES2 PARAMETER LIBRARY
  - *02 $HASP441 REPLY 'Y' TO CONTINUE INITIALIZATION OR 'N' TO TERMINATE
  - REPLY: Y

PK06227 - Compatibility for z/OS 1.7
§ Error messages if EXIT51 used on pre-z/OS 1.7:
  - $HASP857 WARNING - EXIT 051 NOT DEFINED WITHIN CURRENTLY LOADED JES2 MODULES
  - $HASP858 EXIT ROUTINE TWSENT51 (EXIT 051) NOT FOUND
  - *$HASP859 REQUESTED EXIT ROUTINE(S) NOT FOUND
  - *03 $HASP441 REPLY 'Y' TO CONTINUE INITIALIZATION OR 'N' TO TERMINATE
  - REPLY: Y

z/OS 1.7 compatibility: subsystem name table
§ At z/OS 1.7 level, JES2 R4 mode is no longer supported (JES2 is always ACTIVATED in Z2 mode).
§ This means any pre-TWS subsystem definitions (OPC 2.3 or lower) for controllers or trackers will cause S0C1 abends in JES2
§ Before IPLing on z/OS 1.7, make sure all TWS subsystems use EQQINITE or EQQINITF

z/OS 1.7 compatibility: subsystem name table
§ You CANNOT use the BUILDSSX and SSCMNAME parameters to change a pre-TWS subsystem definition to a TWS 8.1 or 8.2 subsystem definition if JES2 is in Z2 mode.
§ Even if no controller or tracker is started for a pre-TWS subsystem, JES2 will still abend S0C1 if JES2 is in Z2 mode.

QUESTIONS & ANSWERS

PK01415: Serviceability and Error Handling for E2E
§ The following slides are based on a presentation given by TWS level 3:
  - Paolo Falsi
  - Silvia Fama’
  - Annarita Carnevale
§ PK01415 is for TWS 8.2 only

PK01415: Serviceability and Error Handling for E2E
§ Additional APARs: Some problems discovered after PK01415 have been corrected by these APARs:
  - PK11095
  - PK11182
  - PK11351

Problems addressed by PK01415
§ CEEDUMP and SYSMDUMP are collected.
§ USS file corruption and/or contention when multiple address spaces (AS) are generated for the server.
§ Policy for process restart when an abend occurs.
§ Lack of Problem Determination information and messages during the daily planning phase.
§ Lack of Problem Determination information when a file corruption occurs.
§ Wrong definitions of server and daily planning batch job users and groups.

CEEDUMP and SYSMDUMP are collected

CEEDUMP and SYSMDUMP are collected
Prior to PK01415, when a server started task abends in the C/C++ code, a CEEDUMP (LE dump) of the original abend and a SYSMDUMP with completion code U4039 are taken. The CEEDUMP contains only partial data about the address space, which is not enough for a complete error analysis.
With PK01415, the SYSMDUMP of the original abend is now collected, and it also contains the LEDATA and CEEDUMP information.

Documentation changes
Diagnosis Guide and Reference
Applied the following changes:
• In Chapter 2. Initial Problem Analysis, section “Problem-Type Keywords”, delete the following sentence from the description of the ABEND keyword:
  If you are using the end-to-end feature, you could find the CEEDUMP.* dump file in USS in the /tmp or /homedir directory of the user to which the server started-task is associated (using the STC option).
• In Chapter 3. Problem Analysis Procedures, section “Information Needed for All Problems”, delete item 8c from the list:
  Collect the CEEDUMP file if it exists

USS file corruption and/or contention when multiple address spaces (AS) are generated for the server

USS file corruption and/or contention when multiple address spaces are generated for the server
Customers experienced event file corruption and/or contention when the TWS server generated processes with a parent process ID (PPID) equal to 1 (for instance, the Batchman process). The cause was the cancellation of the server started task after multiple address spaces had been generated (only one address space must be generated for all the server tasks, processes, and threads).
With PK01415, the following changes avoid the generation of multiple address spaces:
§ The environment variable handling was reworked; in particular, the _BPX_SHAREAS variable is now always set to YES for all processes and threads.
§ The batchman, mailman, and writer processes have the same PGID as the netman process.

Environment variable checks (1 of 3)
The putenv() function adds a new environment variable or changes the value of an existing one.
Before z/OS 1.2, the system copied the string passed to putenv() into system-allocated storage. Now the calling program must allocate the storage for each environment variable setting.

Environment variable checks (2 of 3)
With PK01415:
§ Storage is allocated for each environment variable before the putenv() function is called.
§ The return code of each putenv() call is now checked, so that a failure to set an environment variable is detected.

Environment variable checks (3 of 3)
The following error messages are printed in the server MLOG if the putenv() return code is not equal to zero:
§ EQQ3129E module_name PUTENV environment_variable_string FAILED
§ EQQPT68E PUTENV() environment_variable FAILED ERRNO=error_number: error_message, REASON=reason

_BPX_SHAREAS environment variable
When _BPX_SHAREAS is set to YES, z/OS runs foreground processes in the same address space as the parent process. The environment variables that affect spawn processing are the ones that are passed into the spawn syscall.
Before PK01415, the putenv() call that sets _BPX_SHAREAS to YES was present more than once in the code. This situation could produce errors in setting this environment variable.
With PK01415, this value is set in the USS environment only in the Starter process and should be inherited by the child processes because it is provided in the spawn syscall.
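
The "set once in the parent, inherit through the spawn call" idea can be sketched in Python. This is only an illustration under stated assumptions, not TWS code: the helper name `spawn_child_and_read` is hypothetical, and `_BPX_SHAREAS` only changes address-space behavior on z/OS UNIX (on other platforms it is just an ordinary variable that the child can see).

```python
import os
import subprocess
import sys

def spawn_child_and_read(var_name: str) -> str:
    """Spawn a child Python process and return the value it sees for var_name."""
    # The child's environment is exactly what the parent passes to the spawn call.
    child_env = dict(os.environ)
    result = subprocess.run(
        [sys.executable, "-c",
         f"import os; print(os.environ.get({var_name!r}, 'UNSET'))"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Set the variable once, in the parent only (the role the Starter process plays).
os.environ["_BPX_SHAREAS"] = "YES"
inherited = spawn_child_and_read("_BPX_SHAREAS")
print(inherited)
```

Because the value travels with the environment handed to the spawn call, the child does not need its own putenv() call, which is the duplication the change removes.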

Process PGID
Before PK01415: the mailman, batchman, and writer processes have a PGID different from that of netman.
With PK01415: netman and all the processes started by it now have the same PGID.

PGID (process group ID)
§ Each process in a process group shares a process group ID (PGID), which is the same as the PID of the first process in the process group. This ID is used for signaling related processes, for example with a KILL signal (SIGKILL).
§ D OMVS,A=ALL output shows PID and PPID but not the PGID. To see PGID values, use the TSO OMVS command:
  ps -ef -o pid,pgid,comm
  (see output on next slide)

ps -ef display 1
PID       PPID      PGID      COMMAND
1         0         1         BPXPINPR
2         1         2         EZBTCPIP
3         83886222  3         /bin/ps
6         1         6         EZBTTSSL
7         1         7         EZBTMCTL
8         1         8         EZACFALG
9         1         9         EZASASUB
50331658  1         50331658  EZBTTMST

ps -ef display 2 (grep for TWS)
ps -ef -o pid,pgid,comm | grep TWS
54        78        54        /u/tws82bin/netman
59        78        50331690  /u/tws82bin/translator
16777277  54        54        /u/tws82bin/mailman
67108930  54        54        /u/tws82bin/writer
78        50331690            /u/tws82bin/starter
118       16777277  54        /u/tws82bin/batchman

Process display showing relationships
PID  PPID  PGID  CMD
90   1     90    EQQPHTOP
78   90    90    starter
54   78    54    netman
59   78    90    translator
93   54    54    writer
77   54    54    mailman
118  77    54    batchman
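
To see at a glance which processes share a process group, the display can be grouped by PGID. The following Python sketch is illustrative only: the helper `group_by_pgid` is hypothetical, and reading the three numeric columns as PID, PPID, PGID is an assumption about the slide layout.

```python
from collections import defaultdict

# Rows read as (PID, PPID, PGID, CMD) from the process display above; the
# column assignment is an assumption about the slide layout.
SAMPLE = """\
90 1 90 EQQPHTOP
78 90 90 starter
54 78 54 netman
59 78 90 translator
93 54 54 writer
77 54 54 mailman
118 77 54 batchman
"""

def group_by_pgid(ps_text: str) -> dict[int, list[str]]:
    """Map each PGID to the commands that belong to that process group."""
    groups: dict[int, list[str]] = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skip header or malformed lines
        _pid, _ppid, pgid, cmd = fields
        groups[int(pgid)].append(cmd)
    return dict(groups)

groups = group_by_pgid(SAMPLE)
for pgid, cmds in sorted(groups.items()):
    print(pgid, cmds)
```

With this reading, netman, writer, mailman, and batchman fall into one group, which matches the stated goal that netman and everything it starts share a PGID.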

Policy for process restart when an abend occurs

Policy for process restart when an abend occurs
Before PK01415: in case of fatal errors (abends), the Starter process restarts its child processes indefinitely.
With PK01415:
§ NEW translator and netman process policy
§ NEW mailman and batchman process policy

NEW translator and netman process policy
The restart process has been changed in the following way:
=> If translator goes down, starter tries to restart it after no more than 5 minutes.
=> If netman goes down, then mailman, batchman, and the writers go down. Because translator is strictly related to batchman and mailman, translator also goes down. In this case as well, starter tries to restart netman and translator after no more than 5 minutes.
Starter tries to restart translator and netman just once. However, if an abend occurs more than 2 hours after the last process restart, a new restart is attempted. If the problem persists, message EQQPT63E is logged and starter closes.
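
The restart rule above can be read as a small decision function. The Python sketch below is a simplified model, not the Starter's implementation: the class `RestartPolicy` is hypothetical, and it assumes the restart happens at the moment of the abend (the real restart can take up to 5 minutes).

```python
RESTART_WINDOW_SECONDS = 2 * 60 * 60  # "more than 2 hours" from the policy text

class RestartPolicy:
    """Decide whether a failed process may be restarted (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_restart = None  # time of the most recent restart, if any

    def on_abend(self, now: float) -> str:
        # First failure, or a failure long after the previous restart: restart.
        if (self.last_restart is None
                or now - self.last_restart > RESTART_WINDOW_SECONDS):
            self.last_restart = now
            return "restart"
        # Another failure inside the window: give up and shut down.
        return "shutdown"

policy = RestartPolicy()
decisions = [
    policy.on_abend(0),                  # first abend
    policy.on_abend(3 * 60 * 60),        # 3 hours after the first restart
    policy.on_abend(3 * 60 * 60 + 120),  # 2 minutes after the second restart
]
print(decisions)
```

The three calls correspond to the cases in the policy examples: a first kill, a kill after 3 hours, and a kill 2 minutes after a restart.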

PK01415 Message flow related to the process policy
START TIME
The following msgs related to processes will be written in MLOG:
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/starter" has been started, pid is 50332241
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/translator" has been started, pid 16777781
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/netman" has been started, pid is 16778050
The following msg related to the netman process will be written in xxx_NETMAN.log:
  AWSEDW075I Netman (pid=16778050 pgid=16778050) was started by the starter process (pid=50332241 pgid=67109427)
The following msgs related to mailman/batchman/writer processes will be written in xxx_TWSMERGE.log:
  AWSBCV138I Mailman (pid=33554844 pgid=16778050) was started by netman (pid=16778050 pgid=16778050)
  AWSBCW056I Writer (pid=50332214 pgid=16778050) was started by netman (pid=16778050 pgid=16778050)
  AWSBCV108I Started Batchman, pin 763 (old message)

PK01415 Translator policy example (1 of 2)
TRANSLATOR DOWN (kill command issued)
The following msgs will be written in MLOG:
  EQQPT11I The Translator process (pid=xxxx) has been killed by signal SIGKILL
TRANSLATOR RESTART
The following msgs will be written in MLOG:
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/translator" has been started, pid is 67109583
  EQQPT20I Input Translator waiting for Batchman and Mailman are started
  EQQPT21I Input Translator finished waiting for Batchman and Mailman
The following msgs related to mailman/batchman/writer processes will be written in xxx_TWSMERGE.log:
  AWSBCV138I Mailman (pid=xxxx pgid=xxxx) was started by netman (pid=xxxx pgid=xxxx)
  AWSBCW056I Writer (pid=xxxx pgid=xxxx) was started by netman (pid=xxxx pgid=xxxx)
  AWSBCV108I Started Batchman, pin xxxx (old message)

PK01415 Translator policy example (2 of 2)
TRANSLATOR DOWN (kill command issued after 2 mins)
The following msgs will be written in MLOG:
  EQQPT16E The Translator process ended abnormally for twice. Starter and his child processes beginning to shut down
  EQQPT11I The Translator process (pid=67109583) has been killed by signal SIGKILL
  EQQPT12I The Netman process ended successfully
  EQQPT10I All Starter's sons ended
TRANSLATOR DOWN (kill command issued after 3 hours)
The following msgs will be written in MLOG:
  EQQPT11I The Translator process (pid=xxxx) has been killed by signal SIGKILL
TRANSLATOR RESTART
The following msgs will be written in MLOG:
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/translator" has been started, pid is xxxxxxx
  EQQPT20I Input Translator waiting for Batchman and Mailman are started
  EQQPT21I Input Translator finished waiting for Batchman and Mailman

PK01415 Netman policy example
NETMAN DOWN (kill command issued)
The following msgs will be written in MLOG:
  EQQPT11I The Netman process (pid=xxxxxx) has been killed by signal SIGKILL
  EQQPT09E The Mailman and/or Batchman process (pid=Unknown) ended abnormally
  EQQPT33E Mailman or Batchman ended abnormally. Translator beginning to shut down
  EQQPT40I Output Translator thread is shutting down
  EQQPT53I Output Translator thread has terminated
  EQQPT40I Input Translator thread is shutting down
  EQQPT53I Input Translator thread has terminated
  EQQPT40I Input Writer thread is shutting down
  EQQPT53I Input Writer thread has terminated
  EQQPT12I The Translator process ended successfully
NETMAN RESTART
The following msgs will be written in MLOG:
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/translator" has been started, pid is 50332226
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/netman" has been started, pid is 50332371
  EQQPT20I Input Translator waiting for Batchman and Mailman are started
  EQQPT21I Input Translator finished waiting for Batchman and Mailman

PK01415 mailman and batchman process policy
In case of an error (abend) in the mailman or batchman process, the following message is printed in the MLOG:
  EQQPT33E MAILMAN OR BATCHMAN ENDED ABNORMALLY. TRANSLATOR BEGINNING TO SHUT DOWN
After that, translator also goes down, and its restart policy then applies.

Batchman policy example
BATCHMAN DOWN (kill command issued after 2 mins)
The following msgs will be written in MLOG:
  EQQPT09E The Mailman and/or Batchman process (pid=Unknown) ended abnormally
  EQQPT33E Mailman or Batchman ended abnormally. Translator beginning to shut down
  EQQPT40I Output Translator thread is shutting down
  EQQPT53I Output Translator thread has terminated
  EQQPT40I Input Translator thread is shutting down
  EQQPT53I Input Translator thread has terminated
  EQQPT40I Input Writer thread is shutting down
  EQQPT53I Input Writer thread has terminated
  EQQPT12I The Translator process ended successfully
BATCHMAN RESTART
The following msgs will be written in MLOG:
  EQQPT01I Program "/usr/lpp/TWS820anna/bin/translator" has been started, pid is 67109540
  EQQPT20I Input Translator waiting for Batchman and Mailman are started
  EQQPT21I Input Translator finished waiting for Batchman and Mailman

Documentation changes
Messages and Codes
The following msgs have been added:
  EQQPT16E THE PROCESS ENDED ABNORMALLY TWICE. STARTER AND CHILD PROCESSES BEGINNING TO SHUT DOWN
  EQQPT33E MAILMAN OR BATCHMAN ENDED ABNORMALLY. TRANSLATOR BEGINNING TO SHUT DOWN
Tivoli Workload Scheduler Administration and Troubleshooting
The following msgs have been added:
  AWSBCV138I Mailman (pid = xxx, pgid = xxx) was started by netman (pid = xxx, pgid = xxx).
  AWSBCW056I Writer (pid = xxx, pgid = xxx) was started by netman (pid = xxx, pgid = xxx).
  AWSEDW075I Netman (pid = xxx, pgid = xxx) was started by the starter process (pid = xxx, pgid = xxx).

Lack of Problem Determination information and messages during the daily planning phase

Lack of Problem Determination information and messages during the daily planning phase
§ Added new MLOG messages for DP Batch, Controller and Server
  - during the daily planning phase.
§ Added new TWSworkdir/stdlist/logs/xxxx_E2EMERGE.log Server messages
  - during the CPUs stopping.
  - during the translator checkpoint file change

New messages for MLOG during a DP extend (underlined) (1 of 4)
Slide columns: BATCH, CONTROLLER, SERVER
  EQQ3131I WAITING FOR A CP BACKUP
  EQQN121I START OF DAILY PLANNING ACTIVITY
  EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED
  EQQN012I OPC JOB TRACKING EVENTS ARE NOW BEING...
  EQQN090I THE JOB TRACKING LOG DATA SET...
  EQQN115I WAITING FOR NCP
  EQQ3132I CREATING A NEW NCP
  EQQ3105I A NEW NCP HAS BEEN CREATED
  EQQ3133I INITIALIZING OF NEW SYMPHONY FILE (RUN NUMBER = &RUNNUMB)
  EQQ3106I Waiting for SCP
  EQQN116I A NEW NCP HAS BEEN CREATED
  (EQQN122I START OF SYMPHONY RENEW ACTIVITY)
  EQQN117I SYNCRONIZATION BETWEEN CONTROLLER AND SERVER STARTED
  (send the SYNC S event to Server)

New messages for MLOG during a DP extend (underlined) (2 of 4)
  EQQPT30I Starting switching Symphony
  EQQPT75I Syncronization between Server and Controller started
  EQQPT39I Stopping Mailman and Batchman processes
  EQQPT12I The Mailman process (pid=xxx) ended successfully
  EQQPT12I The Batchman process (pid=xxx) ended successfully
  EQQPT39I Stopping Input Translator Thread activities
  EQQPT24I Syncronization between Server and Controller ended
  EQQPT22I Input Translator thread stopped until new Symphony will be available
  (send the SYNC E to Controller)
  EQQZ195I SYNCRONIZATION BETWEEN CONTROLLER AND SERVER ENDED
  (EQQ3091E OPC FAILED TO SYNCHRONIZE WITH THE END-TO-END DISTRIBUTED ENVIRONMENT)
  EQQPT39I Stopping all FTWs
  EQQPT70I The stop command has been sent to all the reachable FTWs
  EQQPT71I Waiting for new SCP

New messages for MLOG during a DP extend (underlined) (3 of 4)
  EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED
  EQQN012I OPC JOB TRACKING EVENTS ARE NOW BEING...
  EQQN118I NEW SCP HAS BEEN CREATED
  (send SYNC Y event to Server)
  EQQ3107I SCP is ready: Start jobs addition to Symphony file
  EQQN090I THE JOB TRACKING LOG DATA SET...
  EQQPT72I Current plan is executing again
  EQQ3108I JOBS ADDITION TO SYMPHONY FILE COMPLETED
  EQQ3087I SYMNEW FILE HAS BEEN CREATED
  EQQN111I A new Symphony file has been created
  (send SYNC R event to Server)
  EQQPT73I New Symphony file (run numbers=xxx) is ready
  EQQPT74I Starting Mailman and Batchman processes

New messages for MLOG during a DP extend (underlined) (4 of 4)
  EQQPT74I Starting Input Translator activities
  EQQPT31I Symphony successfully switched
  (send SYNC X event to Controller)
  EQQW090I The new Symphony file has been successfully switched
  EQQPT20I Input Translator waiting for Batchman and Mailman are started
  EQQPT21I Input Translator finished waiting for Batchman and Mailman
  EQQPT23I Input Translator thread is running

New messages for TWSworkdir/stdlist/logs/xxxx_E2EMERGE.log
  EQQPT64I STOP COMMAND SENT TO FAULT TOLERANT WORKSTATION CPUNAME
  EQQPT65I STOP COMMAND SENT TO OPCMASTER
  EQQPT69I SENDING STOP COMMAND TO FAULT TOLERANT WORKSTATION CPUNAME
  EQQPT66I Value contained in the Server checkpoint file: key = value

EQQPT66I message description
EQQPT66I = Value contained in the Server checkpoint file: key = value
The EQQPT66I message prints the values of some useful Translator checkpoint file variables at server startup and whenever these variables change. The key can be one of the following:
  FirstValidSymRun: The First Valid Symphony Run Number used by the server.
  CPAvailable: Is the SCP (copy of the active Current Plan) available or not
  (key not shown): Is a new Symphony available to the server
  SymRunNumber: Current Symphony Run Number
  CPRunNumber: The Symphony Run Number in relation to the active current plan.
  SpecialSynchStart: Is a “special” synchronization in progress

Lack of Problem Determination information when a file corruption occurs

New trace mechanism to get useful information to be used during problem determination
  - APIs are provided to the developers to instrument the code
  - Implemented for every USS Server process
  - It is a “wrapping trace”
  - Two trace types are available for every process:
    • Short trace
      - with record length = 3*fullword = 48 bytes
      - with number of records = 1000 + header record
    • Long trace
      - with record length = 13*fullword = 208 bytes
      - with number of records = 300 + header record
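
A "wrapping trace" is a fixed-size buffer in which new records overwrite the oldest ones. The Python sketch below illustrates that idea only. The class `WrappingTrace` is hypothetical and is not the TWS trace API; the record length (48 bytes) and record count (1000) are the short-trace figures quoted above, and the header record is not modeled.

```python
class WrappingTrace:
    """Fixed-size in-memory trace that overwrites its oldest record when full."""

    def __init__(self, record_length: int, record_count: int) -> None:
        self.record_length = record_length
        self.record_count = record_count
        self.records = [b""] * record_count
        self.next_slot = 0      # slot that the next record will occupy
        self.total_written = 0  # number of records written so far

    def write(self, data: bytes) -> None:
        # Truncate or pad so that every record has the same length.
        record = data[: self.record_length].ljust(self.record_length, b"\x00")
        self.records[self.next_slot] = record
        self.next_slot = (self.next_slot + 1) % self.record_count
        self.total_written += 1

    def snapshot(self) -> list[bytes]:
        """Return the retained records, oldest first."""
        if self.total_written < self.record_count:
            return self.records[: self.total_written]
        return self.records[self.next_slot:] + self.records[: self.next_slot]

short_trace = WrappingTrace(record_length=48, record_count=1000)
for i in range(1005):
    short_trace.write(f"write event {i}".encode())
retained = short_trace.snapshot()
print(len(retained), retained[0].rstrip(b"\x00"), retained[-1].rstrip(b"\x00"))
```

After 1005 writes only the most recent 1000 records remain, so a dump taken at the time of an error shows the latest activity at a bounded memory cost.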

Current instrumentation using the in-memory trace
§ A record is written for every write/open, so that the cause of a file corruption and the process responsible for it can be determined
§ A dump is taken when an error occurs while accessing an event file, and one of the following messages is shown:
  - EQQPT67E Dump was taken for Problem Determination purpose
  - AWSDDW008E A memory dump was taken to assist in determining the problem

Wrong definition of server and daily planning batch job users and groups

Wrong definitions of server and daily planning batch job users and groups (1 of 3)
Wrong definitions of users or groups may introduce serious errors in an E2E environment. Multiple checks have been introduced to prevent these errors and to report wrong definitions to the user.
§ When E2E is used, the user assigned to the Server or DP batch must be correctly defined in the RACF database:
  · the user must have an OMVS segment (UID) defined
  · the user's default group must have an OMVS segment (GID) defined
§ Every user defined in the RACF database with the same UID as the user assigned to the Server or DP batch must belong to a group with a defined GID.

Wrong definitions of server and daily planning batch job users and groups (2 of 3)
§ Checks added:
  - At start, the Server and DP batch check whether the assigned user has a valid definition, and then check whether all users with the same UID belong to a group with a GID defined.
  - The Server makes the same checks every five minutes.
  - Every checked user or group that does not have an OMVS segment assigned is reported in EQQMLOG with an error message. If there is a RACF access error, the problem is reported with a warning message in EQQMLOG.
  - If DP batch finds an error, it stops with return code 12, except for Symphony Renew, which stops with return code 8.
  - If the Server finds an error, it does not stop, but issues an error message.
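
The checks listed above can be summarized in a short Python sketch. It is a model under stated assumptions, not the product's code: plain dictionaries stand in for the RACF database, the names `User`, `check_user`, and `dp_batch_return_code` are hypothetical, group membership is simplified to the default group, and return code 0 for the no-error case is assumed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    name: str
    uid: Optional[int]   # None means no OMVS segment (no UID)
    default_group: str

def check_user(user_name, users, group_gids):
    """Return a list of problems found for user_name (empty list = valid)."""
    problems = []
    user = users[user_name]
    if user.uid is None:
        problems.append(f"user {user.name} has no UID")
        return problems
    # Check the user and every other user that shares the same UID.
    for other in users.values():
        if other.uid == user.uid and group_gids.get(other.default_group) is None:
            problems.append(
                f"group {other.default_group} of user {other.name} has no GID")
    return problems

def dp_batch_return_code(problems, symphony_renew=False):
    """Return codes from the slide: 12, or 8 for Symphony Renew (0 is assumed)."""
    if not problems:
        return 0
    return 8 if symphony_renew else 12

# Hypothetical sample data: two users share UID 138, one group has no GID.
users = {
    "TWSSRV": User("TWSSRV", 138, "TWSGRP"),
    "OTHER": User("OTHER", 138, "NOGID"),
}
group_gids = {"TWSGRP": 100, "NOGID": None}

problems = check_user("TWSSRV", users, group_gids)
print(problems, dp_batch_return_code(problems))
```

In this model the Server would log the same problems but keep running, while DP batch would turn them into a non-zero return code.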

Wrong definitions of server and daily planning batch job users and groups (3 of 3)
§ Note: besides the security items checked with PK01415, every userid that runs a DP batch job, as well as the controller and E2E server userids, must belong to group eqqGID (specified in job EQQPCS05). However, this is NOT checked even with PK01415 applied.
§ The error messages issued if a userid does NOT belong to group eqqGID are shown on the next slide.

Errors if userid does not belong to eqqGID
§ ICH408I USER(USER7 ) GROUP(OMVS ) NAME(TEST USER)
    /var/TWS820/inst/Symphony CL(DIRSRCH ) FID(01D9F0F1F9F1C5000F040000242B0000)
    INSUFFICIENT AUTHORITY TO OPEN
    ACCESS INTENT(--X) ACCESS ALLOWED(OTHER ---)
    EFFECTIVE UID(0000000138) EFFECTIVE GID(0000000100)
§ EQQ3088E THE SYMPHONY FILE HAS NOT BEEN CREATED

Enhancement request to add eqqGID checking:
§ MR0901057349

Additional APARs which correct problems after PK01415
§ PK11095: MESSAGE EQQZ404W TEXT CONTAINS THE LITERAL &UUID BUT IT SHOULD HAVE THE VALUE OF THE USERID INSTEAD
§ PTF: UK06934 is available

Additional APARs which correct problems after PK01415
§ PK11182: MESSAGE EQQZ404W (ADDED BY APAR PK01415) MAY BE ISSUED INCORRECTLY. BPX_DEFAULT_USER IS NOT CHECKED
§ APAR is currently open, but an APARFIX is available from level 2 support
§ More information on the security considerations corrected by PK11182 appears later in this presentation

Additional APARs which correct problems after PK01415
§ PK11351: AFTER PK01415 IS APPLIED, A SYMPHONY RENEW JOB ENDS WITH RC=12 (should be RC=08)
§ PTF: UK07035 is still open, but an APARFIX is available from level 2 support

Security issues resolved by PK11182
§ After applying PK01415, security problems resulted for the following environments:
  - If BPX_DEFAULT_USER is set up so that ANY user without an explicit OMVS segment inherits the OMVS segment from the default user
  - If IRRIRA00 has been executed so that STAGE 3 is in effect (see z/OS Security Server RACF System Programmer's Guide)

BPX_DEFAULT_USER
§ If the default user is set up with a valid shell (for example: PROGRAM('/bin/sh')), then any userid is allowed OMVS access. However, the checking done by PK01415 expects an EXPLICIT OMVS segment to be defined. If a user that picks up the default segment attempts to run a CP batch job, these messages are issued:
  - EQQZ401E USER BPXDEF HAS NO VALID UID
  - EQQZ400E A USER ID DEFINITION ON RACF CLASS UNIXMAP IS WRONG
  - EQQZ400I CORRECT THE PROBLEM AND RESTART

BPX_DEFAULT_USER
§ If the default BPX user is set up with an invalid shell program, such as PROGRAM= /bin/echo, then any user without an explicit OMVS segment FAILS when attempting OMVS access. In this case, PK01415 does not present any new problem.

BPX_DEFAULT_USER
§ To display the BPX default user information, use the command:
  - rlist facility bpx.default.user
§ The output includes this information:
  - APPLICATION DATA
  - ---------
  - OEDFLTU/OEDFLTG
  - (default user/default group)

BPX_DEFAULT_USER
§ Next, do an LU (list user) on the default user ID, for example:
  - LU OEDFLTU OMVS NORACF
§ Resulting display:
  - OMVS INFORMATION
  - ---------
  - UID= 0000000162
  - HOME= /
  - PROGRAM= /bin/echo

IRRIRA00 (Stage 3)
§ Any RACF database created at OS/390 2.10 or later will be at Stage 3. However, if a RACF database was migrated, it could still be at stage 0, 1, or 2
§ Only stage 3 creates problems if PK01415 is applied, since at stage 3 RACF does not use mapping profiles for UID, GID, SNAME, and UNAME associations. Commands such as ADDUSER no longer maintain the old mapping profiles.

IRRIRA00 continued
§ The following JCL may be executed to determine what stage the RACF database is in:
  //TEST     EXEC PGM=IRRIRA00
  //SYSPRINT DD SYSOUT=*
§ Sample output:
  - IRR66017I The system is currently operating in stage 3.

IRRIRA00 continued
§ The problem caused is an incorrect message EQQZ404W at E2E server startup and when CP batch jobs are run:
  - EQQZ404W RACF ACCESS ERROR WHILE CHECKING USERS WITH UID U0
  - EQQZ404I SAF RC: 0004; RACF RC: 0008; RACF REASON CODE: 0000
§ However, the E2E server and CP batch jobs continue to run correctly

PK01415 and related APARs - documentation
§ After the PTFs for PK01415 are applied, the documentation changes are in SEQQMISC member EQQPDFST
§ Download the SEQQMISC member to a PC in binary mode with the extension .pdf, and use Adobe Acrobat to read the documentation

PK01415 and related APARs - PTF availability
§ PK01415 - PTFs UK04908, UK04925, UK04927
§ USS Fix Pack 7 is a prereq for PK01415. USS Fix Pack 8 supersedes PK01415
§ PK11351 - PTF UK07035 currently OPEN
§ PK11095 - PTF UK06934
§ PK11182 - APAR is OPEN

TWS 8.2 - recent USS fix packs
§ fix pack 6:
  - APAR PQ98694 - PTFs UQ96309 and UQ96295
§ fix pack 7:
  - APAR PK04260 - PTFs UK02459 and UK02460
§ fix pack 8:
  - APAR PK10713 - PTFs UK06627 and UK06629

QUESTIONS & ANSWERS