Data sets, access methods and DFSMS
Dave Betten
IBM Cloud and Systems Performance
© 2018 IBM Corporation
This presentation is intended to provide some basic information about data sets and access methods
• This is by no means a complete tutorial
• Rather, I’ve tried to cover some basics and prepare for some follow-on discussions and mentoring
  – Some basics on the difference between data sets and access methods
  – Explain some of the terminology you’ll probably hear used often
  – Discuss some of the benefits of extended format data sets
    • Types of compression, data striping, etc.
  – Review some basic concepts of System-Managed Storage (SMS)
  – Explain some older technologies that might still be in use today
    • Hiperbatch, VIO
It’s important to understand the difference between data sets and access methods
• Let’s set VSAM aside for now
• There are two main types of non-VSAM data sets
  – Physical sequential data sets, which are commonly referred to as flat files
    • Records are stored sequentially in the data set
    • Can reside on disk or tape
  – Partitioned data sets (PDS)
    • A PDS has a directory that can point to multiple members
    • Each member can be accessed individually, much like you access physical sequential files
• Access methods are software interfaces used to access data sets
  – BSAM – Basic Sequential Access Method
  – QSAM – Queued Sequential Access Method
  – BPAM – Basic Partitioned Access Method
  – There are other very old access methods that are rarely used
    • BDAM – Basic Direct Access Method
    • ISAM – Indexed Sequential Access Method
• The same data set can be accessed via different access methods depending on the program
  – Some low-level programs build their own channel programs to access the data
    • Commonly referred to as the EXCP access method
    • DFSORT is a common user of EXCP
Data set characteristics
• Record formats (RECFM)
  – Fixed (F)
  – Fixed Blocked (FB)
  – Variable Blocked (VB)
  – Variable Blocked Spanned (VBS)
  – Undefined length (U)
• Record length (LRECL)
  – For F and FB formats, it’s the length of the records
    • Every record has the same length
  – For VB and VBS, it’s the maximum length
    • The length of the records varies, but none can exceed the maximum
    • Each record begins with a 4-byte record descriptor word (RDW) whose first 2 bytes contain the length of the record; LRECL includes those 4 bytes
  – U is normally used for things like load libraries
• Block size (BLKSIZE)
  – The size (in bytes) of each block of records
  – For FB, it’s a multiple of LRECL
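To make the RECFM, LRECL and BLKSIZE rules concrete, here is a small Python sketch (Python is only an illustration language here). It assumes a 3390 device, where a block of at most 27,998 bytes lets two blocks share a track (half-track blocking), and a standard BDW rather than the large-block variant. All function names are hypothetical.

```python
import struct
from typing import List

# Assumption: 3390 DASD, where blocks of up to 27,998 bytes fit two per track
HALF_TRACK_3390 = 27998


def fb_blksize(lrecl: int, max_blk: int = HALF_TRACK_3390) -> int:
    """Largest multiple of LRECL that fits in max_blk (RECFM=FB)."""
    if not 0 < lrecl <= max_blk:
        raise ValueError("LRECL must be between 1 and max_blk")
    return (max_blk // lrecl) * lrecl


def build_vb_block(records: List[bytes]) -> bytes:
    """Build one RECFM=VB block: a 4-byte BDW, then a 4-byte RDW in front of each record."""
    body = b"".join(struct.pack(">HH", len(r) + 4, 0) + r for r in records)
    return struct.pack(">HH", len(body) + 4, 0) + body


def split_vb_block(block: bytes) -> List[bytes]:
    """Walk the RDWs of one VB block and return only the record data."""
    (block_len,) = struct.unpack(">H", block[:2])  # BDW length covers the whole block
    records, pos = [], 4                            # skip the 4-byte BDW
    while pos < block_len:
        (rec_len,) = struct.unpack(">H", block[pos:pos + 2])  # RDW length includes the RDW
        records.append(block[pos + 4:pos + rec_len])
        pos += rec_len
    return records
```

For example, fb_blksize(80) returns 27920, that is 349 fixed 80-byte records per block.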
Terminology
• BUFFER – An area in a program's virtual storage that holds the data from one physical block of the data set
• BLOCK – A collection of contiguous records; the smallest unit of transfer between a processor and an I/O subsystem
• I/O REQUEST – The actual command to the I/O subsystem to transfer one or more blocks of data
  – The number of blocks per I/O is determined by the access method, the program, and the JCL
• EXCP – Actually an MVS macro instruction to EXecute a specified Channel Program
  – For QSAM and BSAM, the EXCP count has come to mean the number of blocks transferred
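Because the QSAM/BSAM EXCP count has come to mean blocks transferred, it can be estimated from the record count and the blocking. A minimal sketch, assuming RECFM=FB and a hypothetical fixed number of blocks per I/O request (in practice the access method, program and JCL decide this):

```python
import math


def blocks_transferred(num_records: int, lrecl: int, blksize: int) -> int:
    """Blocks needed for RECFM=FB data; roughly the QSAM/BSAM EXCP count."""
    records_per_block = blksize // lrecl
    return math.ceil(num_records / records_per_block)


def io_requests(blocks: int, blocks_per_io: int) -> int:
    """I/O requests needed if each request moves blocks_per_io blocks (an assumption)."""
    return math.ceil(blocks / blocks_per_io)


blocks = blocks_transferred(1_000_000, lrecl=80, blksize=27920)
requests = io_requests(blocks, blocks_per_io=5)
```

One million 80-byte records at BLKSIZE=27920 need 2,866 blocks, and at five blocks per request that is 574 I/O requests. The same records at BLKSIZE=800 would need 100,000 blocks, which is why blocking matters so much.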
QSAM
• Uses the z/OS standard GET and PUT macros
  – Data is processed at the RECORD level
• The access method handles blocking and unblocking of records
• The access method manages synchronization
• The access method manages overlapping requests
• BUFNO – Buffer Number
  – Number of buffers allocated for storing blocks of data
  – Specified in the program's DCB or in JCL
  – Default BUFNO is 5
  – Buffers allow multiple blocks to be accessed in a single I/O
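The QSAM division of labor can be sketched as a toy reader: the caller only asks for the next record, while the class deblocks RECFM=FB blocks and keeps up to BUFNO blocks read ahead. This is a simplified model (blocks come from a Python list and nothing really overlaps), and the class and method names are hypothetical, not a z/OS interface.

```python
from collections import deque
from typing import Iterable, Optional


class QsamStyleReader:
    """Toy model of QSAM GET for RECFM=FB: the caller sees records, never blocks."""

    def __init__(self, blocks: Iterable[bytes], lrecl: int, bufno: int = 5):
        self._blocks = iter(blocks)  # stand-in for the physical blocks on disk
        self._lrecl = lrecl
        self._bufno = bufno          # how many blocks may be read ahead
        self._buffers = deque()      # blocks already read but not yet consumed
        self._current = b""          # block currently being deblocked
        self._offset = 0
        self.blocks_read = 0         # roughly what QSAM reports as the EXCP count

    def _fill_buffers(self) -> None:
        # Read ahead until BUFNO buffers are full or the data runs out
        while len(self._buffers) < self._bufno:
            block = next(self._blocks, None)
            if block is None:
                break
            self._buffers.append(block)
            self.blocks_read += 1

    def get(self) -> Optional[bytes]:
        """Return the next logical record, or None at end of data."""
        if self._offset >= len(self._current):
            self._fill_buffers()
            if not self._buffers:
                return None
            self._current = self._buffers.popleft()
            self._offset = 0
        record = self._current[self._offset:self._offset + self._lrecl]
        self._offset += self._lrecl
        return record
```

Reading every record is then just list(iter(reader.get, None)); the program never sees a block boundary.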
BSAM
• Uses the z/OS standard READ and WRITE macros
  – Data is processed at the BLOCK level
  – The program must manage blocking and buffering
    • Must move records to and from blocks and maintain variable lengths
    • Must provide a buffer address for the READ and WRITE macros
• The program is responsible for synchronizing requests and overlapping processing
  – The program must issue a CHECK, WAIT or EVENTS macro to determine whether the requested operation completed successfully
  – Must have sufficient buffers available
  – Some programs are written to exploit whatever buffers are available at run time
  – Other programs only exploit a fixed number of buffers
• NCP – Number of Channel Programs
  – Number of READ or WRITE requests a program may issue before a CHECK is issued
  – Specified in the program's DCB macro (some programs examine NCP or BUFNO in JCL)
  – Default NCP is 1; anything more is program dependent
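By contrast, a BSAM program has to track its own outstanding requests. The sketch below uses a thread pool as a stand-in for asynchronous READs, keeps at most NCP of them in flight, and "CHECKs" each one in the order issued by waiting on its result. read_block and bsam_style_read_all are hypothetical names; this models the bookkeeping only, not real channel programs.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import List


def read_block(data_set: List[bytes], block_number: int) -> bytes:
    """Hypothetical stand-in for one BSAM READ of a single physical block."""
    return data_set[block_number]


def bsam_style_read_all(data_set: List[bytes], ncp: int = 1) -> List[bytes]:
    """Keep up to NCP READs in flight; CHECK (wait for) each one in the order issued."""
    blocks = []
    outstanding = deque()
    next_block = 0
    with ThreadPoolExecutor(max_workers=ncp) as pool:
        while next_block < len(data_set) or outstanding:
            # Issue READs until NCP requests are outstanding (the program's job in BSAM)
            while next_block < len(data_set) and len(outstanding) < ncp:
                outstanding.append(pool.submit(read_block, data_set, next_block))
                next_block += 1
            # CHECK the oldest request before its buffer can be reused
            blocks.append(outstanding.popleft().result())
    return blocks
```

With ncp=1 the program waits for every READ before issuing the next; a larger NCP lets reads overlap with processing, but only if the program is written to use it.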
Using standard access methods simplifies coding and maintenance
• Much simpler coding logic
  – Coding channel programs is not for the faint of heart!!
• Handles things like synchronization, recovery, etc.
• Less likely to require changes to user code
  – DFSMS takes care of updating the access method code to support new technology and exploit new features
  – Things like extended format, zHPF, compression, encryption, etc.
There are three types of non-VSAM sequential data sets
• Basic format
  – Maximum of 59 volumes
  – Limited to 65,535 tracks (just under 64K) per volume
  – Maximum of 16 extents per volume
• Large format
  – Similar to basic format, but can exceed 65,535 tracks per volume
• Extended format
  – Relieves some of the limitations of basic format
  – Logically the same format, but stored differently on the hardware to exploit hardware and software facilities of SMS
  – Must be SMS managed
  – Enabled through an allocation or data class parameter
  – Allows some extra features:
    • Compression
    • Data striping
    • Extended addressing (larger files)
    • VSAM allocation and buffering
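Whether a sequential data set will run into the basic-format track limit can be estimated with simple arithmetic. A sketch, assuming RECFM=FB on a 3390 with half-track blocking (two blocks per track) and the whole data set on one volume; the function names are hypothetical.

```python
import math

BASIC_MAX_TRACKS_PER_VOLUME = 65_535  # basic-format sequential limit per volume
BLOCKS_PER_TRACK = 2                  # assumption: half-track blocking on a 3390


def tracks_needed(num_records: int, lrecl: int, blksize: int) -> int:
    """Tracks for RECFM=FB data, ignoring any partially used last track."""
    blocks = math.ceil(num_records / (blksize // lrecl))
    return math.ceil(blocks / BLOCKS_PER_TRACK)


def fits_basic_format(tracks_on_one_volume: int) -> bool:
    """True if the data set stays within the basic-format per-volume track limit."""
    return tracks_on_one_volume <= BASIC_MAX_TRACKS_PER_VOLUME
```

One million 80-byte records fit comfortably (1,433 tracks), while 100 million need about 143,267 tracks on one volume, which calls for large or extended format.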
Virtual Storage Access Method (VSAM)
• VSAM data sets only reside on disk and are only accessed via the VSAM access method
• They support three types of access
  – Random (or direct)
  – Sequential
  – Skip sequential
• VSAM functions consist of two major parts
  – Catalog management – extensive information about VSAM data sets is stored in the catalog
  – Record management – this part contains the access method code
• Data is logically stored in Control Intervals (CIs) and Control Areas (CAs)
  – CIs are stored physically in blocks
  – Control Areas contain multiple Control Intervals and are pointed to by index records
    • Maximum size of a CA is one cylinder
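The three access types can be illustrated with a toy key-sequenced store in which a sorted key list plays the role of the index. This is a conceptual sketch only (no CIs, CAs or real index levels), and ToyKsds is a hypothetical name.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Iterator, Optional, Tuple


class ToyKsds:
    """Toy key-sequenced data set: a sorted key list plays the role of the VSAM index."""

    def __init__(self, records: Dict[str, str]):
        self._keys = sorted(records)
        self._data = dict(records)

    def direct(self, key: str) -> Optional[str]:
        """Random (direct) access: look up one record by key via the index."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._data[key]
        return None

    def sequential(self, start_key: str = "") -> Iterator[Tuple[str, str]]:
        """Sequential access: return records in key order, starting at start_key."""
        for key in self._keys[bisect_left(self._keys, start_key):]:
            yield key, self._data[key]

    def skip_sequential(self, wanted: Iterable[str]) -> Iterator[Tuple[str, str]]:
        """Skip sequential: ascending keys, each search starting where the last one ended."""
        pos = 0
        for key in wanted:
            pos = bisect_left(self._keys, key, pos)
            if pos < len(self._keys) and self._keys[pos] == key:
                yield key, self._data[key]
```

Skip sequential pays off when a batch of ascending keys is processed: each search only moves forward instead of starting again from the top of the index.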
There are four types of VSAM data sets
• Key-Sequenced Data Sets (KSDS)
  – Consist of an index and a data component
  – Records contain a key and data
  – The index provides direct access to any record in the data component
  – The data component can also be accessed sequentially
• Entry-Sequenced Data Sets (ESDS)
  – Have no index
  – Records are in the order they were added
  – Can be accessed sequentially or directly using the Relative Byte Address (RBA)
• Relative Record Data Sets (RRDS)
  – Pre-formatted fixed-length records
  – Sequenced by relative record number
  – Records accessed by Relative Record Number (RRN)
  – Allow direct and sequential access
• Linear Data Sets (LDS)
  – Byte-addressable storage
  – CI size is a multiple of 4096
  – Similar to a non-VSAM data set with some VSAM facilities
  – Most common user is DB2
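The difference between addressing by RBA (ESDS) and by RRN (RRDS) can be sketched with two toy classes. This is a simplification: a real RBA is the byte offset within the data component and also counts CI control information and free space, which the toy ignores. ToyEsds and ToyRrds are hypothetical names.

```python
from typing import Dict, List, Optional


class ToyEsds:
    """Entry-sequenced: records stay in arrival order and are addressed by RBA."""

    def __init__(self) -> None:
        self._space = bytearray()
        self._lengths: Dict[int, int] = {}  # RBA -> record length

    def add(self, record: bytes) -> int:
        rba = len(self._space)  # new records always go at the end
        self._space += record
        self._lengths[rba] = len(record)
        return rba

    def read_rba(self, rba: int) -> bytes:
        return bytes(self._space[rba:rba + self._lengths[rba]])


class ToyRrds:
    """Relative record: pre-formatted fixed-length slots addressed by RRN (starting at 1)."""

    def __init__(self, slots: int, lrecl: int) -> None:
        self._lrecl = lrecl
        self._slots: List[Optional[bytes]] = [None] * slots

    def write(self, rrn: int, record: bytes) -> None:
        if len(record) != self._lrecl:
            raise ValueError("RRDS records are fixed length")
        self._slots[rrn - 1] = record

    def read(self, rrn: int) -> Optional[bytes]:
        return self._slots[rrn - 1]
```

An ESDS caller must remember the RBA returned when the record was added, whereas an RRDS caller computes the slot number directly.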
VSAM data sets can also be allocated as extended format
• Similar benefits to non-VSAM
  – Compression, striping, etc.
• There are additional benefits unique to VSAM
  – Extended addressability allows a VSAM data set to exceed 4 GB in size
  – System-managed buffering improves performance
  – And now encryption!
• Certain VSAM data sets cannot be extended format
  – Catalogs
  – System data sets (since DFSMS isn’t active yet at IPL time)
  – Temporary data sets
Extended format - Compression
• Compression is one of the main benefits of extended format
• There are three types of compression
  – Initially there were only two:
    • Generic: a standard dictionary is used as the basis for compression
    • Tailored: only used for non-VSAM; a tailored dictionary is created based on initial sampling of the data
      – Usually provides better compression ratios
  – zEDC compression addressed one of the main inhibitors to compression: CPU cost
    • Both generic and tailored compression drive up CPU usage
    • With zEDC, the compression is performed on a PCIe-attached card, thus offloading the CPU cost
    • Another benefit of zEDC is much improved compression ratios and reduced I/O
• Compressing data sets reduces disk storage usage but also improves performance
  – Less data being transferred between the disk and the processor reduces I/O time
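The benefit of a dictionary built from a sample of the data can be illustrated with Python's zlib and its preset-dictionary (zdict) option. This is only an analogy: zEDC implements DEFLATE in hardware, while the generic and tailored formats use their own dictionary-based technique, not zlib. The record layout and helper names are hypothetical.

```python
import zlib
from typing import List, Optional


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with zlib; a preset dictionary stands in for a 'tailored' dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


def sample_records(n: int) -> List[bytes]:
    """Hypothetical 80-byte customer records with a lot of repeated structure."""
    return [
        f"CUST{i:06d} ACTIVE  NEW YORK      NY 10001 BALANCE {i * 7:010d}".ljust(80).encode("ascii")
        for i in range(n)
    ]


records = sample_records(200)
# "Tailored" stand-in: build the dictionary from an initial sample of the data
tailored = b"".join(records[:10])
plain_total = sum(len(deflate(r)) for r in records)
tailored_total = sum(len(deflate(r, tailored)) for r in records)
```

Compressing each short record on its own gains little, but with a dictionary sampled from the first records most of each record becomes a back-reference, so tailored_total ends up well below plain_total.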
Extended Format - Data Striping: Implements parallel I/O to reduce elapsed times
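Why striping reduces elapsed time can be shown with a back-of-the-envelope model: spread blocks round-robin across stripes and let the busiest stripe set the elapsed time. The round-robin by block, the uniform time per block, and the absence of channel or CPU limits are all simplifying assumptions; DFSMS manages the actual stripe layout. The function names are hypothetical.

```python
from typing import List


def stripe_for_block(block_number: int, stripe_count: int) -> int:
    """Round-robin placement: block 0 on stripe 0, block 1 on stripe 1, and so on."""
    return block_number % stripe_count


def estimated_elapsed_ms(num_blocks: int, stripe_count: int, ms_per_block: float) -> float:
    """Stripes transfer in parallel, so elapsed time follows the busiest stripe."""
    per_stripe: List[int] = [0] * stripe_count
    for b in range(num_blocks):
        per_stripe[stripe_for_block(b, stripe_count)] += 1
    return max(per_stripe) * ms_per_block
```

Under these assumptions, 1,000 blocks at 0.5 ms each take 500 ms on one stripe and 125 ms on four.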
Data sets must be DFSMS managed to be Extended Format
• DFSMS allows for policy-based management of data
• Automatic Class Selection (ACS) routines automate assigning the following to data sets:
  – Data Class – data set allocation and space attributes
    • This is where you can specify extended format, compression, extended addressability, etc.
  – Storage Class – performance goals and availability and accessibility requirements
    • This is where you can control the number of stripes
  – Management Class – management attributes (retention, migration, backup, etc.)
  – Storage Group – a collection of storage volumes and attributes
• The Interactive Storage Management Facility (ISMF) is used to define and manage the classes
  – ISPF-based application
  – On the SYSD test system, go to Option W and then Option I
    • You can then list all of the various classes defined on the system
• ACS routines are stored in OSPPGE.SMS.SOURCE
  – We can look at these together in a future call
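Real ACS routines are written in the SMS ACS language, not Python, but the decision flow can be sketched as a function. The routines run in the order data class, storage class, management class, storage group, and a null storage class leaves the data set non-SMS-managed, so the last two are never assigned. The naming policy and every class and group name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constructs:
    data_class: Optional[str]
    storage_class: Optional[str]
    management_class: Optional[str]
    storage_group: Optional[str]


def acs_select(dsname: str) -> Constructs:
    """Hypothetical ACS-style policy; every class and group name here is made up."""
    big = dsname.endswith(".BIGFILE")
    # 1. Data class: allocation attributes such as extended format and compression
    data_class = "DCEXTCMP" if big else "DCSTD"
    # 2. Storage class: a null result leaves the data set non-SMS-managed
    if dsname.startswith("SYS1."):
        return Constructs(data_class, None, None, None)
    storage_class = "SCSTRIPE" if big else "SCSTD"
    # 3. Management class: retention, migration, backup
    management_class = "MCSTD"
    # 4. Storage group: where the data set may be placed
    storage_group = "SGLARGE" if big else "SGPRIME"
    return Constructs(data_class, storage_class, management_class, storage_group)
```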
Most of the Data Class attributes can be overridden with JCL parameters
• The DSNTYPE parameter is the most notable
  – LIBRARY – Partitioned Data Set Extended (PDSE)
  – PDS – Partitioned Data Set
  – HFS – Hierarchical File System
  – LARGE – Large-format sequential
  – EXTREQ – Extended format, required
  – EXTPREF – Extended format, preferred
  – BASIC – Basic-format sequential
• Others might be space allocations, volume count, retention, etc.
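JCL is normally written by hand, but to keep one example language throughout, this Python sketch builds a DD statement in which DSNTYPE, when coded, overrides whatever the data class specifies. DATACLAS and DSNTYPE are real DD parameters; the data set name, the class name and the dd_statement helper are hypothetical, and the sketch ignores JCL continuation rules.

```python
from typing import Optional

# DSNTYPE values listed on the slide
DSNTYPE_VALUES = {"LIBRARY", "PDS", "HFS", "LARGE", "EXTREQ", "EXTPREF", "BASIC"}


def dd_statement(ddname: str, dsname: str, dataclas: str, dsntype: Optional[str] = None) -> str:
    """Build a DD statement for a new data set; DSNTYPE, if given, overrides the data class."""
    params = [f"DSN={dsname}", "DISP=(NEW,CATLG,DELETE)", f"DATACLAS={dataclas}"]
    if dsntype is not None:
        if dsntype not in DSNTYPE_VALUES:
            raise ValueError(f"unsupported DSNTYPE: {dsntype}")
        params.append(f"DSNTYPE={dsntype}")
    return f"//{ddname:<8} DD " + ",".join(params)
```

For example, dd_statement("OUT", "PROD.SALES.BIGFILE", "DCSTD", "EXTREQ") requests extended format even though the data class itself might not.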
So why aren’t more customers exploiting extended format?
• The Using Data Sets manual lists the restrictions for extended-format sequential data sets
  – Restrictions: The following types of data sets cannot be allocated as extended-format sequential data sets:
    • PDS, PDSE, and direct data sets, except VSAM
    • Non-system-managed data sets
    • VIO data sets
  – The following types of data sets should not be allocated as extended-format sequential data sets:
    • System data sets
    • GTF trace
    • Data Facility Sort (DFSORT) work data sets
    • Data sets used with Hiperbatch
    • Data sets accessed with EXCP
    • Data sets used with checkpoint/restart
• And the restrictions listed for VSAM include:
  – An extended-format data set cannot have the index imbedded with the data (IMBED parameter) or the data split into key ranges (KEYRANGES parameter)
  – An open for improved control interval (ICI) processing is not permitted for extended-format data sets
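The sequential restrictions above translate naturally into an eligibility check, which is the kind of analysis a conversion effort would need. A sketch, assuming each data set has already been described by a profile; the field names, organization codes and usage tags are hypothetical, and the VSAM restrictions are left out.

```python
from dataclasses import dataclass, field
from typing import Set

# Usage tags for the "should not" list; the tag names are hypothetical
DISCOURAGED = {"SYSTEM", "GTF", "SORTWORK", "HIPERBATCH", "EXCP", "CHECKPOINT"}


@dataclass
class DataSetProfile:
    dsorg: str            # hypothetical codes: "PS", "PO" (PDS), "PDSE", "DA" (direct)
    sms_managed: bool
    vio: bool = False
    usages: Set[str] = field(default_factory=set)


def ef_sequential_verdict(p: DataSetProfile) -> str:
    """Classify a data set against the extended-format sequential restrictions."""
    if p.dsorg in {"PO", "PDSE", "DA"} or not p.sms_managed or p.vio:
        return "cannot"
    if p.usages & DISCOURAGED:
        return "should not"
    return "candidate"
```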
Hiperbatch is an older technology we rarely see being used now
• MVS function to retain data in storage
  – Originally intended to exploit expanded storage, but now backed by central storage
• Addresses two common characteristics of batch
  – Multiple jobs requesting data from the same data set simultaneously
  – Jobs passing temporary or short-lived data sets to subsequent jobs
• Uses the Data Lookaside Facility (DLF) to store data in a DLF object
• Supports QSAM and VSAM
  – For VSAM KSDS, data component only
  – Note that BSAM and EXCP are not supported
  – Does not require application or JCL changes
Hiperbatch Retain and Non-Retain
• Retain
  – Intended for one writer followed by one or more readers
  – Writer creates the data set on DASD while a copy is placed in expanded storage (estor) concurrently
  – Entire file placed in DLF object
  – Later readers retrieve from DLF object
  – Later writers update DLF object and DASD concurrently
  – DLF object must be explicitly deleted
• Non-Retain
  – Intended for concurrent readers
  – First reader retrieves from DASD
    • Copy placed in DLF object concurrently
  – Following readers retrieve from copy in storage
  – Entire file need not fit in memory
    • Storage stolen from just behind last reader
  – DLF object deleted when open count reaches 0
• Retain requires enough storage to load the entire file
• Non-Retain has a much smaller memory requirement
  – Beneficial for files accessed concurrently by numerous jobs
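Non-Retain behavior can be sketched as a cache that the first reader fills from "DASD" and that releases blocks once every open reader has passed them. This is a toy model of the idea, not how DLF is implemented; the class, method and job names are hypothetical.

```python
from typing import Dict, List, Optional


class NonRetainCache:
    """Toy model of Hiperbatch non-retain: cache blocks, drop those behind the slowest reader."""

    def __init__(self, dasd_blocks: List[bytes]):
        self._dasd = dasd_blocks             # stand-in for the data set on DASD
        self._cache: Dict[int, bytes] = {}   # stand-in for the DLF object
        self._positions: Dict[str, int] = {}
        self.dasd_reads = 0

    def open(self, reader: str) -> None:
        self._positions[reader] = 0

    def read_next(self, reader: str) -> Optional[bytes]:
        blk = self._positions[reader]
        if blk >= len(self._dasd):
            return None
        if blk not in self._cache:           # first reader to get here goes to DASD
            self._cache[blk] = self._dasd[blk]
            self.dasd_reads += 1
        data = self._cache[blk]
        self._positions[reader] = blk + 1
        self._trim()
        return data

    def close(self, reader: str) -> None:
        del self._positions[reader]
        if self._positions:
            self._trim()
        else:
            self._cache.clear()              # open count reached 0: object deleted

    @property
    def cached_blocks(self) -> int:
        return len(self._cache)

    def _trim(self) -> None:
        # Steal storage from just behind the slowest reader still open
        slowest = min(self._positions.values())
        for b in [b for b in self._cache if b < slowest]:
            del self._cache[b]
```

Two jobs reading the same three blocks cause only three DASD reads, and the cache empties as soon as the slower job has passed each block. A Retain model would instead keep every block until the object is explicitly deleted.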
VIO
• Allows temporary data sets to be buffered in storage
• Eliminates all I/O to the data set
  – No hardening of data on DASD
  – Track window in address space
• Can be implemented via DFSMS
  – Define VIO storage group(s)
    • Access controlled by the VIOMAXSIZE parameter
    • If primary + all secondaries > VIOMAXSIZE, then no VIO
  – Code ACS routines to direct data sets
    • VIO storage group in list of candidate storage groups
• Transparent to users
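The VIOMAXSIZE rule above is a single comparison. In this sketch the sizes are expressed in kilobytes and the number of secondary extents counted is passed in explicitly; both are assumptions about how the limit is applied, and vio_eligible is a hypothetical name.

```python
def vio_eligible(primary_kb: int, secondary_kb: int, secondary_extents: int,
                 viomaxsize_kb: int) -> bool:
    """VIO is used only if primary plus all possible secondaries fit within VIOMAXSIZE."""
    largest_possible_kb = primary_kb + secondary_kb * secondary_extents
    return largest_possible_kb <= viomaxsize_kb
```

For example, a 1,000 KB primary with fifteen 100 KB secondaries just fits a 2,500 KB limit, while fifteen 200 KB secondaries would send the data set to DASD instead.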
DFSORT builds its own channel programs to access sort work data sets
The complex sort algorithms require DFSORT to directly access blocks of sort work data rather than write and read them sequentially. A typical sort flow looks something like this:
1. Read from SORTIN as much as we can fit in a Record Storage Area (RSA) allocated in the program's virtual storage
2. Sort the data in the RSA and write the sorted string to SORT WORK
3. Read another bunch of records from SORTIN into the RSA
4. Sort the current bunch of records and write the sorted string to SORT WORK
5. Repeat steps 3 and 4 until the end of SORTIN
6. Merge the sorted strings together and write the sorted file to SORTOUT
[Figure: DFSORT job reading SORTIN, using sort work data sets SORTWK01 through SORTWK04, and writing SORTOUT]
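The six-step flow is a classic external merge sort. The sketch below follows those steps with in-memory lists standing in for the RSA and the sort work data sets; it is a generic illustration, not DFSORT's actual (and far more sophisticated) algorithm, and toy_sort is a hypothetical name.

```python
import heapq
from typing import Iterable, List


def toy_sort(sortin: Iterable[int], rsa_capacity: int) -> List[int]:
    """External merge sort following the six steps (a generic sketch, not DFSORT's algorithm)."""
    sort_work: List[List[int]] = []  # each inner list stands in for a string on SORTWKnn
    rsa: List[int] = []              # stand-in for the Record Storage Area
    for record in sortin:            # steps 1 and 3: fill the RSA from SORTIN
        rsa.append(record)
        if len(rsa) == rsa_capacity:
            sort_work.append(sorted(rsa))  # steps 2 and 4: sort the RSA, write a string
            rsa = []
    if rsa:                          # step 5: last partial RSA at end of SORTIN
        sort_work.append(sorted(rsa))
    return list(heapq.merge(*sort_work))  # step 6: merge the strings to SORTOUT
```

The merge in step 6 has to jump between strings on different work data sets, which is why DFSORT wants direct block access rather than plain sequential reads.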
So we have to think about why more customers are not exploiting extended format
• Yes, there are restrictions like the ones we’ve just reviewed
  – But there are still many data sets that are eligible
• My experience is that many customers just don’t have the time to convert
  – Requires analysis to make sure all access meets requirements
  – Testing to verify nothing breaks
  – Changes to JCL or DFSMS settings to do the conversion
• Compression was usually the main motivator for clients with extremely large files
  – Even then, they implemented it for a limited subset of the eligible data sets
  – I haven’t seen striping used in a large number of environments
    • That may be a result of faster channels and disk subsystems reducing the need
• What we need is a way to help with that analysis but also convince them it’s worth the time
  – Compression and striping might provide savings to generate interest
  – Pervasive encryption is certainly a game changer in motivating customers
So where do we go from here?
• We can schedule some time to look at the WSC test system together
  – Look at the current DFSMS settings
  – Understand how to allocate extended format data sets
    • As well as compression and striping
• Look at how we might provide SMF analysis to help customers identify eligible data sets
  – Should probably review zBNA’s current capabilities
  – I have another presentation that gives an overview of what SMF is and how I use it
• I’m happy to help as you come up with your own ideas