Improving Batch Application Service Through Tuning and Parallelism


Dan Squillace
Mainframe Support Manager
SAS Institute, Cary, NC, USA
[email protected]
Copyright © 2005, SAS Institute Inc. All rights reserved.

Some Business Drivers for Performance Improvement Activities
• Increasing data volumes
    • More customers
    • More data about each customer, needed for increasingly sophisticated analytics that support better and more timely decision-making
• Decreasing processing window
    • Improve BI application availability by shortening ETL elapsed time
• Increasing pressure to reduce costs
    • Lower resource requirements
    • Improve competitive position

Session Overview
• This session focuses on processing improvements beneficial to handling large data volumes.
• Performance improvement areas
    • CPU optimization
    • Reducing I/O
    • Improved overlap and parallelism
    • Elapsed time optimization (not the same ...)
• Focus areas
    • DATA step tuning
    • New SAS 9 features

Session Outline
• Don't forget the basics! A short tuning case study
• DATA step views
• PROC SUMMARY with a DATA step view
• DATA step hash table functions
• SAS Scalable Performance Data Engine (SPDE)
• SAS/CONNECT pipes
• Wrap-up

Back to Basics: High-Volume DATA Step Optimization
• Before implementing parallel operations, make sure the basic processing flow is efficient.
• When processing high volumes of data, even apparently small changes can have a large effect.
• The following customer case study illustrates several points.

Program processes 36 million MXG TYPE 74 records (436 CPU seconds, 9672 G6)

    DATA FILE.A;
      SET INFILE1.TYPE74;
      KOUNT = 1;
      IF VOLSER = '.' OR VOLSER = ' ' THEN DELETE;
      IF SYSTEM = '888K' OR SYSTEM = '888Z' OR SYSTEM = '888Q'
         OR SYSTEM = '888V' OR SYSTEM = '888P' THEN DO;
        IF DATEPART(SYNCTIME) < '03APR04'D THEN SYNCTIME = SYNCTIME - '06:00.00'T;
        IF DATEPART(SYNCTIME) > '02APR04'D THEN SYNCTIME = SYNCTIME - '05:00.00'T;
      END;
      SYMNUM = 0;
      IF DATEPART(SYNCTIME) < '17MAY04'D THEN DO;
        IF DEVNR > 58FFX AND DEVNR < 5FFFX THEN SYMNUM = 111;
        IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM = 456;
        IF DEVNR > 7FFFX THEN SYMNUM = 234;
        IF DEVNR => 5000X AND DEVNR < 5200X THEN SYMNUM = 234;
        IF DEVNR => 5FFFX AND DEVNR < 7000X THEN SYMNUM = 876;
      END;
      IF DATEPART(SYNCTIME) > '17MAY04'D THEN DO;
        IF DEVNR > 4FFFX AND DEVNR < 7000X THEN SYMNUM = 223;
        IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM = 456;
        IF DEVNR > 7FFFX THEN SYMNUM = 234;
      END;
      TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
      FORMAT TIPPCT 5.2;
      IF SYMNUM = 0 THEN DELETE;
      IO_1111 = 0; IO_4563 = 0; IO_234 = 0; IO_8765 = 0; IO_22355 = 0;
      IF SYMNUM = 1111 THEN IO_1111 = IORATE;
      IF SYMNUM = 4563 THEN IO_4563 = IORATE;
      IF SYMNUM = 234 THEN IO_234 = IORATE;
      IF SYMNUM = 8765 THEN IO_8765 = IORATE;
      IF SYMNUM = 22355 THEN IO_22355 = IORATE;
      DATE = DATEPART(SYNCTIME);
      FORMAT DATE 7.;
      INTE = TIMEPART(SYNCTIME);
      FORMAT INTE TIME19.2;
      EMCTYPE = 'ESCON';
      IF SYMNUM = 22355 THEN EMCTYPE = 'FICON';
      IF IORATE < 10 THEN DELETE;
      KEEP VOLSER DEVNR TIPPCT DATE INTE SYMNUM IO_1111 IO_4563 IO_234 IO_8765
           SYNCTIME IO_22355 EMCTYPE IORATE AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS
           AVGDISMS AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR PCTDVCON PCTDVUSE KOUNT;

Do filtering as early as possible

    TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
    FORMAT TIPPCT 5.2;
    IF SYMNUM = 0 THEN DELETE;
    IO_1111 = 0; IO_4563 = 0; IO_234 = 0; IO_8765 = 0; IO_22355 = 0;
    IF SYMNUM = 1111 THEN IO_1111 = IORATE;
    IF SYMNUM = 4563 THEN IO_4563 = IORATE;
    IF SYMNUM = 234 THEN IO_234 = IORATE;
    IF SYMNUM = 8765 THEN IO_8765 = IORATE;
    IF SYMNUM = 22355 THEN IO_22355 = IORATE;
    DATE = DATEPART(SYNCTIME);
    FORMAT DATE 7.;
    INTE = TIMEPART(SYNCTIME);
    FORMAT INTE TIME19.2;
    EMCTYPE = 'ESCON';
    IF SYMNUM = 22355 THEN EMCTYPE = 'FICON';
    IF IORATE < 10 THEN DELETE;
    KEEP VOLSER DEVNR TIPPCT DATE INTE SYMNUM IO_1111 IO_4563 IO_234 IO_8765
         SYNCTIME IO_22355 EMCTYPE IORATE AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS
         AVGDISMS AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR PCTDVCON PCTDVUSE KOUNT;

• Move the subsetting (DELETE) statements toward the top of the DATA step, so that rejected records skip the remaining work
• CPU time reduction: 67%

Additional Steps
• Specify KEEP= as a data set option on the input data set to bring fewer variables into the DATA step. Note: this decreases CPU time, but not I/O time.
• Use IF-THEN-ELSE or SELECT instead of a series of independent IF-THEN statements.
• Eliminate redundant DATEPART function calls.
• Cumulative CPU time reduction: 80%

Final Step
• Move the filtering of blank VOLSER and IORATE < 10 to a WHERE= data set option.
• Total cumulative CPU time reduction: 86%
• Net savings of 368 CPU seconds
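
The tuning steps above can be sketched roughly as follows. This is a minimal illustration, not the customer's tuned program: the five-record WORK.TYPE74 data set, the output name WORK.TUNED, and the reduced variable list are assumptions made so the example runs on its own, and only the post-17MAY04 device ranges from the case study are reproduced.

```sas
/* Hypothetical miniature of the input: five records, four variables. */
data work.type74;
  input volser $ devnr :hex4. iorate synctime :datetime18.;
  format devnr hex4. synctime datetime18.;
  datalines;
VOL001 5100 25 18MAY04:10:00:00
VOL002 7100 40 18MAY04:11:30:00
VOL003 8100 5 18MAY04:12:00:00
VOL004 4000 50 18MAY04:13:00:00
VOL005 8100 60 16MAY04:09:00:00
;
run;

data work.tuned (drop = date);
  /* KEEP= limits the variables read; WHERE= rejects records */
  /* before they ever reach the DATA step logic.             */
  set work.type74
      (keep  = volser devnr iorate synctime
       where = (volser not in (' ', '.') and iorate >= 10));

  date = datepart(synctime);          /* one DATEPART call, reused below */

  if date > '17MAY04'd then do;
    /* IF-THEN-ELSE: testing stops at the first matching range */
    if      devnr > 7FFFx                   then symnum = 234;
    else if devnr > 6FFFx and devnr < 7FFFx then symnum = 456;
    else if devnr > 4FFFx and devnr < 7000x then symnum = 223;
    else delete;                      /* no range matched */
  end;
  else delete;    /* earlier date ranges are omitted from this sketch */
run;
```

Note that every variable referenced in WHERE= also appears in the KEEP= list, which keeps the two data set options consistent with each other.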

The Value of CPU Time Reduction
• Always important on the mainframe because it is inherently a multi-workload beast.
• Often considered unimportant (or at least less important) on Windows and UNIX platforms because those systems tend to be dedicated. Elapsed time is often more important there.
• This is changing with the increasing use of server virtualization, where CPU consumption affects how many virtual servers can run on a physical platform.
    • Logical partitions or domains on UNIX systems
    • Virtual machines on Windows and Linux systems

Some General Strategies for Improving Processing of Large Data Volumes
• Reduce the volume of data passed (e.g., keep only required variables in intermediate files)
• Reduce the number of data passes
• Eliminate or reduce the use of techniques that do not scale linearly, such as sorting
• Exploit memory
• Exploit processing overlap and parallelism

Exploiting New SAS Features
• We'll use two scenarios drawn from common challenges encountered when processing transaction data for performance and service level reporting.
• The improvements made to the processing strategy for these scenarios:
    • Reduce the number of data passes
    • Eliminate or reduce the use of techniques that do not scale linearly, such as sorting
    • Exploit memory
    • Exploit processing overlap and parallelism

General Scenario Characteristics
• Very high data volumes (millions of records, tens or hundreds of gigabytes)
• Multiple summarizations desired
• Detail records retained only for exceptional cases

Scenario One
• High-volume transaction data, say from a web log, CICS, DB2, or SAP
• Desired output: a summarized file for service level management, accounting, and performance and capacity management
• Not interested in keeping every detail transaction record

DATA Step Views
• Can be used to eliminate data passes
• Run two tasks in parallel, but do not multi-process
• In this case, eliminate one pass of the data

    data lib.a / view=lib.a;
      infile ……;
      input x ……;
    run;

    /* OUT= added: a view cannot be sorted in place; the output name is illustrative */
    proc sort data=lib.a out=lib.sorted;
      by x;
    run;

SAS DATA Step View Caveats
• Can inhibit use of indexed I/O: a WHERE= data set option cannot use an index with a DATA step view.
• DATA step views are sensitive not only to SAS release and version levels, but sometimes to maintenance levels.

DATA Step Views with PROC SUMMARY
• Eliminates data passes and saves disk space
• Eliminates the sort
• Can produce multiple summarization data sets in one pass
• Benefits from a large region size (enough to hold the crossings)
• SUMMARY in SAS 9.1
    • Multithreaded
    • Does not keep the n-way crossing in memory unless needed

    data lib.a / view=lib.a;
      infile ……;
      input a b x y ……;
    run;

    proc summary data=lib.a;
      CLASS statement;
      TYPES statement;
      OUTPUT statement(s);
    run;
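
To make the CLASS, TYPES, and OUTPUT placeholders concrete, here is a small sketch under stated assumptions: the scratch fileref RAW, the four sample records, and the variable names APPL, TRAN, RESP, and CPU are all hypothetical. It shows one pass through a DATA step view producing two summary data sets without a sort.

```sas
filename raw temp;   /* hypothetical scratch file standing in for the transaction log */

data _null_;
  file raw;
  put 'WEB  LOGIN 0.20 0.01';
  put 'WEB  QUERY 0.80 0.05';
  put 'WEB  QUERY 1.20 0.07';
  put 'CICS QUERY 0.40 0.02';
run;

/* The view stores the program, not the data: the raw file is read */
/* only when PROC SUMMARY asks for observations.                   */
data work.trans / view=work.trans;
  infile raw;
  input appl $ tran $ resp cpu;
run;

proc summary data=work.trans;
  class appl tran;
  types appl appl*tran;          /* request only these two crossings */
  var resp cpu;
  /* With CLASS APPL TRAN, _TYPE_=2 is the APPL-only crossing and    */
  /* _TYPE_=3 is APPL*TRAN; each OUTPUT statement keeps one of them. */
  output out=work.by_appl (where=(_type_ = 2))
         n(resp)=n_trans mean(resp)=avg_resp sum(cpu)=tot_cpu;
  output out=work.by_tran (where=(_type_ = 3))
         n(resp)=n_trans mean(resp)=avg_resp sum(cpu)=tot_cpu;
run;
```

Because the TYPES statement limits the crossings that are computed, memory is spent only on the summaries that are actually wanted.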

SAS 9 Threaded Procedures
• SORT
• SUMMARY/MEANS
• TABULATE
• REPORT
• SQL
• REG, GLM, LOESS, DMREG, DMINE

Scenario Two
• High-volume, time-oriented event data (e.g., an ARM log)
• Transactions must be constructed from multiple event records
    • Type S: transaction start (ID, start time, code)
    • Type E: transaction end (ID, end time, CPU time)

Data arrival pattern
    Start 1
    Start 2
    End 1 (write out 1)
    Start 3
    End 2 (write out 2)
    Start 4
    Start 5
    End 4 (write out 4)
    End 5 (write out 5)
    End 3 (write out 3)

DATA Step Hash Table Support (New in SAS 9)
• Can replace lookup formats
• Can have entries dynamically added, modified, and removed
• For this scenario, use a hash table to accumulate transaction records from start and end events.
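
As an illustration of the first point, a hash object can stand in for a lookup format. The sketch below is an assumption-laden example rather than part of the original session: the data sets WORK.CODES and WORK.DETAIL, the variable DESCR, and the sample values are all hypothetical.

```sas
/* Hypothetical lookup table: transaction code -> description. */
data work.codes;
  input tr_code $ descr :$20.;
  datalines;
A10 Login
B20 Query
;
run;

/* Hypothetical detail records to be decoded. */
data work.detail;
  input tr_id tr_code $;
  datalines;
1 A10
2 B20
3 Z99
;
run;

data work.decoded;
  if _n_ = 1 then do;
    if 0 then set work.codes;      /* defines TR_CODE and DESCR in the PDV */
    declare hash codes(dataset: 'work.codes');   /* load the table once    */
    codes.defineKey('tr_code');
    codes.defineData('descr');
    codes.defineDone();
  end;
  set work.detail;
  /* FIND() returns 0 on a match and copies DESCR from the hash table. */
  /* On a miss, DESCR would keep its previous value, so reset it.      */
  if codes.find() ne 0 then descr = 'Unknown';
run;
```

The lookup table is loaded into memory once, on the first iteration, and each detail record is then resolved without a sort or merge.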

    data transactions / view=transactions;
      /* the INFILE statement for the event log goes here (not shown on the slide) */
      if _n_ = 1 then do;
        /* create the hash table once, not on every iteration */
        declare hash transactions();
        transactions.defineKey("tr_id");
        transactions.defineData("tr_start", "tr_code");
        transactions.defineDone();
      end;
      input type $ @;                 /* read the record type and hold the line */
      if type = 'S' then do;          /* start event: remember it */
        input tr_id tr_code tr_start;
        rc = transactions.add();
      end;
      else if type = 'E' then do;     /* end event: match, write out, forget */
        input tr_id tr_end tr_cpu;
        rc = transactions.find();
        response = tr_end - tr_start;
        output;
        rc = transactions.remove();
      end;
    run;

The Scalable Performance Data Engine (SPDE)
• New in SAS 9.1
• Included with Base SAS
• Available on all 9.1 platforms
• Advantages
    • Parallel data loading and index creation
    • Parallel reads and searches
    • Uses multiple indexes to resolve a search

SPDE – Scalable Performance Data Engine
[Diagram: the SAS System accesses data through the Scalable Performance Data Engine, which stores a data set as separate components: metadata, several data partitions, and an index (hybrid bitmap/B-tree).]

SAS SPDE Implementation on z/OS
• USS thread services
• USS directory-based file systems
    • zFS
    • HFS
    • NFS file systems
• Exploitation
    • Define the file system
    • Change the LIBNAME engine specification

SPDE Data Set Allocation on z/OS
• NFS: follow the same guidelines as for open systems.
• HFS: use separate HFS file systems for the DATA and INDEX components, perhaps several for DATA. Spread the HFS file systems across Shark (ESS 2105) loops.
• zFS: no special considerations. Use multivolume zFS, particularly if
    • the storage system has Parallel Access Volumes (PAV)
    • the ESS 2105-800 has the Arrays Across Loops feature
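
Changing the LIBNAME engine specification might look like the following. This is a sketch only: the libref PERF and every directory name are hypothetical, and the split into one metadata directory, two data directories, and one index directory simply mirrors the advice above to separate the DATA and INDEX components.

```sas
/* Hypothetical directory names; each must already exist in the USS file system. */
libname perf spde '/u/sasdata/spde/meta'
        datapath  = ('/u/sasdata/spde/data1' '/u/sasdata/spde/data2')
        indexpath = ('/u/sasdata/spde/index');
```

Existing programs can then refer to PERF like any other library.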

Scalability – SAS 9.1
[Diagram: SAS Scalable Architecture in SAS Foundation. Panels: scalable performance data access (SAS, Teradata, Sybase, DB2, Oracle through scalable SAS/ACCESS); piping between SAS sessions with SAS/CONNECT across CPU 1, CPU 2, and a remote host; threaded procedures (thread 1, thread 2, ... thread N).]

MP Connect Pipes
• New in SAS 9
• Uses the TCP/IP socket engine
• Superior to the DATA step view approach
• Provides true multi-processing

    /* ----- DATA STEP - PROCESS P1 ----- */
    SIGNON P1 SASCMD='!SASCMD';
    RSUBMIT P1 WAIT=NO;
      LIBNAME OUTLIB SASESOCK ":PIPE1";      /* writing end of the pipe */
      data outlib.transactions;
        /* the INFILE statement for the event log goes here (not shown on the slide) */
        if _n_ = 1 then do;
          declare hash transactions();
          transactions.defineKey("tr_id");
          transactions.defineData("tr_start", "tr_code");
          transactions.defineDone();
        end;
        input type $ @;
        if type = 'S' then do;
          input tr_id tr_code tr_start;
          rc = transactions.add();
        end;
        else if type = 'E' then do;
          input tr_id tr_end tr_cpu;
          rc = transactions.find();
          response = tr_end - tr_start;
          output;
          rc = transactions.remove();
        end;
      run;
    ENDRSUBMIT;

    /* ----- SUMMARY - PROCESS P2 ----- */
    SIGNON P2 SASCMD='!SASCMD';
    RSUBMIT P2 WAIT=NO;
      LIBNAME INLIB SASESOCK ":PIPE1";       /* reading end of the same pipe */
      proc summary data=inlib.transactions;
        CLASS statement;
        TYPES statement;
        OUTPUT statement(s);
      run;
      PROC PRINT; RUN;
    ENDRSUBMIT;

    /* wait for both asynchronous sessions to finish */
    WAITFOR _ALL_ P1 P2;

In Summary
• Remember the importance of basic SAS program tuning skills, which have been well known for years.
• Take advantage of the significant SAS 9 features that can help you
    • Improve response and turnaround times
    • Improve availability times for BI applications by shortening the batch window
    • Reduce costs by cutting resource consumption and using the most effective combination of CPU, memory, and I/O resources
