SAS Macros are the Cure for Quality Control

  • Slides: 31
Download presentation
SAS Macros are the Cure for Quality Control Pains Gary Mc. Quown Data and

SAS Macros are the Cure for Quality Control Pains Gary Mc. Quown Data and Analytic Solutions

Rants and Raves of a SAS Programmer

Rants and Raves of a SAS Programmer

Purpose I. Quality Control II. SAS Macros for Quality Control III. Sources of SAS

Purpose I. Quality Control II. SAS Macros for Quality Control III. Sources of SAS Macros and QC Code

I. Quality Control An ongoing effort for validation, improvement and facilitation of the data

I. Quality Control An ongoing effort for validation, improvement and facilitation of the data related process to insure that data meets the business needs.

Quality Control “Quality control means you can have what you need, how you need

Quality Control “Quality control means you can have what you need, how you need it, when you need it. ” E. Demming

Why Practice QC? l It Saves Time l It Saves Money l It Makes

Why Practice QC? l It Saves Time l It Saves Money l It Makes Money l Ignorance is not Bliss

How Data Goes Bad l “Bad Genes”. . Poor design and collection l “Adoption”

How Data Goes Bad l “Bad Genes”. . Poor design and collection l “Adoption” l “Child … Someone Else’s Design Abuse”. . . Poorly Nurtured l “Terrible Teens”. . . Growing Pains

The QC Process 1. Define Requirements 2. Identify Data Issues 3. Analyze Options 4.

The QC Process 1. Define Requirements 2. Identify Data Issues 3. Analyze Options 4. Improve Data Quality • Document every step and repeat

Define Requirements l What do you need? l Requires an understanding of the business

Define Requirements l What do you need? l Requires an understanding of the business process, the data, the operating system and the users. l Documentation, business specs and “experts”.

Devil’s Advocate l What is correct for one task / group may be incorrect

Devil’s Advocate l What is correct for one task / group may be incorrect for another. l What is correct now may be incorrect later. is correct now. . . may not be able to be repeated.

Identify Data Issues l Accuracy l Completeness l Consistency l Timeliness l Uniqueness l

Identify Data Issues l Accuracy l Completeness l Consistency l Timeliness l Uniqueness l Validity

G = Good F = Fair B = Bad

G = Good F = Fair B = Bad

Analyze Options l What do you need? l What do you have? l. What

Analyze Options l What do you need? l What do you have? l. What changes need to be made? l. Will you break anything along the way?

Improve Data Quality l Selective Processing l Clean Existing Values l Correcting Existing Values

Improve Data Quality l Selective Processing l Clean Existing Values l Correcting Existing Values l Delete “bad” data l Add additional data • Document original and new values.

Documentation l Design Process. . . business specs l “As You Go”. . .

Documentation l Design Process. . . business specs l “As You Go”. . . in the code, log, email l Input and Output files (Freqs & Means) l Modifications. . “as per xxx “, email l Exceptions (Errors and Issues) l User’s Manual l Elizabeth Axelrod. . . Big ‘D’ “Just Shoot Them”

General Suggestions l “Drive l l l Out Fear” Early Intervention Obtain “Buy In”

General Suggestions l “Drive l l l Out Fear” Early Intervention Obtain “Buy In” from all parties Keep it “Simple”. . . use macros Be consistent … use macros Monitor results Document everything, every time

II. SAS Macros allow you to use, re-use and share “object-oriented” code. QC is

II. SAS Macros allow you to use, re-use and share “object-oriented” code. QC is very redundant. . the same or similar process performed on each data set, each variable and each process.

Reality People are: l l l Ignorant Forgetful Busy Lazy Don’t Care

Reality People are: l l l Ignorant Forgetful Busy Lazy Don’t Care

Why Macros l Minimal Effort l Parameters l Available (FREE)

Why Macros l Minimal Effort l Parameters l Available (FREE)

FREQOUT Produces Frequencies for multiple variables % FREQOUT (data= /* input dataset name */,

FREQOUT Produces Frequencies for multiple variables % FREQOUT (data= /* input dataset name */, out= freqout /* output data set name , vars= /* list of variables */, by = /* list of by variables */, fmtassign = /* var fmt */, debugging = NO /* YES or NO */ Author: Ian Whitlock Location: www. lexjansen. com and sconsig. com

EAP_RPT %EAP_RPT (DSN=, LIBIN= , LIBOUT=, _VARS= , _FMTS=); DSN = Name of input

EAP_RPT %EAP_RPT (DSN=, LIBIN= , LIBOUT=, _VARS= , _FMTS=); DSN = Name of input SAS data set LIBIN= SAS library of input data set LIBOUT= SAS library of output data set _VARS= list of character variables to review. . paired with _FMTS= list of formats to apply. . . paired with _VARS Example: %EAP_RPT(_VARS = AGE INCOME EDUCATION , _FMTS = AGE INC EDU , LIBIN = PROJ_IN , LIBOUT = PROJ_OUT , DSN = STUDY_1);

DATA CLEANING TIP 00128 a - Cleansing Macro, Data Scrubbing routine (see tip 00128

DATA CLEANING TIP 00128 a - Cleansing Macro, Data Scrubbing routine (see tip 00128 for more) %cleanse(schlib=work, schema=, strlen=50, var=, target=target, replace=replace, case=nocase); Author: Charles Patridge Version: 2. 1 (sug. by Ian Whitlock) Location: www. sconsig. com

REMOVE OUTLIERS %outlier ( data = _SAS_dataset_name_, out = _SAS_output_dataset_name var = _variable_to_screen pass

REMOVE OUTLIERS %outlier ( data = _SAS_dataset_name_, out = _SAS_output_dataset_name var = _variable_to_screen pass = _number_of_passes except = _exception_report_data_set_, mult = _multiplier_of_standard_deviations_) The %OUTLIER macro completes outlier screens based on statistical values of a numeric variable in a SAS data set. It is set up to remove any outlier records that are within a given number of Standard Deviations from the mean, and will run that screen a given number of times. For example, a "3 -Pass-2" outlier screen will remove any values outside 3 standard deviations from the mean, and will run that outlier screen twice. The given numbers can be any integer. Author: Unknown Location: www. spikeware. com

CONT_COMPARE Compares two data sets, list all variables and reports potential issues: 1) Fields

CONT_COMPARE Compares two data sets, list all variables and reports potential issues: 1) Fields in Both 2) Type 3) Length %cont_compare (dsn 1, dsn 2)

KEEPDBLS: Documents Duplicates TIP 000367 - Keep. Dbls %MACRO Keep. Dbls (Source. Ds =_LAST_,

KEEPDBLS: Documents Duplicates TIP 000367 - Keep. Dbls %MACRO Keep. Dbls (Source. Ds =_LAST_, Target. Ds =, Overwrit =N, Id. List =, Where =); Moves duplicate observations to another file. Author: Jim Groeneveld Location: www. sconsig. com

CK_MISSING Evaluates variables in regards to missing and non missing status. Default= _numeric_ missing.

CK_MISSING Evaluates variables in regards to missing and non missing status. Default= _numeric_ missing. _character_ $missing. Parms: DSN = libname and name of data set. Default is the last read/created. PATH= path to directory where QC info is stored. VAR = list of variables to b evaluated. FMT = format statment. %ck_missing( dsn=mylib. recentfile, var=UPB FICO 1 FICO 2 FICO 3 CHANNEL, fmt=UPB upb. FICO 1 FICO 2 FICO 3 fico. CHANNEL $chnl. );

LOG FILTER: Examines and Reports on SAS Log Filter checks your log for errors,

LOG FILTER: Examines and Reports on SAS Log Filter checks your log for errors, warnings, and other "interesting" messages. It then displays what it finds in its summary window. Double-click on a row and it'll reposition the log window to display the message in context (if it's an external log file, it'll open it in a viewer window and position it for you). Author: Ratcliffe Location: http: //ratcliffe. co. uk/rest_logfilt. htm

MK_FORMATS Create a format from a SAS data set. Parms: DSN = SAS data

MK_FORMATS Create a format from a SAS data set. Parms: DSN = SAS data set START =Unique key value ie. SSN LABEL =Value to be associated with start ie. Full Name with SSN FMTNAME =Name of Format (sans ". ") TYPE = C or N for Character or Numeric LIBRARY = Libname of Format Library (default =work) OTHER = Value to supply for missing (default =OTHER)

III. Sources of SAS Macros and QC Code www. sas. com (examples) www. lexjansen.

III. Sources of SAS Macros and QC Code www. sas. com (examples) www. lexjansen. com (proceeding) www. sconsig. com www. ratcliffe. co. uk www. statetechservices. com www. spikeware. com

More Sources www. mcw. edu/pcor/rsparapa/sasmacro. html www. math. yorku. ca/scs/friendly. html www. stat. ncsu.

More Sources www. mcw. edu/pcor/rsparapa/sasmacro. html www. math. yorku. ca/scs/friendly. html www. stat. ncsu. edu/sas/samples/index. html www. dasconsultants. com SAS-L Books By Users: Ron Cody’s Data Cleaning Numerous books on Macros. . “By Example”

Questions ? Gary Mc. Quown mcquown@DASconsultants. com www. DASconsultants. com

Questions ? Gary Mc. Quown mcquown@DASconsultants. com www. DASconsultants. com