When PCCF isn’t enough: PCCF+ (hands-on session)
Russell Wilkins
Health Analysis Division, Statistics Canada, Ottawa
ACCOLEDS 2011, Kwantlen Polytechnic University, Richmond (BC) Campus, Wed 30 November 2011

Note
• PCCF+ programs and reference files (except for the User Guide) are all plain text files, which can be browsed and edited with any text editor such as Notepad, etc.
• The outputs of PCCF+ programs are also plain text files which can be browsed and edited with any text editor such as Notepad, etc.
• You don’t have to know SAS to use PCCF+: it’s just a “black box” which you use.

Running PCCF+ as a “black box”
• Follow the “getting started” instructions in the User Guide, and run SAS yourself.
• Ask a friend at your institution who knows SAS to submit your program for you, and send you the output files.
• Work collaboratively with a colleague at another institution, who will submit your programs for you, and send you the output, as well as comments, etc.

Residential or Institutional?
• GEORES5x.SAS
  – Use to code records where the postal code is for a place of residence
• GEOINS5x.SAS
  – Use to code records where the postal code is for a health care facility, doctor’s office or other institution or business

SAS-specific Housekeeping
• Before running a new SAS program (e.g., GEOINS5J.SAS), you need to clear the program Editor, Log, and Output (list) windows.
• Otherwise, anything new will simply be appended to what was already there, and you’ll have a mess (hopefully not saved)!
• For each window to be cleared:
  – Click anywhere within the window, then select Edit / Clear All.

Getting Started (User Guide pp. 5-6)
• 1. Copy all files to a directory (e.g., c:\PCCF5J)
  – and change the FILENAME path to your directory (c:\PCCF5J)
  – Select and edit the SAS program to be run: GEORES5x.SAS or GEOINS5x.SAS
• 2. Identify your input file:
  – e.g., change SAMPLDAT.CAN to yourdata.txt
  – Show in which column to find ID, FSA, LDU
• 3. Name the two output files produced:
  – e.g., yourdata.GEO, yourdata.PRB
• 4. Submit the program:
  – Make sure your cursor is in the SAS program window
  – Save the program as modified, then click on the “run” icon (a runner)
• 5. Review the results:
  – In SAS, browse the contents of the log and output (.lst) windows,
  – then use Notepad to examine the .GEO file produced.
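Step 2 assumes the input is a plain text file with ID, FSA and LDU in known columns. If your data start out in some other form, a small script outside SAS can write such a file. The following is only a sketch, not part of PCCF+: the function name `format_record`, the 10-character ID width and the resulting column positions are all assumptions chosen for illustration, and the INPUT statement in the SAS program would have to be edited to match whatever layout you actually use.

```python
def format_record(record_id: str, postal_code: str, id_width: int = 10) -> str:
    """Return one fixed-width line: ID, one space, then FSA and LDU.

    Hypothetical layout (not the PCCF+ sample layout):
      columns 1-10  ID, left-justified
      column  11    blank
      columns 12-14 FSA (first three characters of the postal code)
      columns 15-17 LDU (last three characters of the postal code)
    """
    pcode = postal_code.replace(" ", "").upper()
    fsa, ldu = pcode[:3], pcode[3:6]
    # Pad every field so that the columns line up even when a code is incomplete.
    return f"{record_id:<{id_width}} {fsa:<3}{ldu:<3}"


def write_input_file(path: str, records) -> None:
    """Write (id, postal_code) pairs to a plain text file, one record per line."""
    with open(path, "w", encoding="ascii") as handle:
        for record_id, postal_code in records:
            handle.write(format_record(record_id, postal_code) + "\n")


if __name__ == "__main__":
    sample = [("1001", "K1A 0T6"), ("1002", "v6t1k2"), ("1003", "H3L")]
    for rid, pc in sample:
        print(format_record(rid, pc))
```

Padding every field to a fixed width is what keeps the columns aligned when a postal code is incomplete, as in the third sample record.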

Modify the paths and filenames for your input and output files

Modify the INPUT statement to show SAS where to find your ID, FSA, LDU


Examine the top of the SAS .log file (was your file found and read properly?)

Typical problem getting started: “file not found”


Can you see why it failed?


Examine the SAS Output (.lst) Summary
Most should be LINK (PROB) 5+

Postal codes invalid or missing (LINK=0) – possible causes, etc.
• Postal codes present, but not in columns specified (none will have exact match)
• Some postal codes out of alignment (intermittent failure)
• Easily corrected errors in syntax of postal code (l vs 1, O vs 0, etc.)
• Incomplete or missing postal codes (geographic codes will usually still be imputed)
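The “easily corrected” syntax errors can be illustrated with a few lines of code. This is a minimal sketch of the idea only; it is not the logic of FIXPCBAD.SAS or of any PCCF+ program, and the name `fix_postal_code` is hypothetical. It assumes the usual letter-digit-letter-digit-letter-digit pattern and corrects only the three digit positions, where a letter O, I or l is most likely a mistyped 0 or 1.

```python
import re
from typing import Optional

# Letter-digit-letter digit-letter-digit, after removing spaces and hyphens.
PCODE_PATTERN = re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$")

# Look-alike letters that sometimes get typed where a digit belongs.
DIGIT_LOOKALIKES = {"O": "0", "I": "1", "L": "1"}


def fix_postal_code(raw: str) -> Optional[str]:
    """Return a cleaned 6-character postal code, or None if it cannot be fixed."""
    code = re.sub(r"[\s-]", "", raw).upper()
    if len(code) != 6:
        return None  # incomplete codes are left for imputation, not guessed at
    chars = list(code)
    for pos in (1, 3, 5):  # digit positions only; letter positions are untouched
        chars[pos] = DIGIT_LOOKALIKES.get(chars[pos], chars[pos])
    fixed = "".join(chars)
    return fixed if PCODE_PATTERN.match(fixed) else None


if __name__ == "__main__":
    for raw in ["K1A OT6", "v6t lk2", "123456", "K1A"]:
        print(repr(raw), "->", fix_postal_code(raw))
```

Codes that still fail the pattern after correction come back as `None`, so they can be reviewed by hand rather than silently altered.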

Look at your input data file using Notepad. Columns should line up.

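For a large file, scanning by eye in Notepad can be supplemented by a quick script that lists the records whose postal code is not where you told SAS to look. The sketch below is illustrative only: `misaligned_lines` is a hypothetical helper, and the start column passed to it is whatever you specified in your own INPUT statement (counted from 0 here, as Python does).

```python
import re

# A complete postal code with no embedded space: letter-digit-letter-digit-letter-digit.
PCODE = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def misaligned_lines(lines, start, width=6):
    """Return (line_number, text) for lines with no valid postal code at `start`.

    `start` is the 0-based column where the postal code is expected to begin.
    Incomplete or missing postal codes are reported too, since they also fail
    the pattern; they are not necessarily alignment problems.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        field = line[start:start + width].upper()
        if not PCODE.fullmatch(field):
            problems.append((number, line.rstrip("\n")))
    return problems


if __name__ == "__main__":
    sample = [
        "1001       K1A0T6",
        "1002      V6T1K2",   # postal code shifted one column to the left
        "1003       H3L1B9",
    ]
    for number, text in misaligned_lines(sample, start=11):
        print(f"line {number}: {text!r}")
```

If every line is reported, the column given in `start` is probably wrong; if only some are, those records are out of alignment or have bad postal codes.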

LINK=0: “postal code” never used (imputation still possible if partially correct)

Correcting the input data file to be geocoded: Look at the .csv or .txt file using Notepad or Word rather than Excel

Look at the printout of the HLTHOUT dataset (.GEO file): first 500 records
If PCODE doesn’t look right (e.g., if it begins with a number), the coding will fail.

Another way to scan your results: Open your filename.GEO file
There are no column headings in this file, so see User Guide, Appendix A for the record layout. Or use FMT5x.GEO.DOC to see it with column headers.

Look at the GEOPROB printout
This summarizes the problems, grouped from most to least serious (LINK=0-4). You could also use Notepad to open the .PRB file, which has no column headers or error/warning/note headings.

LINK=0 Error: No match to PCCF (ever)
• Correct if possible. Obvious syntax errors can be corrected automatically. Others will require supplemental data.
• If your file only has the first three characters of the postal code (FSA), or only the first four or five characters, all of your records will have LINK=0. Nevertheless, full census geographic codes will be imputed within the known service area (unless invalid even at 3, 4 or 5 characters).

LINK=1 Error: Linked to PO geography
• We only know where the post office is located, but not the location (or set of possible locations) of the final destination (=> systematic bias towards central areas)
• If possible, use the full street address to find a postal code specific to the address range.
• By default, only the PRCD and CMACA codes are kept.

LINK=2 Warning: Non-residential
• We know the exact location, but this is not a residential address.
• Typically occurs when people want correspondence to go to their business address.
• So we can only assume that the residence is somewhere in the commuting area (CMACA).
• If the full street address is available, recode.

LINK=3 Business building (usually)
• This type of postal code (DMT=E) is usually reserved for businesses. But for reasons unknown to us, Canada Post sometimes assigns it to a residential building, or to a mixed residential/commercial building.
• If the building is only for commercial use, then this is the same serious problem as LINK=2 (non-residential).

Sample printout from the GEOPROB dataset

GEOCODES/PCCF VERSION 4 PARTIAL PRINT OF GEOPROB FILE (ERRORS & WARNINGS, BUT NO NOTES)
ID PCODE PRCDCSD CMA CT DABLK LL HRSUB DPL DIAG BLDG NAME, ADR(CPCOMM: CMA/DPL) : CDNAME CDTYP CSDNAME TY

---------- 0 ERROR: NO MATCH TO PCCF---CHECK PCODE/ADDRESS &OR CODE MANUALLY ----------
1202050810 A1X5J7 1001485 001 301.02 013501 4705 01 000 90 I 31994. St. John's CMA : Avalon Peninsul DIV CONCEPTIT*
1201026310 B2M5B3 1200999 999900 4506 99 902. . 892. : *
1302025710 G0K2K0 2410005 000.00 007009 4806 01 000 90 I 949949 NOT CMACA : Rimouski-Neiget MRC ESPRIT-SM*
1301031010 H9G3X9 2466140 462 521.01 235801 4507 06 000 90 I 31994. Montréal CMA : Montréal CU DOLLARD-V*
1602451310 K7K2T0 3510010 521 008.00 018405 4407 0241 000 90 I 11994. Kingston CMA : Frontenac CTY KINGSTONC*
1604153110 M3Y4A1 3520005 535 999.99 999900 4307 99999 902. . 892. Toronto CMA : Toronto DIV TORONTO C*
1604305110 R3N3L2 4611040 602 008.00 038001 4909 10 000 90 I 11994. Winnipeg CMA : Winnipeg DIV WINNIPEGC*
1802106710 V1S4X1 5933042 925 006.00 004302 5012 14 000 90 I 21994. Kamloops CA 1 : Thompson-Nicola RD KAMLOOPSC*
1802068310 V4T4J5 5935027 915 102.02 015502 4911 13 175 90 I 41994. Kelowna CA 1: Westbank (UNP) : Central Okanaga RD CENTRAL RD
1803049810 V9C5T3 5917044 935 154.02 048004 4812 41 000 90 I 51994. Victoria CMA : Capital RD LANGFORDDM

---------- 1 ERROR: LINKED TO PO GEOG--CODE MANUALLY IF RESID ADD AVAILABLE ----------
1604055531 R4J1A1 4611999 602 999.99 999900 4909 99 000 JZ 1 I 22824. HEADINGLEY: Winnipeg CMA : Winnipeg DIV *
1201059710 A1X4G9 1001999 001 999.99 999900 4705 99 000 K 1 I 318341 BOX 18001: 18060 STN MAIN UPPER GULLIES *

---------- 2 WARNING: NON-RESIDENTIAL PCODE--CHECK PCODE/ADDRESS (LEGIT RES?) ----------
1304154932 H3L1B9 -2400999 462 999.99 999900. . 99 999 E 2 F 119191 CENTRE MEDICAL HENRI-BOURASSA 222 HENRI-BOURA MONT *
1603422510 L4C9S7 -3500999 535 999.99 999900. . 99999 E 2 F 119191 BUSINESS BUILDING 120 NEWKIRK RD RICHMOND HILL *
1602226510 T2S2T6 -4800999 825 999.99 999900. . 99 999 E 2 F 119191 FOODVALE OFFICE COMPLEX 5005 ELBOW DR SW CALGARY *
1601088310 T5N4A3 -4800999 835 999.99 999900. . 99 999 E 2 F 119191 PEOPLES TRUST PLAZA 10216 124 ST NW EDMONTON *
1302161110 H3N2Y1 -2400999 462 999.99 999900. . 99 999 G 2 F 119191 VIDEOTRON LTEE 405 OGILVY AV 200 MONTREAL *
1804030033 V2A5A9 -5900999 913 000.00 999900. . 99 999 G 2 D 119171 CITY OF PENTICTON 171 MAIN ST PENTICTON *

---------- 3 WARNING: BUSINESS BLDG----CHECK PCODE/ADDRESS (LEGITIMATE RES?) ----------
1604118533 L6Y2N4@3521010 535 572.05 020201 4307 0653 000 E 3 F 111191 APARTMENT BLDG 430 MCMURCHY AVE S BRAMPTONC*
1604503732 T5H4B9@4811061 835 046.00 020808 5311 25 000 E 3 F 111191 HYS MEDICAL CENTRE 11010 101 ST NW EDMONTONC*

---------- 4 WARNING: COMMERC/INSTITU--CHECK PCODE/ADDRESS (LEGITIMATE RES?) ----------
1801082533 V5G4J3? 5915025 933 230.01 139201 4912 22 000 BG 4 F 111191 BRITISH COLUMBIA INSTITUTE OF TECHNOLOGY 4200 BURNABY C*
1202190833 A1B1S5@1001519 001 013.00 025301 4705 01 000 G 4 F 111191 ST PATRICKS MERCY HOME 146 ELIZABETH AVE ST. JOHN' ST. JOHNC*
1202154133 A2A2E1@1006017 010 000.00 003010 4805 03 000 G 4 D 112171 CENTRAL NEWFOUNDLAND REGIONAL HEALTH CENTRE 5 GRAND FAT*
1303089633 H2C3H6@2466025 462 277.00 265801 4507 06 000 G 4 F 111191 LES RESIDENCES LAURENDEAU, LEGARE, LOUVAIN 1725 MONTRÉALV*
1603169333 M1H3A1@3520005 535 356.00 361001 4307 0495 N 000 G 4 F 111191 CEDARBROOK LODGE 520 MARKHAM RD SCARBOROUGH TORONTO C*
1602154410 M9W4L3@3520005 535 246.00 184101 4307 0495 A 000 G 4 F 111191 KIPLING ACRES HOME FOR THE AGED 2233 KIPLING ETOBI TORONTO C*
1604515931 N2L3G1@3530016 541 106.01 029605 4308 0765 000 G 4 F 111191 UNIVERSITY OF WATERLOO 200 UNIVERSITY AVE W WATERLOOC*
1604443433 R1N3V4@4609029 607 000.00 001414 H 4909 40 000 G 4 F 112181 LION'S PRAIRIE MANOR 24 9TH ST SE PORTAGE LA PRAIR PORTAGE C*
1603468632 R3N1V9@4611040 602 510.02 036601 4909 10 000 G 4 F 111191 CANADIAN FORCES BASE WINNIPEG, KAPYONG BARRAC WINNIPEGC*
1601086332 R7N1R7@4617050 000.00 001114 5110 60 000 G 4 F 111191 DAUPHIN GENERAL HOSPITAL 625 3RD ST SW DAUPHIN C*
1603548732 S4S3B4@4706027 705 002.02 049002 5010 04 000 G 4 F 111191 EXTENDICARE/PARKSIDE 4540 RAE ST REGINA C*
1602539533 T5K0L4@4811061 835 032.02 015604 H 5311 25 000 G 4 F 111191 GENERAL HOSPITAL 11111 JASPER AVE NW EDMONTONC*
1803100131 V6T1K2@5915020 933 069.00 094705 4912 32 000 G 4 D 111171 WALTER GAGE RESIDENCE ( UBC ) 5959 STUDENT UN VANC GREATER RD

LINK=4 Warning: Commercial/Institutional
• Here we usually know the exact location, but it may or may not be pertinent to your study. Look at the building name shown in the problem file.
• For example, young people typically don’t live in nursing homes (but many old people do); new mothers don’t live at the hospitals where they give birth (but old people may live in a chronic care ward of a hospital).
• Sometimes problems are readily apparent from the building name and address. But what you should do will depend on the aims of your study.

LINK=5 or higher Noted only: Not a problem
• You can safely ignore these notes (5, 6, 7)
• Retired postal codes (5) are often found on administrative files, and are not a problem.
• Multiple possible matches (6, 7) are common (≈ 25%), and are usually coded using population weights (7). Those not coded using population weights (6) are urban postal codes, usually serving a very small area.
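The LINK slides boil down to three levels of severity: errors (0, 1), warnings (2, 3, 4) and notes (5 or higher). A rough sketch of how you might tally them once you have extracted the LINK value of each record is shown below. How to pull LINK out of the .GEO file is deliberately left out, since the record layout is in User Guide Appendix A and is not reproduced here; the function names are hypothetical.

```python
from collections import Counter


def link_severity(link: int) -> str:
    """Map a LINK code to the severity used in the GEOPROB printout headings."""
    if link < 0:
        raise ValueError(f"LINK cannot be negative: {link}")
    if link <= 1:
        return "error"    # 0 = no match to PCCF, 1 = linked to post office geography
    if link <= 4:
        return "warning"  # 2 = non-residential, 3 = business bldg, 4 = commercial/institutional
    return "note"         # 5 or higher: noted only, not a problem


def summarize_links(links) -> Counter:
    """Count records by severity for an iterable of LINK codes."""
    return Counter(link_severity(value) for value in links)


if __name__ == "__main__":
    sample_links = [0, 1, 2, 3, 4, 5, 6, 7, 7]
    summary = summarize_links(sample_links)
    for level in ("error", "warning", "note"):
        print(level, summary[level])
```

In a well-behaved file, most records should fall in the “note” group, in line with the earlier advice that most should be LINK 5+.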

BC Historic Files Only (before Apr 1999)


DMT & SOURCE See PCCF+ User Guide Appendix C for explanation of codes used.


RESFLG: blank = OK; “-” = non-residential; “@” = OK despite DMT

NCDS – fyi only
So 20% of the file contained postal codes serving 2 or more CSDs (only 1 of which could have been coded using the SLI).

CSIZE, QAIPPE, IMMTER (etc.)
Some neighbourhood characteristics of the individuals in your file.

The Campus List file (input data file for GEOINS5J.SAS)

GEOINS5J.SAS – showing setup

IJ5 – checking the SAS log

IJ5 – summary of geocoding results

IJ5 Problem file

IJ5 Output (for Campus.List)

Supplemental programs in PCCF+
• Geocoding old records from BC (where FSAs moved)
  – RJ5x.OLD.SAS – for residential coding
  – IJ5x.OLD.SAS – for institutional coding
• Other programs (see Appendix N)
  – FIXPCBAD.SAS – fixes common errors in PCODE syntax
  – EXPLODE2.SAS – if your data file contains counts for each postal code, rather than separate records for each
  – HOUTDLM.SAS – tab-delimited output for Excel, etc.
  – DIST5x.SAS – distance to closest of many other records
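To make the “counts rather than separate records” case concrete, here is a minimal sketch of the expansion itself. It only illustrates the idea behind a program such as EXPLODE2.SAS and makes no claim about how that program is written; the function name `explode` and the (postal code, count) input shape are assumptions.

```python
def explode(rows):
    """Yield one postal code per individual from (postal_code, count) pairs.

    Rows with a count of zero yield nothing; negative counts are rejected.
    """
    for postal_code, count in rows:
        if count < 0:
            raise ValueError(f"negative count for {postal_code}: {count}")
        for _ in range(count):
            yield postal_code


if __name__ == "__main__":
    counts = [("K1A0T6", 2), ("V6T1K2", 1), ("H3L1B9", 0)]
    # Attach a sequential ID so that each expanded record can be geocoded separately.
    for record_id, pcode in enumerate(explode(counts), start=1):
        print(f"{record_id:<10} {pcode}")
```

After expansion there is one line per individual, which is the kind of file the geocoding programs described earlier expect.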

Russell Wilkins
• Health Analysis Division, Statistics Canada, RHC-24A, 100 Tunney’s Pasture Driveway, Ottawa ON K1A 0T6
• Tel: 1-613-951-5305
• Fax: 1-613-951-3959
• Email: russell.wilkins@statcan.gc.ca