Evolution of SPSS: Layout, syntax and change
Layout
It’s back to the 80-column card
Key to layout of Hollerith card [figure]
This determined the layout of early SPSS setup files:
• Columns 1 to 15 were reserved for commands
• Columns 16 to 72 were reserved for sub-commands and specifications
• Columns 73 to 80 were for numbering the cards
• Commands had to start in column 1
• Sub-commands and specifications could start in or after column 16
• Continuation lines had to start in or after column 16, but a variable name could not be split across two cards.
Raw data (including multipunches) from 80-column cards
001110204+57462235696172244322232422 - 2 O- 322 K 2 - 3$62$$5 05902 -- 89564$-147321
0012$$$% 1 23 0 19$0$78$$6110$Q 31111010 23463110 4113+2211207637321
002119051 -44689428858 -45242524431442324 T 31$3823+84$8354$77 158 -5 -7 M$6$O 6$$417321
0022$$$$ 2 1 3 1$1$$$$22 F$11222 -4101001102211310002220107637321
003114202+355 -953273 --3324454341415591+N 91238 -2+8257$$55+- $- 4 -7$$5$$5$2137321
0032$$$$ 1 32 0 12$$$26 N$11222$51111011012122010 310122215127637321
(SSRC Quality of Life: 1st Pilot Survey 1971; 2 cards per case, first 3 cases only, multipunches in red.) UK Data Archive study 247. The survey was conducted March to May 1971, but SPSS files were not created until 1972-73.
Spread out multi-punching: first case only
001110204+57462235696172244322232422 - 2 O- 322 K 2 - 3$62$$5 05902 -- 89564$-147321
0012$$$% 1 23 0 19$0$78$$6110$Q 31111010 23463110 4113+2211207637321
001390000001010000900000010010100011010000900000000101001007321
0014000100101100000101000000010000000010100000090000017321
001500000001001000100000100100100000111122222221290000017321
00160000101110000000101 7321
…done with LSE program MUTOS
Standard 80-column data preparation sheet modified for SPSS use at SSRC Survey Unit
These restrictions were later lifted, but it is still helpful for beginners (or even veterans) to keep these distinctions visible by using tabs to indent sub-commands and specifications.
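Here is a minimal sketch of that convention in present-day syntax. The command name starts at the left margin and each subcommand is indented on its own line. The variable var147 is borrowed from the 1973 histogram later in this deck, and the subcommands are ordinary FREQUENCIES options.

```spss
* Command name at the left margin, subcommands indented beneath it.
FREQUENCIES VARIABLES=var147
    /FORMAT=NOTABLE
    /HISTOGRAM
    /STATISTICS=MEAN MEDIAN STDDEV.
```

In batch (production) mode the indentation is more than cosmetic. A line that starts in column 1 is read as the start of a new command, so continuation lines must be indented.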
Syntax
Some changes to syntax since 1972 (then → now)
• VARxxx TO VARyyy → Vx TO Vy, Qx TO Qy, etc.
• Labels allowed in UPPER CASE only → any printing characters, in primes
• Limits to no. of characters in labels (40 for variables, 20 for values) → removed, theoretically 255, but printout constraints apply
• VARIABLE LIST, INPUT FORMAT, INPUT MEDIUM → DATA LIST FILE = RECORDS =
• BREAKDOWN → MEANS
Effects of changes
Many setup jobs from the 1970s and 1980s will no longer work:
1. Fortran format statements have been replaced by DATA LIST.
2. Much data was received in multipunched format and had to be read as alphabetic. However, alphabetic (string) data can no longer be recoded into numbers in the same variables.
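As a hedged aside, current DATA LIST syntax still accepts FORTRAN-like format lists after each group of variable names. This means parts of an old INPUT FORMAT can sometimes be carried across. The sketch below assumes a hypothetical file name, 'qluk1.dat', and uses the first few fields of the 1973 layout shown later.

```spss
* Sketch only: 'qluk1.dat' is a hypothetical file name.
* FORTRAN-like formats follow each list of variable names.
* 1X skips one column; 13A1 reads 13 one-column strings.
DATA LIST FILE='qluk1.dat' RECORDS=2
    /1 var101 (F3.0) var105 (1X, A4) var109 (F1.0) var110 TO var122 (13A1)
    /2 var209 TO var223 (8X, 15A1).
```

This does not solve the second problem: fields read as strings still have to be recoded INTO new numeric variables.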
Data input and transformation
Variable Names
• Had to be in upper case in the form VARddd, e.g. VAR001 TO VAR010
• Later changed to any upper-case letter(s) followed by any digit(s), e.g. VAR1 TO VAR10 or Q1 TO Q10
• Later still, lower-case letters were allowed, e.g. q1 to q10, but names are still printed in upper case
• TO still can't handle names of the form letter(s), digit(s), letter(s), e.g. q1a to q1g
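This small sketch shows what the TO convention can and cannot generate. The names are the slide's own examples. DATA LIST FREE is used here only to declare them; a following BEGIN DATA block would supply the values.

```spss
* q1 TO q10 creates ten variables: q1, q2 and so on up to q10.
* TO cannot step through letter suffixes, so q1a to q1g are listed in full.
DATA LIST FREE / q1 TO q10 q1a q1b q1c q1d q1e q1f q1g.
```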
Mnemonic variable names
• Demonic more like!
• Names look like what they represent and help you to remember them
• We shall see!
• sex, age, income are self-evident
• but what about idstrng = "strength of identity with political party supported"?
Positional variable names
• First digit defines the card (record)
• Second and third digits define the start column
• VAR311 is not the 311th variable, but the variable which starts on record 3, column 11 (field width is determined by the format statement)
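This sketch shows how positional names line up with DATA LIST column ranges. It assumes a hypothetical file, 'survey.dat', with three cards per case. The variable v313 is also invented for illustration.

```spss
* Hypothetical file with three cards (records) per case.
* v311 starts on record 3 at column 11.
* Its width comes from the column range 11-12, not from the name.
DATA LIST FILE='survey.dat' RECORDS=3
    /1 serial 1-3
    /3 v311 11-12 v313 13.
```

Record 2 is simply skipped, because no variables are read from it.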
Read in data in alpha format: 1973
RUN NAME       QL1UK1 - PILOT 1 FIRST SYSTEM
FILE NAME      QL1UK1 QUALITY OF LIFE PILOT I UK
VARIABLE LIST  VAR101 VAR105 VAR109 TO VAR137 VAR141, VAR144, VAR145,
               VAR148 VAR149 VAR152 VAR155 VAR158 VAR159 VAR162 VAR165
               VAR166 VAR169 VAR172 VAR175, VAR176, VAR209 TO VAR223
               VAR225 VAR230 VAR234 TO VAR237 VAR240 TO VAR256 VAR263
               VAR264 VAR266 TO VAR268 VAR270
INPUT MEDIUM   INDATA
INPUT FORMAT   FIXED (F3.0, 1X, A4, F1.0, 13A1, 14F1.0, A1, 3X, A1, 2X,
               F1.0, A1, 2X, 2A1, 2X, A1, 2X, 2A1, 2X, 2A1, 4X/ 8X, 15A1,
               1X, 1A1, 4X, A2, 2X, A1, F1.0, 2A1, 2X, 17A1, 6X, A1, A2,
               2A1, A2, A4)
NO. OF CASES   213
Converting alpha to numeric: 1973
RECODE         VAR105 ('++++'=9999) (CONVERT)/
               VAR110 ('+'=2)('-'=1)('0'=88) (CONVERT)/
               VAR111 TO VAR122 VAR137 VAR141 VAR145 VAR149 VAR152 VAR155
               VAR158 VAR162 VAR166 VAR169 VAR172
               ('-'=10)('+'=99) (CONVERT)/
               VAR144 (1=2)/
               VAR148 VAR165 ('+'=1) ('-'=2) (CONVERT)/
               VAR159 (' '=1) ('-'=0) (CONVERT)/
               VAR175 ('+' ' '=88) ('4'=3) (CONVERT)/
               VAR176 (' ', '+'=99) (CONVERT)
Recode of alpha to numeric format 1973
This doesn’t work any more. You now have to read the data into dummy (string) variables and then use
RECODE <dummy varlist> (<old value list> = <new value>) (CONVERT) INTO <new varlist>
Now that's syntax!
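Here is a minimal sketch of the modern pattern, using two of the 1973 recodes. It assumes v111 TO v122 and v148 were read as one-column strings, as in the 2002 DATA LIST on the next slide. The targets var111 TO var122 and var148 are new numeric variables.

```spss
* Assumes v111 TO v122 and v148 were read as one-column strings (A).
* CONVERT turns digit strings such as '3' into the number 3.
* The string originals are kept; the results go INTO new numeric variables.
RECODE v111 TO v122 ('-'=10) ('+'=99) (CONVERT) INTO var111 TO var122
    /v148 ('+'=1) ('-'=2) (CONVERT) INTO var148.
EXECUTE.
```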
Data List for dummy variables (alphanumeric data) 2002
data list file 'f:qluk1.dat' records 6
    /1 serial 1-3 v105 to v180 5-80 (a)
    /2 v209 to v280 9-80 (a).
Output from Data List
Data List will read 6 records from the command file
Variable  Rec  Start  End  Format
SERIAL      1      1    3  F3.0
V105        1      5    5  A1
V106        1      6    6  A1
V107        1      7    7  A1
  ...
V278        2     78   78  A1
V279        2     79   79  A1
V280        2     80   80  A1
When I first used SPSS for Windows, my data files were in another directory (oops! folder) and I couldn't get SPSS to find them. Small data files were placed on dsk:a, e.g. 'a:fifth.dat', but files larger than 1.4 MB presented problems. I got round it by opening the raw data file, dragging the data into the setup file and then bracketing the data with BEGIN DATA and END DATA, but it took ages to wait for the copy. I did this with huge files for 3 years, until a friend gave me a memory stick and I could use dsk:f, e.g. 'f:ess2002.dat'. Now I also use a plug-in rewriter and back up on CD.
Embedded data: begin data & end data
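Here is a minimal sketch of inline data. The three cases are invented purely to show the layout.

```spss
* The records sit between BEGIN DATA and END DATA in the syntax file.
* These three cases are made up for illustration.
DATA LIST FIXED /serial 1-3 v105 5 (A) v106 6 (A).
BEGIN DATA
001 +-
002  0
003 -+
END DATA.
LIST.
```

This is fine for a few cards, but as the story above shows, it is painfully slow for large files. For those, pointing DATA LIST FILE= at the raw data file is the better route.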
Read in data in alpha format: 2002
Recode dummy string variables into numeric variables
RECODE V209 TO V222 (' ', '+', '-'=0) ('1'=2) ('2'=1) ('3'=-2) (CONVERT)
    into VAR209 TO var218 xvar219 var220 to VAR222
  / V223 V225 V234 ('+'=88) (CONVERT) into VAR223 VAR225 VAR234
  / V230 ('99'=98) ('++'=99) (CONVERT) into var230
  / V236 V237 ('+'=99) (CONVERT) into var236 var237
  / V240 ('+'=88) ('1' '2'=3) (CONVERT) into xvar240
  / V241 TO V252 (' '=88) (CONVERT) into VAR241 TO VAR252
  / V253 (' '=88) ('4' '5'=3) (CONVERT) into var253
  / V254 TO V256 (' '=88) (CONVERT) into VAR254 TO VAR256.
RECODE V263 ('+'=88) (CONVERT) into var263
  / V264 ('++'=88) (CONVERT) into var264
  / V266 V267 ('+'=1) ('-'=2) ('0'=3) ('1'=4) ('2'=5) ('3'=6) ('4'=7) ('5'=8)
    ('6'=9) ('7'=99) (' '=99) (CONVERT) into var266 var267
  / V268 (CONVERT) into var268
  / V270 ('++++'=88) (CONVERT) into var270.
Variable Labels
Variable labels: 1973 (SSRC Quality of Life Survey 1st Pilot 1971)
Variable labels: 1973 (SSRC Quality of Life Survey 1973) Note change of format mid-setup!
Variable labels: 1981 (Fifth form survey in North London)
Variable labels 1989 (NUS Student Finance Survey 1989)
Value Labels
Value Labels 1973
• UPPER CASE only
• VALUE LABELS in cols 1-16
• Values in round brackets, no primes needed
• 20 characters for rows, 16 characters for columns (in 2 blocks of 8)
• Tortuous spellings and abbreviations
• Formatted with packing spaces
Value Labels 1973 (Attitudes and Opinions of Senior Girls: St Trinian’s)
VALUE LABELS   FORM(1)LOWER FIFTH(2)UPPER FIFTH(3)LOWER SIXTH
               (4)UPPER SIXTH
               /YEARBORN(1)1954(2)1955(3)1956(4)1957(5)1958
               /MONTH(1)JANUARY(2)FEBRUARY(3)MARCH(4)APRIL(5)MAY(6)JUNE(7)JULY
               (8)AUGUST(9)SEPTEMBR(10)OCTOBER(11)NOVEMBER(12)DECEMBER
               /VAR111 TO VAR119(1)MOST IMPORTNT(2)NEITHER(3)LEAST IMPORTNT
               /JOB1 TO JOBAT25(1)ACCNTNCY, FINANCE(2)ARCHIT- ECTURE
               (3)CIVIL ENGINEER(4)CREATIVE ARTIST(5)DOCTOR, DENTIST
               (6)FASHION(7)GOVNMNT, ADMIN. (8)HOUSE -WIFE(9)INDUST. TECH.
               (10)JOURN- ALISM(11)MILITARY SERVICE(12)NURSING
               (13)OUTDOOR, ATHLETIC(14)OWN BUSINESS(15)PERFORM-ING ARTS
               (16)PERSONN-EL MNGMT(17)POLITICS(18)PUBLISH -ING
               (19)SALES + MARKETNG(20)SCIENCE-MATHS(21)SCIENCE-BIOLOGY
               (22)SCIENCE-SOCIAL(23)SECRET -ARY(24)SOCIAL WORK
               (25)SOLICTR, BARRISTR(26)TEACHER-PRIMARY(27)TEACHER-SECNDARY
               (28)TOWN PLANNING(29)TV, FILM PRODUCER(30)UNIVSTY LECTURER
               (31)LIBRAR -IAN(32)PUBLIC RELATNS(33)COMP- UTERS(34)OTHER
Value Labels 2002 St Trinian’s (twice modified)
Before
/JOB1 TO JOBAT25 (1) ACCNTNCY, FINANCE (2) ARCHIT- ECTURE (3) CIVIL ENGINEER
   (4) CREATIVE ARTIST (5) DOCTOR, DENTIST (6) FASHION (7) GOVNMNT, ADMIN.
   (8) HOUSE -WIFE (9) INDUST. TECH. (10) JOURN- ALISM
After
/job1 to jobat25 (1) Accntncy, finance (2) Archit- ecture (3) Civil engineer
   (4) Creative artist (5) Doctor, dentist (6) Fashion (7) Govnmnt, admin.
   (8) House -wife (9) Indust. tech. (10) Journ- alism
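This sketch shows the same kind of labels in current syntax, using the form and month variables from the 1973 St Trinian’s setup. The labels are quoted, can mix upper and lower case, and no longer need squeezing into 16 characters.

```spss
* Quoted labels in mixed case, spelled out in full.
VALUE LABELS form
    1 'Lower fifth' 2 'Upper fifth' 3 'Lower sixth' 4 'Upper sixth'
  /month 1 'January' 2 'February' 3 'March' 4 'April' 5 'May' 6 'June'
    7 'July' 8 'August' 9 'September' 10 'October' 11 'November'
    12 'December'.
```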
Value Labels 1973 (SSRC Quality of Life: 1st pilot survey 1971)
VALUE LABELS   VAR109 (1) LOT MORE (2) LITTLE MORE (3) SAME (4) LITTLE LESS
               (5) LOT LESS
               /VAR110 (1) FORWARDS (2) BACKWRDS
               /VAR123 (1) UNSKILLDMAN WKRS (2) SKILLDMAN WKRS (3) OFFICE WORKERS
               (4) PROFES- SIONAL (5) COMPANY DIRECTRS (6) SHOP KPRS ETC
               (7) OAP'S (8) INVESTRS ETC (9) NOT KNOWN
Output formats (before Windows)
FREQUENCY COUNT WITH LABELS (1973)
AGEGRP: Age group
                       Absolute  Relative  Adjusted       Cum
Label           Code       freq  freq (%)  freq (%)  freq (%)
17-29             1.        206      22.1      22.4      22.4
30-44             2.        214      23.0      23.3      45.8
45-59             3.        242      26.0      26.4      72.1
60+               4.        256      27.5      27.9     100.0
                 99.         14       1.5   Missing     100.0
                         ------
Total                       932     100.0
Valid cases 918    Missing cases 14
HISTOGRAM PLOT (with optional statistics) 1973
VAR147  SATISFACTION WITH WHOLE LIFE
Code   Frequency
  1        1
  2        1
  3        2
  4        9
  5       18
  6       17
  7       35
  8       60
  9       34
 10       33
(bars of asterisks plotted against a frequency axis running from 0 to 100)
Mean 7.610   Median 7.867   Std dev 1.801
Valid cases 210   Missing cases 0
CONDENSED FORMAT FREQUENCY COUNT (not available in Windows)
AGE OF R IN COMPLETE YEARS
Code Freq Adj% Cum%   Code Freq Adj% Cum%   Code Freq Adj% Cum%
  18   15    2    2     42   14    2   42     66   14    2   83
  19   16    2    3     43   14    2   44     67   20    2   85
  20   19    2    5     44   19    2   46     68   12    1   87
  21   17    2    7     45   11    1   47     69   18    2   89
  22   19    2    9     46   15    2   49     70   13    1   90
  23   16    2   11     47   14    2   50     71    8    1   91
  24   16    2   13     48   17    2   52     72    8    1   92
  25   14    2   14     49   15    2   54     73   12    1   93
  26   19    2   16     50   24    3   56     74    9    1   94
  27   25    3   19     51   16    2   58     75    8    1   95
  28   13    1   21     52   15    2   60     76    7    1   96
  29   16    2   22     53   19    2   62     77    7    1   97
  30   13    1   24     54   14    2   63     78    6    1   97
  31   13    1   25     55   15    2   65     79    4    0   98
  32   24    3   28     56   13    1   66     80    5    1   98
  33    7    1   29     57   19    2   68     81    3    0   98
  34   19    2   31     58   16    2   70     82    6    1   99
  35   13    1   32     59   19    2   72     83    2    0   99
  36    7    1   33     60   10    1   73     85    1    0   99
  37   12    1   34     61   15    2   75     86    1    0  100
  38   14    2   36     62   17    2   77     87    1    0  100
  39   13    1   37     63   14    2   78     88    2    0  100
  40   15    2   39     64   17    2   80     90    1    0  100
  41   17    2   41     65   15    2   82
Missing data
Code  Freq
Wild    15
CONTINGENCY TABLE WITH LABELS
SEX OF RESPONDENT by HAPPY HOW HAPPY IS R?

                 HAPPY
        Count   :NOT TOO  PRETTY    VERY     Row
        Row %   :  HAPPY                   Total
                :      1:      2:      3:
SEX     --------:-------:-------:-------:
          1     :     24:    230:    131:    385
MEN             :    6.2:   59.7:   34.0:   41.6
                :-------:-------:-------:
          2     :     33:    286:    222:    541
WOMEN           :    6.1:   52.9:   41.0:   58.4
                :-------:-------:-------:
        Column        57     516     353     926
        Total        6.2    55.7    38.1   100.0

Number of missing observations = 6
CONTINGENCY TABLE WITH ALL PERCENTAGES
SEX OF RESPONDENT by AGEGROUPED OF R

                 AGEGROUP
        Count   :
        Row %   :  17-29   30-44   45-59     60+     Row
        Col %   :                                  Total
        Tot %   :      1:      2:      3:      4:
SEX     --------:-------:-------:-------:-------:
          1     :     88:     90:    110:     92:    380
MEN             :   23.2:   23.7:   28.9:   24.2:   41.4
                :   42.7:   42.1:   45.5:   35.9:
                :    9.6:    9.8:   12.0:   10.0:
                :-------:-------:-------:-------:
          2     :    118:    124:    132:    164:    538
WOMEN           :   21.9:   23.0:   24.5:   30.5:   58.6
                :   57.3:   57.9:   54.5:   64.1:
                :   12.9:   13.5:   14.4:   17.9:
                :-------:-------:-------:-------:
        Column       206     214     242     256     918
        Total       22.4    23.3    26.4    27.9   100.0

Number of missing observations = 14
(NB: Extensive use of this format in analysis is usually a sign of inexperience and anxiety in researchers (or their supervisors) who are either too proud to ask for advice and assistance or who are possibly even completely incompetent. It is also a waste of paper, time and money!)
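For comparison, here is a sketch of the CROSSTABS commands that would request these two layouts today. The variable names sex, happy and agegroup are assumed from the table headings.

```spss
* Counts and row percentages only.
CROSSTABS /TABLES=sex BY happy
    /CELLS=COUNT ROW.
* Counts plus row, column and total percentages.
CROSSTABS /TABLES=sex BY agegroup
    /CELLS=COUNT ROW COLUMN TOTAL.
```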
MEANS SEXISM BY SEX
SEXISM   Q33 Sexism score
SEX      Sex of respondent

Variable  Value  Label          Mean   Cases
For Entire Population         2.8810      86
SEX           1  Boys         3.9729      42
SEX           2  Girls        1.8389      44

Total Cases = 86
MEANS SEXISM BY SEX BY ETHNIC
SEX      Sex of respondent

Variable  Value  Label          Mean   Cases
For Entire Population         2.8810      86
ETHNIC        1  White        3.2600      38
  SEX         1  Boys         4.6300      19
  SEX         2  Girls        1.8900      19
ETHNIC        2  Black        2.5810      48
  SEX         1  Boys         3.4300      23
  SEX         2  Girls        1.8000      25

Total cases = 86
CROSSBREAK (not available in Windows)
MEANS     VARIABLES = SEXISM(0,9) V348(1,2) ETHNIC(1,2)
         /CROSSBREAK = SEXISM BY V348 BY ETHNIC
         /CELLS = MEAN COUNT
CROSSBREAK output (not available in Windows)
                 ETHNIC
        Mean    :  White   Black     Row
        Count   :                  Total
                :      1:      2:
SEX     --------:-------:-------:
          1     :   4.63:   3.43:   3.98
Boys            :     19:     23:     42
                :-------:-------:
          2     :   1.89:   1.80:   1.84
Girls           :     19:     25:     44
                :-------:-------:
        Column      3.26    2.58    2.88
        Total         38      48      86
Crafty use of Crossbreak
RECODE    SEXISM (2 THRU 7 = 100) (0, 1 = 0) (ELSE = SYSMIS)
MEANS     VARIABLES = SEXISM (0,100) V348 (1,2) ETHNIC (1,2)
         /CROSSBREAK = SEXISM BY V348 BY ETHNIC
         /CELLS = MEAN COUNT
Crafty use of CROSSBREAK: output
                 ETHNIC
        Mean    :  White   Other     Row
        Count   :                  Total
                :      1:      2:
V348    --------:-------:-------:
          1     : 100.00:  82.61:  90.48
Boys            :     19:     23:     42
                :-------:-------:
          2     :  47.37:  44.00:  45.45
Girls           :     19:     25:     44
                :-------:-------:
        Column     73.68   62.50   67.44
        Total         38      48      86

Number of missing observations = 56
Used with RECODE: cells are % "sexist" and base n
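Here is a hedged sketch of the same trick without CROSSBREAK. First, recode into a new 0/100 variable so the original score is kept; the name sexist is hypothetical. Then let MEANS average it. MEANS prints the cells as nested rows rather than a grid, but each mean is still the percentage coded 100, shown with its base n.

```spss
* Build a 0/100 indicator in a new variable instead of overwriting sexism.
RECODE sexism (2 THRU 7=100) (0, 1=0) (ELSE=SYSMIS) INTO sexist.
* The mean of a 0/100 variable is the percentage scored 100.
MEANS TABLES=sexist BY v348 BY ethnic
    /CELLS=MEAN COUNT.
```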
Back to variable names
Data List with mnemonic variable names (British Social Attitudes 1987: Curtice)
/2 VERSION 8 READPAP 9 WHPAPER 10-11 SUPPARTY CLOSEPTY 12-13 PARTYID1 14-15
   IDSTRNG CNTLCNCL RATES RENTS EEC NATO NATION USANUKE OWNNUKE UKNUCPOL
   DEFPARTY PEACE NIRELAND TROOPOUT 16-29 HINCDIFF HINCPAST HINCXPCT 31-55
   RECONACT 56-57 RFTEDUC RTRAING RPAIDWRK RWAITWRK RREGISTD RSEEKWRK RNTLOOK
   RSICK RRETIRD RATHOME RELSE REMPLOYE 58-69 EJBHOURS 70-71 EJBHRCAT WAGENOW
   PAYGAP WAGEXPCT NUMEMP EMSMEWRK EMSEXWRK EMWOMCLD EMWOMWLD 72-80
Variable Labels (British Social Attitudes 1987: Curtice)
VARIABLE LABELS VERSION QUESTIONNAIRE VERSION ADMINISTERED/
   READPAP Q1A R READS NEWSPAPER 3+ TIMES PER WEEK/
   WHPAPER Q1B [IF READS 3+ TIMES] WHICH PAPER/
   SUPPARTY Q2A POLITICAL PARTY SUPPORTER/
   CLOSEPTY Q2C [IF NOT SUPORTR] CLOSER TO ONE PARTY/
   PARTYID1 Q2B & 2D & 2E PARTY IDENTIFICATION[FULL]/
   IDSTRNG Q2F HOW STRONG PARTY IDENTIFICATION/
Data List with positional variable names (British Social Attitudes 1987: Hall)
/2 version 8 v209 9 v210 10-11 v212 v213 12-13 v214 14-15
   v216 to v229 16-29 v231 to v255 31-55 v256 56-57
   v258 to v269 58-69 v270 70-71 v272 to v280 72-80
Output from data list (BSA 1987: Hall)
Data List will read 23 records from F:bsa87.dat
Variable  Rec  Start  End  Format
VERSION     2      8    8  F1.0
V209        2      9    9  F1.0
V210        2     10   11  F2.0
V212        2     12   12  F1.0
V213        2     13   13  F1.0
V214        2     14   15  F2.0
V216        2     16   16  F1.0
V217        2     17   17  F1.0
V218        2     18   18  F1.0
V219        2     19   19  F1.0
V220        2     20   20  F1.0
V221        2     21   21  F1.0
V222        2     22   22  F1.0
V223        2     23   23  F1.0
V224        2     24   24  F1.0
Modified Variable Labels (British Social Attitudes 1987: Hall)
variable labels version 'Questionnaire version administered'
   /readpap 'Q1a R reads newspaper 3+ times per week'
   /whpaper 'Q1b [if reads 3+ times] which paper'
   /supparty 'Q2a Political party supporter'
   /closepty 'Q2c [if not suportr] closer to one party'
   /partyid1 'Q2b & 2d & 2e party identification [full]'
   /idstrng 'Q2f How strong party identification'.
Renaming variables (I cheated!)
rename variables (readpap to idstrng = v209, v210, v212 to v214, v216).
… and if you’re worried about putting things back as they were:
rename variables (v209, v210, v212 to v214, v216 = readpap whpaper supparty closepty partyid1 idstrng).
NB: To restore the mnemonic names from the positional ones, the original mnemonic names have to be listed in full.
Modified Variable Names and Labels (British Social Attitudes 1987: Hall)
variable labels version 'Questionnaire version administered'
   /v209 'Q1a R reads newspaper 3+ times per week'
   /v210 'Q1b [if reads 3+ times] which paper'
   /v212 'Q2a Political party supporter'
   /v213 'Q2c [if not suportr] closer to one party'
   /v214 'Q2b & 2d & 2e party identification [full]'
   /v216 'Q2f How strong party identification'.
Other developments in SPSS (then → now)
• Blue manual in A-Z order of commands → Norusis (1988), in user-friendly research-process order, but for SPSS 13???
• Batch only → Interactive
• Mainframe only → SPSS/PC+, then SPSS for Windows
…and another change, of which more later