DATA MANAGEMENT Using EpiData and SPSS

References
• Public domain (PDF) book on data management: Bennett, et al. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf
• EpiData Association website: http://www.epidata.dk/
• Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm

Data Management
• Planning data needs
• Data collection
• Data entry and control
• Validation and checking
• Data cleaning and variable transformation
• Data backup and storage
• System documentation
• Other

Types of Database Management Systems (DBMSs)
• Spreadsheets (e.g., Excel, SPSS Data Editor)
  • Prone to error, data corruption, and mismanagement
  • Lack data controls, limited programmability
  • Suitable only for small and didactic projects
  • Also good for last-step data cleaning
• Commercial DBMS programs (e.g., Oracle, Access)
  • Limited data control, good programmability
  • Slow and expensive
  • Powerful and widely available
• Public domain programs (e.g., EpiData, Epi Info)
  • Controlled data entry, good programmability
  • Suitable for research and field use

We will use two platforms:
• EpiData
  • controlled data entry
  • data documentation
  • export (“write”) data
• SPSS
  • import (“read”) data
  • analysis
  • reporting

What is EpiData?
• EpiData is a computer program (small in size: 1.2 MB) for simple or programmed data entry and data documentation
• It is highly reliable
• It runs on Windows computers
  • Runs on Macs and Linux only with emulator software
• Interface
  • pull-down menus
  • work bar

History of Epi Info & EpiData
• 1976–1995: Epi Info (DOS program) created by CDC (in wake of swine flu epidemic)
  • Small, fast, reliable, 100,000+ users worldwide
• 1995–2000: DOS dies a slow, painful death
• 2000: CDC releases Epi Info 2000
  • Based on Microsoft Jet (Access) data engine
  • Large, slow, unreliable (resembled Epi Info in name only)
• 2001: Loyal Epi Info user group decides it needs a real “Epi Info for Windows”
  • Creates open source public domain program
  • Calls program “EpiData”

Goal: Create & Maintain Error-Free Datasets
• Two types of data errors
  • Measurement error (i.e., information bias): discussed over the last couple of weeks
  • Processing errors (errors that occur during data handling): discussed this week
• Examples of data processing errors
  • Transpositions (91 instead of 19)
  • Copying errors (O instead of 0)
  • Additional processing errors described on p. 18.2

Avoiding Data Processing Errors
• Manual checks (e.g., handwriting legibility)
• Range and consistency checks* (e.g., do not allow hysterectomy dates for men)
• Double entry and validation*
  • Operator 1 enters data
  • Operator 2 enters data in separate file
  • Check files for inconsistencies
• Screening during analysis (e.g., look for outliers)
* covered in lab
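The double entry idea can be sketched in a few lines of Python. This is only an illustration of the logic, not how EpiData implements its validation feature. The function name compare_entries, the record layout, and the sample values are assumptions made for the example.

```python
def compare_entries(first, second, key="id"):
    """Return (record key, field, value 1, value 2) for every disagreement.

    `first` and `second` are lists of dicts, one dict per record, as typed
    by operator 1 and operator 2. The layout is hypothetical.
    """
    second_by_key = {rec[key]: rec for rec in second}
    first_keys = {rec[key] for rec in first}
    problems = []
    for rec in first:
        other = second_by_key.get(rec[key])
        if other is None:
            # Record entered by operator 1 but not by operator 2
            problems.append((rec[key], None, "present", "missing"))
            continue
        for field, value in rec.items():
            if other.get(field) != value:
                problems.append((rec[key], field, value, other.get(field)))
    for rec in second:
        if rec[key] not in first_keys:
            # Record entered by operator 2 but not by operator 1
            problems.append((rec[key], None, "missing", "present"))
    return problems


operator_1 = [{"id": 1, "lname": "Snow", "deathage": "45"},
              {"id": 2, "lname": "Orwell", "deathage": "46"}]
operator_2 = [{"id": 1, "lname": "Snow", "deathage": "54"},    # transposition
              {"id": 2, "lname": "0rwell", "deathage": "46"}]  # zero typed for letter O

for problem in compare_entries(operator_1, operator_2):
    print(problem)
```

Each reported disagreement is then resolved by going back to the original paper form, since the comparison alone cannot tell which operator was right.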

Controlled Data Entry
• Criteria for accepting & rejecting data
• Types of data controls
  • Range checks (e.g., restrict AGE to reasonable range)
  • Value labels (e.g., SEX: 1 = male, 2 = female)
  • Jumps (e.g., if “male,” jump to Q8)
  • Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  • Must enters
  • etc.
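As a rough illustration, the controls listed above can be written as plain Python checks. In EpiData these rules live in the .CHK file and are applied while the operator types; the sketch below only mirrors the logic. The field names, the 0 to 120 age limits, and the function name check_record are assumptions for the example, and jumps are not modeled.

```python
SEX_LABELS = {1: "male", 2: "female"}  # value labels from the slide


def check_record(rec):
    """Return a list of problems found in one record (empty list = accepted)."""
    errors = []

    # Must enter: these fields may not be left blank (field choice is hypothetical)
    for field in ("sex", "age"):
        if rec.get(field) is None:
            errors.append(f"{field}: must be entered")

    # Range check: the limits 0-120 are an assumed "reasonable range"
    age = rec.get("age")
    if age is not None and not 0 <= age <= 120:
        errors.append("age: out of range 0-120")

    # Value labels: only codes that have a label are legal
    sex = rec.get("sex")
    if sex is not None and sex not in SEX_LABELS:
        errors.append("sex: not a valid code")

    # Consistency check: hysterectomy = yes is not allowed for males
    if sex == 1 and rec.get("hysterectomy") == "yes":
        errors.append("hysterectomy: not allowed when sex = male")

    return errors


print(check_record({"sex": 1, "age": 30, "hysterectomy": "yes"}))
print(check_record({"sex": 2, "age": 150}))
```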

Data Processing Steps
1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in .REC file
7. Validate data (double entry procedure)
8. Document data (code book)
9. Export data to SPSS
10. Import data into SPSS

Filenaming and File Management
• c:\path\filename.ext
• A web address is a good example of a filename, e.g., http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt
  • Some systems are case sensitive (Unix)
  • Others are not (Windows)
• Always be aware of
  • Physical location (local, removable, network)
  • Path (folders and subfolders)
  • Filename (proper)
  • Extension
• Demo Windows Network Explorer: right-click Start Bar > Explore
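The parts of a file location listed above can be pulled apart with Python's standard library. This is a small sketch, with describe being a made-up helper name. It also shows the case-sensitivity difference between Windows-style and Unix-style paths.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlparse


def describe(location):
    """Split a file location into location, path, filename, and extension."""
    parsed = urlparse(location)
    if parsed.scheme in ("http", "https"):
        # Web address: the host plays the role of the physical location
        path = PurePosixPath(parsed.path)
        where = parsed.netloc
    else:
        # Otherwise treat the string as a Windows path; the drive is the location
        path = PureWindowsPath(location)
        where = path.drive
    return {
        "location": where,
        "path": str(path.parent),
        "filename": path.stem,
        "extension": path.suffix,
    }


print(describe(r"c:\path\filename.ext"))
print(describe("http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt"))

# Windows paths compare without regard to case; Unix (POSIX) paths do not
print(PureWindowsPath("DEMO.REC") == PureWindowsPath("demo.rec"))
print(PurePosixPath("DEMO.REC") == PurePosixPath("demo.rec"))
```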

File extensions you should know

Extension   Software program
.qes        Epi Info/EpiData questionnaire
.rec        Epi Info/EpiData records (data)
.chk        Epi Info/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web browser
.doc        Microsoft Word
.xls        Microsoft Excel

Selected EpiData Variable Types

Variable type         Examples
Text                  _   <A >
Numeric               #   ##.#
Date                  <mm/dd/yyyy>   <dd/mm/yyyy>
Auto ID               <IDNUM>
Soundex (sanitized)   <S >

EpiData Variable Names
• The variable name is based on the text that occurs before the variable type indicator code
• EpiData variable naming defaults vary depending on installation
• Create variable names exactly as specified
• To be safe, denote variable names in {curly brackets}
  • For example, to create a two-byte numeric variable called age, use the question: What is your {age}? ##
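To make the curly-bracket convention concrete, here is a simplified Python sketch that reads one questionnaire line and pulls out the bracketed name and the field code at the end of the line. It is an illustration only, not EpiData's own parser. The function name parse_question and the regular expressions are assumptions, and only the field codes shown on the variable types slide are recognized.

```python
import re

# Field code at the end of the line: # or ##.#, a run of underscores,
# or an angle-bracket code such as <dd/mm/yyyy>, <IDNUM>, <A >
FIELD_CODE = re.compile(r"(#+(?:\.#+)?|_+|<[^>]*>)\s*$")
BRACED_NAME = re.compile(r"\{([^}]*)\}")


def parse_question(line):
    """Return (variable name, field code), or None if the line has no field code.

    The name is taken only from {curly brackets}; without brackets the name
    is reported as None, because the naming default depends on installation.
    """
    code = FIELD_CODE.search(line)
    if code is None:
        return None
    question_text = line[:code.start()]
    braced = BRACED_NAME.search(question_text)
    name = braced.group(1) if braced else None
    return name, code.group(1)


print(parse_question("What is your {age}? ##"))
print(parse_question("Date of birth {DOB} <mm/dd/yyyy>"))
```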

Demo / Work Along
• Create QES file [demo.qes]
• Convert QES to REC [demo.rec]
• Create CHK file [demo.chk]
• Create double entry file [demo2.rec]
• Enter data
• Validate data

Fname    Lname    DOB         SEX   DEATHAGE
John     Snow     3/15/1813   1     45
George   Orwell   6/25/1903   1     46

We will stop here and pick up the second part of the lecture next week “Stay tuned”

Codebooks
• Contain info that helps users decipher data file content and structure
• Include:
  • Filename(s)
  • File location(s)
  • Variable names
  • Coding schemes
  • Units
  • Anything else you think might be useful
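A codebook is just structured documentation, so a minimal one can be sketched as a small Python data structure plus a function that prints it. This is not the format EpiData's codebook generator produces. The variable names come from the demo file, while the file location, the units for DEATHAGE, and the name render_codebook are assumptions for the example.

```python
def render_codebook(filename, location, variables):
    """Return a plain-text codebook for one data file."""
    lines = [f"File: {filename}", f"Location: {location}", ""]
    for var in variables:
        lines.append(f"{var['name']} ({var['type']})")
        if var.get("units"):
            lines.append(f"  units: {var['units']}")
        # Coding scheme: one line per code and its label
        for code, label in var.get("codes", {}).items():
            lines.append(f"  {code} = {label}")
    return "\n".join(lines)


variables = [
    {"name": "FNAME", "type": "text"},
    {"name": "LNAME", "type": "text"},
    {"name": "DOB", "type": "date mm/dd/yyyy"},
    {"name": "SEX", "type": "numeric", "codes": {1: "male", 2: "female"}},
    {"name": "DEATHAGE", "type": "numeric", "units": "years"},  # units assumed
]

# The folder below is hypothetical
print(render_codebook("demo.rec", r"c:\data", variables))
```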

EpiData codebook generators

File Structure Codebook
• Full codebook contains descriptive statistics (demo)

Full Codebook
• Notice descriptive statistics

Conversion of Data File
• Requires common intermediate file format
• Examples of common intermediate files
  • .TXT = plain text
  • .DBF = dBase program
  • .XLS = Excel
• Steps
  • Export .REC file to .TXT file
  • Import .TXT file into SPSS
  • Save permanent .SAV file

Current Export Formats Supported by EpiData

Plain (“raw”) TXT data
• plain ASCII data format
• no column demarcations
• no variable names
• no labels
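Because a raw TXT file carries no column demarcations or variable names, the reader has to supply the column positions, normally taken from the codebook. The Python sketch below shows the idea. The column layout, sample lines, and the name parse_line are made up for the example and do not describe any file used in the lecture.

```python
# (variable name, start column, end column), zero-based, end exclusive.
# This layout is hypothetical; a real one comes from the codebook.
COLUMNS = [("id", 0, 3), ("sex", 3, 4), ("deathage", 4, 6)]


def parse_line(line, columns=COLUMNS):
    """Cut one fixed-width line into named fields (values stay as text)."""
    return {name: line[start:end].strip() for name, start, end in columns}


raw_lines = ["001145", "002146"]  # what the raw file looks like: digits only

for line in raw_lines:
    print(parse_line(line))
```

Without the layout, "001145" is just six characters. The same positions are what an SPSS syntax file has to declare before the data can be read.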

TXT file with codebook
• tox-samp.txt
• tox-samp.not

SPSS Data Export / Import
[Diagram: REC, TXT (raw data), SPS (syntax), SAV]

Top of tox-samp.sps
• Lines beginning with * are comments (ignored by command interpreter)
• The next set of commands shows file location and structure via SPSS command syntax

Bottom part of tox-samp.sps file
• Labels being imported into SPSS
• Delete * if you want this command to run

Opening the SPS (command) file

Running the SPS file

Ethics of Data Keeping
• Confidentiality (sanitized files, free of identifiers)
• Beneficence
• Equipoise
• Informed consent (To what extent?)
• Oversight (IRB)