The Edit
Anders Norberg, Statistics Sweden (SCB)
Work Session on Statistical Data Editing, Ljubljana, Slovenia, 9-11 May 2011
The environment of SELEKT
Input, throughput, output, use
Suspicion
SELEKT 1.1
[Flow diagram; box labels:]
• Raw + edited past (cold) survey data
• Survey specific cold adapter (SAS code): Data preparation
• SAS data set
• Input (hot) survey data
• Edits
• SNOWDONX analysis
• Table of Parameters of edits
• CLAN estimation software
• Table of Estimates
• Records to FOLLOW-UP
• PRE-SELEKT: Parameter specifications, Analysis of cold data
• SAS data set
• AUTOSELEKT: Score calculation & record flagging
• Records to IMPUTATION
• Survey specific hot adapter (SAS code): Data preparation
• Accepted records
• Process data and reports
Glossary of Terms on Statistical Data Editing (1)
“EDIT RULE SPECIFICATION / CHECK RULE SPECIFICATION: A set of check rules that should be applied in the given editing task.”
Glossary of Terms on Statistical Data Editing (2)
“CHECKING RULE: A logical condition or a restriction to the value of a data item or a data group which must be met if the data is to be considered correct. In various connections other terms are used, e.g. edit rule.”
Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys
“EDIT: A logical condition or a restriction to the value of a data item or a data group which must be met if the data is to be considered correct. Also known as edit rule or checking rule.”
Example 1
if Occupation = ‘Doctor’ and not (29000 < Salary < 71000) then Errcode_A01 = ‘Flag’
• The test variable: Salary
• The edit group: Occupation = ‘Doctor’
• The acceptance region: 29000 < Salary < 71000
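The three components of Example 1 can be made explicit in code. The following is a minimal Python sketch, not the SAS code used in SELEKT; the function name `check_doctor_salary` and the dictionary record layout are assumptions made for illustration.

```python
def check_doctor_salary(record):
    """Return 'Flag' if the record fails the edit of Example 1, else None.

    Hypothetical helper: the record is assumed to be a dict with the
    keys 'Occupation' and 'Salary'.
    """
    in_edit_group = record["Occupation"] == "Doctor"    # edit group
    test_value = record["Salary"]                       # test variable
    in_acceptance_region = 29000 < test_value < 71000   # acceptance region
    if in_edit_group and not in_acceptance_region:
        return "Flag"
    return None


records = [
    {"Occupation": "Doctor", "Salary": 50000},  # in group, inside region
    {"Occupation": "Doctor", "Salary": 90000},  # in group, outside region
    {"Occupation": "Nurse", "Salary": 90000},   # not in the edit group
]
flags = [check_doctor_salary(r) for r in records]
```

Because the bounds are strict inequalities, a salary of exactly 29000 or 71000 falls outside the acceptance region and is flagged.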
Example 2
if Occupation = ‘Doctor’ and not (29000 < Salary < 71000) or Occupation = ‘Nurse’ and not (23300 < Salary < 43800) then Errcode_A02 = ‘Flag’
• The test variable: Salary
• The edit groups: Occupation = ‘Doctor’; Occupation = ‘Nurse’
• The acceptance regions: 29000 < Salary < 71000 (Doctor); 23300 < Salary < 43800 (Nurse)
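One way to read Example 2 is as a single edit whose acceptance region is looked up per edit group. The sketch below illustrates that reading in Python; the names `ACCEPTANCE_REGIONS` and `check_salary` are hypothetical, and treating records outside every edit group as accepted is an assumption that matches the rule as written.

```python
# Each edit group (here: an occupation) is paired with its own
# acceptance region, stored as (lower, upper) with strict bounds.
ACCEPTANCE_REGIONS = {
    "Doctor": (29000, 71000),
    "Nurse": (23300, 43800),
}


def check_salary(record, regions=ACCEPTANCE_REGIONS):
    """Return 'Flag' if the record fails the edit of Example 2, else None."""
    bounds = regions.get(record["Occupation"])
    if bounds is None:
        return None  # record belongs to no edit group, so the edit does not apply
    lower, upper = bounds
    if not (lower < record["Salary"] < upper):
        return "Flag"
    return None


nurse_flag = check_salary({"Occupation": "Nurse", "Salary": 50000})
doctor_flag = check_salary({"Occupation": "Doctor", "Salary": 50000})
```

The same salary of 50000 is flagged for a nurse but accepted for a doctor, which is the point of attaching one acceptance region to each edit group.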
Edits
EDIT: Edit identification; Type of edit; Active; Section; Internal error message; External error message; Instruction for data review; Un-edited test variable; Error flag
EDIT GROUP AND ACCEPTANCE REGION: Edit identification; Edit group; Acceptance region
Edits
EDIT: Edit identification; Type of edit; Active; Section; Internal error message; External error message; Instruction for data review; Un-edited test variable; Error flag
EDIT GROUP AND ACCEPTANCE REGION: Edit identification; Edit group; Acceptance region
EDIT PRACTICAL SUPPORT: Edit identification; Standard edit rule; Edited test variable; Suspicion (probability value produced by the SELEKT system)
IMPACT ON STATISTICS LINK: Edit identification; Survey variable
FLAGGING EDITS, VARIABLES AND UNITS: Survey variable; Potential impact on statistics
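To illustrate how edit metadata of this kind could drive rule generation, here is a rough Python sketch. It models only two of the tables and a few of their fields; the class names `Edit` and `EditGroupRegion`, the function `render_rule`, and the choice to store conditions as text are all assumptions for illustration, not the SELEKT design.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    edit_id: str         # "Edit identification"
    test_variable: str   # "Un-edited test variable"
    error_flag: str      # "Error flag"
    active: bool = True  # "Active"


@dataclass
class EditGroupRegion:
    edit_id: str            # "Edit identification" (link to Edit)
    edit_group: str         # condition selecting the records
    acceptance_region: str  # condition the test variable must meet


def render_rule(edit, group_rows):
    """Render one edit as rule text in the style of Examples 1 and 2."""
    clauses = [
        f"{row.edit_group} and not ({row.acceptance_region})"
        for row in group_rows
        if row.edit_id == edit.edit_id
    ]
    return f"if {' or '.join(clauses)} then {edit.error_flag} = 'Flag'"


edit = Edit(edit_id="A02", test_variable="Salary", error_flag="Errcode_A02")
rows = [
    EditGroupRegion("A02", "Occupation = 'Doctor'", "29000 < Salary < 71000"),
    EditGroupRegion("A02", "Occupation = 'Nurse'", "23300 < Salary < 43800"),
]
rule = render_rule(edit, rows)
```

With one row per edit group, the generated text reproduces the rule of Example 2; with a single row it reduces to the form of Example 1.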
My questions (1)
• Can most edits be described as consisting of the components
  – test variable
  – edit group
  – acceptance region?
• What types of edits cannot?
My questions (2)
If the edits can be described this way, what arguments are there for saying that
  – one edit has only one edit group and one acceptance region
  – one edit can be composed of many edit groups with one acceptance region each?
My questions (3)
Can you give me examples of
• similar modeling of edits
• metadata storage for edits
• an edit script generator using a standard metadata storage for edits?