Data capturing strategies used in Istat to improve quality

Conference of European Statisticians
Work session on statistical data editing (Bonn, 25-27 September 2006)
Session: Editing nearer the source

Rossana Balestrino, Stefania Macchia, Manuela Murgia
ISTAT – Italian National Statistics Bureau, Rome, Italy
balestri@istat.it, macchia@istat.it, murgia@istat.it

CASIC techniques were introduced at Istat in the 1980s
• CATI and CAPI were adopted first
• nearly one decade later, CASI was taken into consideration
CATI and CAPI offer mature, well-tested solutions and are therefore more consolidated.
CASI techniques are younger and depend more on continuously evolving IT solutions and network tools.

At Istat, for all the techniques:
• internal demand shows an increasing trend
• experience has taught that Istat should play a very active role and keep at least the design and monitoring phases of the process inside the Institute, in order to get standard solutions driven by quality requirements and enriched with suggestions coming from previous results

• Strategies for CATI and CAPI surveys
• Strategies for CASI

CATI and CAPI advantages
• reduction of the cost and time needed to have data ready for processing (Groves et al. 2001)
• help in preventing non-sampling errors, through the management of extensive consistency plans during the interviewing phase
(CAPI is not as widely used as CATI at Istat because it is more expensive.)

Organisation for CATI surveys
The content of the survey, made explicit in the questionnaire, is designed at Istat, while private companies are charged with the entire data collection procedure.

Frequent problems encountered with this organisation
Private companies:
• had never before developed electronic questionnaires so complex in terms of skip and consistency rules between variables
• had never put into practice strategies to prevent and reduce non-response errors
• did not have at their disposal a robust set of indicators to monitor the interviewing phase

New organisation for CATI surveys: in-house strategy
A private company is relied on for the call centre, the selection of interviewers and the conduct of the interviews, but it is given the whole software procedure, developed at Istat, to manage the data capturing phase:
• call scheduler
• electronic questionnaire
• set of indicators to monitor the interviewing phase

In-house strategy: the software procedure
It integrates different software packages, but the core is developed with the Blaise system (produced by Statistics Netherlands and already used by many National Statistics Administrations for data capturing carried out with different techniques).

Quality-oriented procedure planning
Quality standards have been defined for:
• the data capturing phase
• the monitoring phase
• the secure transmission of data

Standards for the data capturing phase
• the layout of the electronic questionnaire, to reduce the ‘segmentation effect’
• the customisation of question wording, to make the interview friendlier and the questions easier to answer
• the management of errors, to prevent all possible types of error without increasing the respondent burden, while making the interviewers’ job easier

Standards for the data capturing phase (continued)
• the control of data against information from previous surveys or administrative archives, to improve the quality of the collected data
• the assisted coding of textual answers, to improve the coding results and speed up the coding process
• the scheduling of contacts, to enhance the interviewers’ productivity and avoid distorting respondents’ probability of being contacted
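The slides do not describe how the assisted coding function works internally. Purely as an illustration of the idea, the sketch below matches a free-text occupation answer against a small code list using approximate string matching from the Python standard library. The code list, the codes and the `suggest_codes` helper are hypothetical; they are not Istat's classification or the Blaise implementation.

```python
import difflib

# Hypothetical mini code list (label -> code); not an official classification.
OCCUPATIONS = {
    "primary school teacher": "T01",
    "secondary school teacher": "T02",
    "nurse": "H01",
    "software developer": "C01",
    "shop assistant": "S01",
}


def suggest_codes(answer, max_suggestions=3, cutoff=0.6):
    """Return (label, code) candidates for a textual answer, best match first."""
    text = " ".join(answer.lower().split())  # normalise case and whitespace
    if text in OCCUPATIONS:  # exact match: a single unambiguous candidate
        return [(text, OCCUPATIONS[text])]
    labels = difflib.get_close_matches(
        text, list(OCCUPATIONS), n=max_suggestions, cutoff=cutoff
    )
    return [(label, OCCUPATIONS[label]) for label in labels]
```

In a setting like this the interviewer would pick one of the returned candidates while the respondent is still on the line, rather than leaving the text to be coded manually afterwards.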

Standards for the monitoring phase
• a limited but exhaustive set of indicators to monitor the trend of contact results
• ad hoc instruments to monitor particular aspects of the survey

Set of indicators to monitor the trend of contact results
• n-way contingency tables, useful to keep under control the interviewers’ productivity and the presence of odd behaviours in assigning contact results
• Visual Basic, based on an Access database, which produces Excel files

Ad hoc instruments to monitor particular aspects of the survey
• for example, control charts to monitor the assisted coding of textual variables (if used), such as Occupation
• SAS QC procedure, which produces ‘control charts’ for particular variables
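As a rough illustration of the contingency-table indicator (the slides name Visual Basic, Access and Excel as the actual tooling), a two-way table of contact results by interviewer can be sketched in Python as follows. The record layout, the outcome labels and both helper functions are assumptions made for the example, and the data are invented.

```python
from collections import Counter


def contact_table(attempts):
    """Cross-tabulate contact attempts by (interviewer, outcome)."""
    return Counter((a["interviewer"], a["outcome"]) for a in attempts)


def outcome_share(table, interviewer, outcome):
    """Share of one interviewer's attempts that ended with the given outcome."""
    total = sum(n for (who, _), n in table.items() if who == interviewer)
    return table[(interviewer, outcome)] / total if total else 0.0


# Made-up contact attempts, for illustration only.
attempts = [
    {"interviewer": "A", "outcome": "interview"},
    {"interviewer": "A", "outcome": "interview"},
    {"interviewer": "A", "outcome": "refusal"},
    {"interviewer": "A", "outcome": "no answer"},
    {"interviewer": "B", "outcome": "refusal"},
    {"interviewer": "B", "outcome": "refusal"},
    {"interviewer": "B", "outcome": "interview"},
    {"interviewer": "B", "outcome": "refusal"},
]
table = contact_table(attempts)
```

Comparing `outcome_share` across interviewers is one simple way to spot an interviewer who assigns a given contact result unusually often; further dimensions (day, time slot) could be added to the key to obtain an n-way table.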

Standards for the secure transmission of data
The aim is to ensure both the secure transfer of survey data from the private company to Istat and vice versa, and the timeliness of delivery.
The daily transmission is based on a ‘secure’ protocol (HTTPS) and puts data on an Istat server, INDATA, placed outside the firewall and devoted to data collection.

Surveys which used the in-house strategy

Survey                                                        Nr of interviews  Interview length  Response rate  Refusal rate
Sample births survey 2001 (Long)                              16,597            12'00''           92.6%          5.4%
Sample births survey 2001 (Short)                             33,838            5'00''            93.2%          4.9%
Sample births survey 2004 (Long)                              15,642            13'48''           94.7%          3.9%
Sample births survey 2004 (Short)                             33,515            5'43''            96.8%          2.2%
University-to-work transition survey and perspectives 2004    25,510            10'56''           95.8%          3.6%
Upper secondary school graduates survey 2004                  20,408            13'20''           94.7%          4.8%
Water System Surveys (preliminary survey) 2006                1,320             9'03''            99.8%          0.1%
Violence against women survey (in progress)                   25,000            26'54''           72.4%          16.0%

Surveys which used the in-house strategy
Characteristics of the questionnaires

Survey                                                        Nr of variables of the electronic questionnaire  Nr of checking rules
Sample births survey 2001 (Long)                              677                                              195
Sample births survey 2004 (Long)                              707                                              205
University-to-work transition survey and perspectives 2004    218                                              324
Upper secondary school graduates survey 2004                  315                                              122
Water System Surveys (preliminary survey)                     30,000                                           52

Checking rules in the data capturing phase with the in-house strategy
The number of checking rules included in the data capturing phase, together with the number of variables, is surely a significant indicator of the complexity of the survey questionnaire.
This complexity has not negatively affected the response and refusal rates because:
• the trade-off between the quality of data and the fluency of the interview has been taken into consideration
• different treatments of the rules to detect errors have been implemented

The trade-off between the quality of data and the fluency of the interview
The consistency plans included in the electronic questionnaires comprised a large part, though not all, of the rules of the edit and imputation plans. This avoided a too frequent display, during the interview, of a dialog window on the PC screen asking for confirmation of the given answer.
(Including the complete edit plan in the data capturing phase would have guaranteed a high quality of answers, but would certainly have burdened the respondent and the interviewer, thus increasing the interruption rate.)

Different treatments of the rules to detect errors
• ‘hard mode’: it is not possible to go on with the interview without resolving the error
• ‘soft mode’: the respondent can confirm his or her ‘inconsistent response’, without compromising the completion of the interview
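The difference between the two modes can be sketched as follows. This is a minimal illustration in Python, not the Blaise code used at Istat; the `Rule` class, the `blocking_rules` function and both example rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record is consistent
    hard: bool  # True: the error must be fixed; False: it may be confirmed


def blocking_rules(record, rules, confirmed=frozenset()):
    """Return the names of violated rules that still block the interview."""
    blocking = []
    for rule in rules:
        if rule.check(record):
            continue  # rule satisfied, nothing to do
        # A hard rule always blocks; a soft rule blocks until it is confirmed.
        if rule.hard or rule.name not in confirmed:
            blocking.append(rule.name)
    return blocking


# Hypothetical rules, for illustration only.
RULES = [
    Rule("age_in_range", lambda r: 0 <= r["age"] <= 120, hard=True),
    Rule("graduate_age", lambda r: not r["graduate"] or r["age"] >= 17, hard=False),
]
```

With these rules, a 16-year-old graduate triggers only the soft rule, so the interview can continue once the answer is confirmed, whereas an age of 130 keeps blocking the interview whatever is confirmed.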

Performance of the in-house strategy in terms of quality
Case study: two surveys
• Upper secondary school graduates survey
• University-to-work transition survey and perspectives
Carried out in:
• 2001: old strategy
• 2004: in-house strategy

2004 and 2001 response and refusal rates

                 Upper secondary school     University-to-work transition
                 graduates survey           survey and perspectives
                 2004       2001            2004       2001
Response rate    94.7%      85.4%           95.8%      94.0%
Refusal rate     4.8%       10.8%           3.6%       3.9%
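To make the comparison explicit, the change between the two editions can be expressed in percentage points. The short sketch below only re-uses the rates reported on the slide; the variable and function names are illustrative.

```python
# Rates (%) as reported on the slide, stored as (2004, 2001).
RATES = {
    "Upper secondary school graduates survey": {
        "response": (94.7, 85.4),
        "refusal": (4.8, 10.8),
    },
    "University-to-work transition survey and perspectives": {
        "response": (95.8, 94.0),
        "refusal": (3.6, 3.9),
    },
}


def point_change(rate_2004, rate_2001):
    """Change from 2001 to 2004 in percentage points, rounded to one decimal."""
    return round(rate_2004 - rate_2001, 1)


changes = {
    survey: {kind: point_change(*pair) for kind, pair in rates.items()}
    for survey, rates in RATES.items()
}
```

For the Upper secondary school graduates survey this gives +9.3 points for the response rate and -6.0 points for the refusal rate; for the University-to-work transition survey the changes are +1.8 and -0.3 points.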

Prevention of non-sampling errors
Upper secondary school graduates survey

                      2004 survey                          2001 survey
                      (conducted with the                  (conducted with the
                      in-house strategy)                   external company strategy)
Errors per record     Abs       %       Cumulate %         Abs       %       Cumulate %
No errors             13,013    63.8                       12,245    52.6
From 1 to 2 errors    5,742     28.1    91.9               9,029     38.8    91.4
From 3 to 4 errors    1,183     5.8     97.7               1,582     6.8     98.2
5 and more errors     470       2.3     100                406       1.8     100
Total                 20,408                               23,262
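The percentage and cumulative columns follow directly from the absolute frequencies. As a small worked check, the sketch below recomputes them for the 2004 survey; the function name is illustrative.

```python
from itertools import accumulate

# Absolute frequencies for the 2004 survey, in the order:
# no errors, 1 to 2 errors, 3 to 4 errors, 5 and more errors.
COUNTS_2004 = [13013, 5742, 1183, 470]


def distribution(counts):
    """Return (percentages, cumulative percentages), rounded to one decimal."""
    total = sum(counts)
    pct = [round(100 * c / total, 1) for c in counts]
    cum = [round(100 * c / total, 1) for c in accumulate(counts)]
    return pct, cum


pct_2004, cum_2004 = distribution(COUNTS_2004)
```

The counts sum to the reported total of 20,408 and reproduce the 2004 percentages. Applying the same function to the 2001 counts can differ from the slide by 0.1 in the last digit, presumably because the slide cumulates already-rounded percentages.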

Prevention of non-sampling errors
Upper secondary school graduates survey: incidence of errors on the variables
Most positive result: Occupation
• ‘in-house strategy’: coded during the interview with an assisted coding function
• ‘external company strategy’: manually coded after the interview
Share of raw data that had to be corrected during the edit and imputation phase:
• 2001: 4.92%
• 2004: 0.81% (with the new strategy)

• Strategies for CATI and CAPI surveys
• Strategies for CASI

CASI
• prototype experiences carried out in the late 1990s
• the current situation comprises several web sites, located at Istat and dedicated to capturing data for approximately 30 surveys
The need emerged to design a new environment and new rules, aimed at introducing more standard solutions and effective security measures.

Strategy for CASI surveys
To set up a cross-survey data capturing web site, to be used as a single front-end for respondents to any survey: INDATA (https://indata.istat.it)
This new policy, already launched, is still in progress.

INDATA web site: aims
• to present the Institute to the outside world with a homogeneous and stable public image and identity
• to guarantee the mutual identity of data sender and receiver
• to guarantee data confidentiality in the data collection phase and comprehensive security of the production environment
• to minimise the impact on the respondent’s technical environment (no software needs to be installed on the client workstation)

INDATA web site: aims (continued)
• to notify users of the actions they have carried out (confirmation e-mail)
• to facilitate monitoring of collection activities
• to ease internal management and contain the cost of the operational environment dedicated to data capturing


Main functions offered to users
• to be informed about the survey
• to get and print forms and instructions
• to fill in electronic forms online
• to download electronic forms
• to upload forms completed offline
• to transfer any dataset in a safe way

In summary
Both primary data collection (single questionnaire, CSAQ = Computer Self-Administered Questionnaire) and secondary data collection (collection of data) are dealt with.
Primary data collection is handled in both online and offline mode.

The INDATA web platform
The platform was initiated in the late ’90s with prototype applications.
Present technological features:
• Operating system: Linux Red Hat 2.6.9
• Web server: Apache 2.0.52
• DBMS: MySQL and Oracle 10
• Application language: PHP 5.1.2
• Authenticity certificate by Postecert
• Secure HTTP

INDATA architecture: requirements and constraints
• Three-level architecture (WEB, APPLICATION, DB)
• Secure system, safe back-end intranet
• Balanced load
• High level of reliability

System Architecture
[Diagram] Front End: Firewall, Load Balancer, Web server. Back End: Firewall, Web application server, DB server.

Web Surveys and Directorates

Central Directorate for Structural Surveys on Businesses    13
Central Directorate for Short Term Surveys on Businesses     6
Central Directorate for Surveys on Institutions              2
TOTAL                                                       21

Electronic Questionnaire Type

Generation mode                                                       N. of treated surveys
PHP language - PDF questionnaire via TELEFORM - online compilation    10
PHP language - EXCEL questionnaire - offline compilation              8
PHP language - BLAISE questionnaire - offline compilation             1

CSAQ and Editing Rules
PDF questionnaire:
• editing rules are implemented in JavaScript and comprise both range and consistency rules
• the outcome of the editing activity is presented to the respondent globally, as a sequence of error messages, once the form has been filled in and the submit button is pressed
EXCEL questionnaire:
• no editing macro is implemented, so as not to discourage the respondent with alarm messages
• all cells are locked apart from the input ones
• data validation in single cells and default formulas in calculated variables are available
• no or minimal consistency checking is performed
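The "check everything at submit time" behaviour described for the PDF questionnaire can be sketched as below. The slides state that the real rules are written in JavaScript inside the PDF form; this Python version only illustrates the logic, and the field names and thresholds are hypothetical.

```python
def validate_form(form):
    """Run all range and consistency rules and return every error message."""
    errors = []

    # Range rules: each value must lie within plausible bounds.
    if not 0 <= form["employees"] <= 1_000_000:
        errors.append("employees: value out of range")
    if form["turnover"] < 0:
        errors.append("turnover: must not be negative")

    # Consistency rule: a part cannot exceed the declared total.
    if form["part_time_employees"] > form["employees"]:
        errors.append("part_time_employees: exceeds total employees")

    return errors  # an empty list means the form can be submitted
```

Collecting all messages before returning, instead of stopping at the first violation, is what lets the respondent see the whole sequence of errors at once after pressing submit.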

E-response rates for Structural Business Statistics

Survey                                                      Year  Observed users  Form pages  E-response rate
10. Yearly Survey on Business Accounts                      2003  10,000          10          36%
                                                            2004  10,000          10          60%
                                                            2005  10,000          10          ...
11. Yearly Survey on Provisional Estimate of Value Added    2004  10,000          1           32%
                                                            2005  10,000          1           75%
12. Yearly Industrial Production Survey                     2004  45,000          2           23%
                                                            2005  68,000          2           ...
13. Yearly Survey on the structure of Labour Cost           2004  15,000          15          30%
14. Yearly Survey on Telecommunications                     2004  250             3           100%
                                                            2005  250             3           ...

Surveys and data capture mode

 1  Survey on book production – Works published in 2005
    PHP language - EXCEL questionnaire - offline compilation
 2  Quarterly survey on turnover and orders
    PHP language - PDF questionnaire via TELEFORM - online compilation
 3  Quarterly Business Survey on job vacancies
    PHP language - PDF questionnaire via TELEFORM - online compilation
 4  Periodic Survey on Hotel Activity
    PHP language - PDF questionnaire via TELEFORM - online compilation
 5  Monthly Survey on employment, working hours and wages
    PHP language - PDF questionnaire via TELEFORM - online compilation
 6  Monthly Survey on retail sales
    PHP language - PDF questionnaire via TELEFORM - online compilation
 7  Yearly Survey on transports by rail
    PHP language - PDF questionnaire via TELEFORM - online compilation
 8  Yearly Survey on Information Technology in financial businesses
    PHP language - PDF questionnaire via TELEFORM - online compilation
 9  Yearly Survey on Information Technology in non-financial businesses
    PHP language - PDF questionnaire via TELEFORM - online compilation

Surveys and data capture mode (continued)

10  Yearly Survey on business accounts
    PHP language - EXCEL questionnaire - offline compilation
11  Yearly Survey on Provisional Estimation of the Value Added
    PHP language - EXCEL questionnaire - offline compilation
12  Yearly Industrial Production Survey (PRODCOM)
    PHP language - EXCEL questionnaire - offline compilation
13  Yearly Survey on the Structure of Labour Cost
    PHP language - EXCEL questionnaire - offline compilation
14  Yearly Survey on Telecommunication Enterprises
    PHP language - EXCEL questionnaire - offline compilation
15  Yearly Survey on structure and production of farms
    PHP language - BLAISE executable questionnaire - offline compilation
16  Quick Survey on certificates of balance accounts of Municipalities
    Documentation and instructions for sending a file
17  Quick Survey on certificates of balance accounts of Provincial Administrations
    Documentation and instructions for sending a file
18  Three-year survey on graduates (survey addressed to Universities)
    PHP language - EXCEL questionnaire - offline compilation

Surveys and data capture mode (continued)

19  Six-month estimative survey on the consistency of livestock
    PHP language - PDF questionnaire via TELEFORM - online compilation
20  Yearly Survey on fishery in lakes and artificial docks
    PHP language - PDF questionnaire via TELEFORM - online compilation
21  Yearly Survey on economic results of farms
    PHP language - EXCEL questionnaire - offline compilation
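As a quick cross-check of the listing against the summary counts by questionnaire type, the capture modes of the 21 surveys can be tallied. The short labels below, including "file transfer" for the two surveys that only offer documentation and instructions for sending a file, are shorthand introduced for this sketch.

```python
from collections import Counter

# Capture mode of each survey, keyed by its number in the listing.
PDF, EXCEL, BLAISE, FILE = "PDF via TELEFORM", "EXCEL", "BLAISE", "file transfer"
MODES = {
    1: EXCEL, 2: PDF, 3: PDF, 4: PDF, 5: PDF, 6: PDF, 7: PDF, 8: PDF, 9: PDF,
    10: EXCEL, 11: EXCEL, 12: EXCEL, 13: EXCEL, 14: EXCEL, 15: BLAISE,
    16: FILE, 17: FILE, 18: EXCEL, 19: PDF, 20: PDF, 21: EXCEL,
}

tally = Counter(MODES.values())
```

The tally gives 10 PDF, 8 EXCEL and 1 BLAISE questionnaires, matching the counts by generation mode, plus 2 file transfers, for the total of 21 web surveys.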

Thanks