Statistical Data cleaning at Statistics Denmark
Anette M. Hertz (ahz@dst.dk)

Objectives
The objective of this presentation is to give insight into the data cleaning routines at Statistics Denmark, and in External Economy in particular.
§ Data cleaning of the largest Danish companies to ensure consistency in data reported across statistical domains
§ Standardised database and data validation interface

Data cleaning at Statistics Denmark
At DST level we speak of two different paths when it comes to data cleaning.
§ Large cases unit type data validation
- Currently there are around 30 Danish multinationals selected as companies that are critical to Statistics Denmark
- The data reported by these companies are investigated thoroughly by the large cases unit to ensure consistency in the data reported across statistical domains
§ Standardised data validation
- A standardised IT interface and database is used
- The data cleaning principles are the same across areas, to ensure mobility of staff between different statistical areas

Data cleaning - International trade in goods
Statistics Denmark is responsible for cleaning the customs data with regard to the statistical quality needs.
§ Checks built into the customs declaration
- Combinational checks (e.g. is the combination of CPC and country code valid?)
- Rough unit value validation
- Idea: the closer to the time of reporting an error is detected, the easier it is to correct
§ Probable error detection on unit values, and macro check at company level
- External Economy is responsible for contacting the Danish companies about the possible mistakes
- External Economy cannot give information gained from the companies back to the Danish customs
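The two declaration-time checks listed above can be sketched roughly as follows. This is a minimal illustration, not the checks actually built into the Danish customs declaration: the lookup table `VALID_COUNTRIES_BY_CPC`, the function `check_declaration`, the example codes and the unit value range are all hypothetical.

```python
# Hypothetical reference table: which country codes are accepted for each
# customs procedure code (CPC). Real rules would come from the customs system.
VALID_COUNTRIES_BY_CPC = {
    "CPC_A": {"CN", "US", "NO"},
    "CPC_B": {"CN", "US", "NO", "GB"},
}


def check_declaration(cpc, country, value, quantity, unit_value_range):
    """Return a list of error messages for one declaration line.

    unit_value_range is an assumed (low, high) band for value / quantity.
    """
    errors = []

    # Combinational check: is this CPC / country code pair allowed?
    allowed = VALID_COUNTRIES_BY_CPC.get(cpc)
    if allowed is None:
        errors.append(f"unknown CPC {cpc}")
    elif country not in allowed:
        errors.append(f"country {country} not valid with CPC {cpc}")

    # Rough unit value validation: value per unit must fall inside the band.
    low, high = unit_value_range
    if quantity <= 0:
        errors.append("quantity must be positive")
    else:
        unit_value = value / quantity
        if not (low <= unit_value <= high):
            errors.append(f"unit value {unit_value:.2f} outside [{low}, {high}]")

    return errors
```

Running such checks while the declaration is being filled in follows the idea on the slide: the reporter still has the source documents at hand, so an error is cheap to correct.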

Data cleaning - International trade in goods
Probable error detection
§ Validation on unit values (value/weight, value/supplementary unit)
§ Potential errors are identified weekly
§ Emails are automatically sent to the relevant companies with a one-week deadline
§ Companies with potential errors with a value > 10.000 DKK (4.370.000 Lari) are contacted by phone before data production starts
§ Each probable error has a score, so that we only spend time on the potential errors that matter most for the disseminated figures
§ Some of the identified probable errors are corrected automatically
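The slides do not give the formula behind the unit value validation or the score, so the following is only a sketch under stated assumptions: a record is flagged when its unit value (value/weight) differs from the historical median for the commodity by more than a factor `tolerance`, and the score is the estimated value error as a share of the commodity total. The function name, the record layout and the default tolerance are hypothetical.

```python
from statistics import median


def probable_errors(records, history, tolerance=3.0):
    """Flag records with implausible unit values and rank them by impact.

    records: list of dicts with keys company, commodity, value, weight
             (weights are assumed to be positive).
    history: dict mapping commodity -> list of past unit values.
    """
    # Total reported value per commodity, used to express impact as a share.
    totals = {}
    for r in records:
        totals[r["commodity"]] = totals.get(r["commodity"], 0.0) + r["value"]

    flagged = []
    for r in records:
        reference = median(history[r["commodity"]])
        unit_value = r["value"] / r["weight"]
        ratio = unit_value / reference
        if ratio > tolerance or ratio < 1.0 / tolerance:
            # Assumed score: how far the reported value is from the value
            # implied by the reference unit value, relative to the total.
            expected_value = reference * r["weight"]
            score = abs(r["value"] - expected_value) / totals[r["commodity"]]
            flagged.append({**r, "unit_value": unit_value, "score": score})

    # Highest-impact probable errors first.
    return sorted(flagged, key=lambda f: f["score"], reverse=True)
```

Sorting by score is what lets staff work from the top of the list and stop once the remaining probable errors no longer move the published figures.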

Probable error detection

Data cleaning - International trade in goods

Data cleaning - International trade in services
Program to identify large developments in the reported data
§ Based on the formula used for the probable error detection (although the ITS data contain only value, no weight or quantity)
§ Manual inspection; if deemed necessary, the company is contacted
§ Each potential error has a score, so that we only spend time on the potential errors that matter most for the disseminated figures
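Because the ITS data carry only values, a check for large developments has to compare a reported value with an earlier one rather than with a unit value. The sketch below assumes a simple period-over-period ratio test and a score equal to the absolute change as a share of the current total; the actual formula is not stated in the slides, and `large_developments` and `min_ratio` are hypothetical names.

```python
def large_developments(current, previous, min_ratio=2.0):
    """Return [(key, score), ...] for large changes, highest score first.

    current, previous: dicts mapping (company, service) -> reported value.
    Items without a positive value in both periods are skipped here.
    """
    total = sum(abs(v) for v in current.values())
    flagged = []
    for key, value in current.items():
        prev = previous.get(key)
        if prev is None or prev <= 0 or value <= 0:
            continue
        # Symmetric ratio: a halving counts the same as a doubling.
        ratio = max(value / prev, prev / value)
        if ratio >= min_ratio:
            score = abs(value - prev) / total
            flagged.append((key, score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

The ranked list would then feed the manual inspection step, with the company contacted only for the cases an analyst considers necessary.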

Data cleaning - Balance of Payments
We have some data validation across the two statistics to ensure a better quality of the balance of payments.
§ Processing: Goods to and from processing (based on the customs procedure codes) are compared to the ITS statistics to see whether the companies have reported the corresponding import/export of processing activities
§ Goods sent from Denmark as part of a construction project
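The processing comparison can be pictured as a lookup between the two sources. The sketch below is an assumption about how such a check might be organised, not the production routine: `PROCESSING_CPCS` holds placeholder codes rather than real customs procedure codes, and the input layouts are hypothetical.

```python
# Placeholder codes standing in for the customs procedure codes that mark
# goods sent to or returned from processing.
PROCESSING_CPCS = {"PROC_IN", "PROC_OUT"}


def missing_processing_reports(goods_lines, its_processing_reporters):
    """Find companies with processing goods flows but no ITS processing report.

    goods_lines: iterable of (company, cpc, value) from the goods statistics.
    its_processing_reporters: set of companies that reported import/export
        of processing activities in the ITS statistics.
    Returns a dict company -> total goods value under processing procedures.
    """
    processing_value = {}
    for company, cpc, value in goods_lines:
        if cpc in PROCESSING_CPCS:
            processing_value[company] = processing_value.get(company, 0.0) + value

    return {
        company: total
        for company, total in processing_value.items()
        if company not in its_processing_reporters
    }
```

Companies returned by such a check would be candidates for follow-up, since goods moving under a processing procedure imply a processing service that should appear in the ITS data.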