Validation – The German case
Data Validation - The German Case
Data Validation Infrastructure and Tools in a Federal Environment
Michael Schäfer, C 302 Data Collection and Processing, Federal Statistical Office, Wiesbaden

Overview
■ The federal perspective
■ Infrastructure
■ Tools and workflow
  ■ Specification: Data Edit Designer
  ■ Processing: Data Edit Runtime

The federal perspective
■ German statistical offices
  ■ 1 federal office, ~ 140 surveys
  ■ 14 offices of the member states (federal Länder), ~ 250 surveys
  ■ Huge overlap of commonly produced statistics
■ A history of co-operation and standardisation
  ■ Continuously evolving standardisation process, from mostly programmatic aspects to a comprehensive approach including processes, infrastructure, and shared and distributed applications
  ■ Embedded into a wider modernisation process to manage the transition to fully electronic workflows -> eSTATISTICS

The federal perspective
■ The "One for All" principle
  ■ > 40 years in SW development
  ■ > 10 years in statistical production
■ Main requirements to make it work
  ■ Coordinated design, implementation and production processes
  ■ Sharing / replication of statistical and programmatic objects
  ■ Standardised interfaces and/or shared/distributed applications
  ■ Hosting of applications and central services, data exchange
  ■ Many, many standards and conventions

The federal perspective
Elements of eSTATISTICS
■ .BASE Tools (Design)
  ■ Data Models
  ■ Data Edits
  ■ Forms
  ■ Mappings
  ■ …
■ Applications
  ■ IDEV
  ■ CORE
  ■ Data Edit Runtime
  ■ …
■ Metadata infrastructure
  ■ .BASE Replication DB
  ■ Survey DB
  ■ Collection DB
■ XML document types
  ■ DatML/RAW, DatML/RES, DatML/ASK, DatML/EDT, DatML/SDF, …

[Diagram: metadata flow and raw data flow. Components shown: statistical offices with the .BASE Tool Suite, .BASE Server, .BASE Replication DB, Survey DB (private), Survey DB (public), Data Edit Runtime, CORE / IDEV, domain applications, and the business applications of businesses. Areas shown: Design, Infrastructure, Production, Private, Public. Legend entries: Metadata flow, Raw data flow, V (Validation).]

Tools and workflow
■ Data Edit Designer ("PL-Editor")
  ■ For developing and testing data validation and editing procedures
■ Forms Designer ("Formular-Editor")
  ■ For developing and testing interactive web forms
■ Survey Designer ("SDF-Editor")
  ■ For developing and testing data collection data models
■ Data Edit Runtime ("PL-Ablaufumgebung")
  ■ For executing data validation and editing procedures

Tools and workflow
[Diagram: metadata flow and raw data flow between the design tools and the applications. Components shown: Data Edit Designer, Forms Designer, Survey Designer, .BASE Server, .BASE Replication DB, Survey DB (private), IDEV, CORE, Data Edit Runtime. Document types on the flows: DatML/EDT, DatML/ASK, DatML/SDF. Legend entries: Metadata flow, Raw data flow, V (Validation).]

Specification: Data Edit Designer
■ Main features
  ■ Create, edit and share data validation and editing specifications
  ■ Output as Java code or in XML format (DatML/EDT)
  ■ Simple functions for analysing and documenting specifications
  ■ Sharing specifications through the .BASE infrastructure
  ■ Providing metadata for further design processes and production
■ History
  ■ 2004: Start of productive use at the FSO
  ■ 2005: Start of productive use at other statistical offices
  ■ 2006: Integration with the .BASE Replication DB

Specification: Data Edit Designer
■ Variables
  ■ Represent single information items of a survey
  ■ Provide the basis for structural validation
  ■ Can have error messages and correction hints attached
■ Topics ("Themes")
  ■ For grouping sets of associated variables
  ■ Can be nested for building hierarchies
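
The relationship between variables and topics can be illustrated with a minimal Python sketch. It is not the Data Edit Designer's actual data model: the class names `Variable` and `Topic`, all field names, and the example survey are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One information item of a survey (hypothetical structure)."""
    name: str
    expected_type: type
    required: bool = True
    error_message: str = ""
    correction_hint: str = ""

    def structural_error(self, record: dict) -> Optional[str]:
        """Return the attached error message if the value is missing or mistyped."""
        value = record.get(self.name)
        if value is None:
            return self.error_message if self.required else None
        if not isinstance(value, self.expected_type):
            return self.error_message
        return None


@dataclass
class Topic:
    """A named group of variables; topics can be nested to form a hierarchy."""
    name: str
    variables: list = field(default_factory=list)
    subtopics: list = field(default_factory=list)

    def all_variables(self):
        """Yield the variables of this topic and of all nested topics."""
        yield from self.variables
        for sub in self.subtopics:
            yield from sub.all_variables()


def structural_errors(topic: Topic, record: dict) -> dict:
    """Map variable name to error message for every structural failure."""
    errors = {}
    for var in topic.all_variables():
        message = var.structural_error(record)
        if message is not None:
            errors[var.name] = message
    return errors


# Hypothetical example survey with two nested topics.
employees = Variable("employees", int,
                     error_message="Number of employees missing or not an integer",
                     correction_hint="Enter a whole number")
turnover = Variable("turnover", float,
                    error_message="Turnover missing or not numeric")
survey = Topic("enterprise",
               subtopics=[Topic("staff", [employees]),
                          Topic("finance", [turnover])])

# "employees" arrives as a string, so only that variable fails.
result = structural_errors(survey, {"employees": "12", "turnover": 1500.0})
```

Walking the nested topics recursively keeps the structural check independent of how deep the hierarchy is.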

Specification: Data Edit Designer
■ Validation rules
  ■ For logical validation of one or more variables
  ■ Can make use of reference data
  ■ Return FALSE if NO ERROR has been detected, otherwise TRUE
  ■ Can have automated edits attached
  ■ Can have error messages and correction hints attached
■ Procedures ("Controls")
  ■ Control the flow / sequence of rules to be invoked
  ■ Useful for scenario-based validation (interviewer, web form, domain application…)
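
The return convention for rules (TRUE signals an error) and the role of procedures are easy to get backwards, so a small Python sketch may help. It is only an illustration under assumptions: `Rule`, `run_procedure`, the scenario names and the example rules are hypothetical and do not reflect the Java or DatML/EDT output of the Data Edit Designer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """A validation rule (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]                  # True means an ERROR was detected
    message: str = ""
    edit: Optional[Callable[[dict], None]] = None  # automated edit, run on error


def run_procedure(rules, record):
    """Invoke the rules in order, apply attached edits, return failed rule names."""
    failed = []
    for rule in rules:
        if rule.check(record):          # TRUE = error, FALSE = no error
            failed.append(rule.name)
            if rule.edit is not None:
                rule.edit(record)
    return failed


def fix_total(record):
    """Automated edit: recompute the total from its components."""
    record["total"] = record["male"] + record["female"]


total_rule = Rule("total_mismatch",
                  lambda r: r["total"] != r["male"] + r["female"],
                  "Total does not equal the sum of its parts",
                  fix_total)
negative_rule = Rule("negative_staff",
                     lambda r: r["male"] < 0 or r["female"] < 0,
                     "Staff counts must not be negative")

# A procedure selects and orders rules for one scenario (names are assumptions).
PROCEDURES = {
    "web_form": [negative_rule],
    "domain_application": [negative_rule, total_rule],
}

record = {"male": 3, "female": 4, "total": 8}
failed = run_procedure(PROCEDURES["domain_application"], record)
```

After the call, `failed` holds the names of the rules that reported an error, and the attached edit has corrected `record["total"]` in place. The lighter `web_form` procedure would not have checked the total at all.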

Specification: Data Edit Designer
■ Workflow
  ■ Define information items > Variables
  ■ Group information items > Topics
  ■ Specify validation rules and edits > Rules
  ■ Specify validation procedures > Procedures
  ■ Test and deploy (DatML/EDT) > Survey DB

Specification: Data Edit Designer
Data Validation and Editing Specification Language
■ Procedures (control)
  ■ Full instruction set
  ■ Can iterate over reference data
  ■ Scoping/Scenarios
■ Validation rules
  ■ Small instruction set
  ■ Return TRUE or FALSE
■ Functions (reuse)
  ■ Reduced instruction set
  ■ Can iterate over reference data
  ■ Have one return value
■ Automated edits (triggered by errors)
  ■ Small instruction set
■ Further diagram labels: Assert and set, Properties
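
How the language elements might fit together can be sketched in Python: a reusable function iterates over reference data and has one return value, a validation rule uses it and returns TRUE on error, and an automated edit is triggered by that error. All names, the reference table and the placeholder code "99" are assumptions for illustration, not part of the specification language itself.

```python
# Illustrative reference data (hypothetical table of region codes).
REGIONS = [
    {"code": "06", "name": "Hessen"},
    {"code": "09", "name": "Bayern"},
]


def lookup_region_name(reference_rows, region_code):
    """Function: iterates over reference data and has exactly one return value."""
    found = None
    for row in reference_rows:
        if row["code"] == region_code:
            found = row["name"]
            break
    return found


def unknown_region(record, reference_rows):
    """Validation rule: returns True if an error is detected, otherwise False."""
    return lookup_region_name(reference_rows, record["region"]) is None


def mark_region_unknown(record):
    """Automated edit: replace an invalid code with a hypothetical placeholder."""
    record["region"] = "99"


def check_and_edit(record, reference_rows):
    """Procedure-like control flow: run the rule, let an error trigger the edit."""
    error = unknown_region(record, reference_rows)
    if error:
        mark_region_unknown(record)
    return error


good = {"region": "06"}
bad = {"region": "6X"}
good_error = check_and_edit(good, REGIONS)
bad_error = check_and_edit(bad, REGIONS)
```

Keeping the lookup in a separate function mirrors the reuse role of functions: several rules can share it without repeating the iteration over the reference data.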

Processing: Data Edit Runtime
■ Main features
  ■ Fully generic; manages any survey for which a DatML/EDT document is provided via the Survey DB
  ■ Supports data capture based on DatML/ASK documents
  ■ Data under test is uploaded into a database using survey-specific auto-generated schemas
  ■ Analytical and statistical functions for error statistics
  ■ Tabular view for comparing data from different reference periods
  ■ One data set under test per reference period; other reference periods can be accessed as reference data (read-only)
  ■ Data in context is either the current record or the current hierarchical record set
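
Two of these features, error statistics and read-only access to other reference periods, can be illustrated with a short Python sketch. It is a simplification under assumptions: the real runtime works on a database, whereas this sketch uses in-memory structures, and the function name, rule names, sample data and the doubling threshold are all hypothetical.

```python
from collections import Counter
from types import MappingProxyType


def error_statistics(records, rules):
    """Count, per rule name, how many records trigger the rule (True = error)."""
    counts = Counter()
    for record in records:
        for name, check in rules.items():
            if check(record):
                counts[name] += 1
    return counts


# Previous reference period, keyed by unit id and exposed read-only.
previous_period = MappingProxyType({"A1": 100, "A2": 50})

# Data set under test for the current reference period.
current_period = [
    {"unit": "A1", "turnover": 105},
    {"unit": "A2", "turnover": 500},
    {"unit": "A3", "turnover": -1},
]


def large_jump(record):
    """Error if turnover more than doubled against the previous period."""
    prior = previous_period.get(record["unit"])
    return prior is not None and record["turnover"] > 2 * prior


rules = {
    "negative_turnover": lambda r: r["turnover"] < 0,
    "large_jump": large_jump,
}

stats = error_statistics(current_period, rules)
```

Wrapping the earlier period in `MappingProxyType` means rules can read it but any attempt to assign into it raises a `TypeError`, which mirrors the read-only status of reference data.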

Processing: Data Edit Runtime
■ Organisation
[Diagram: organisation of the Data Edit Runtime. Elements shown: Statistics, Survey group, Survey, Reference period, Data set, Reference data sets, Versioned resources, Survey DB, and DatML/EDT with Variables, Topics, Rules and Procedures. One relation is labelled "assign …".]

Thank you for your attention!