CORE Bringing the GSBPM to life! J. Linnerud & J.-P. Kent 1
Main points 1. An ideal development process for a statistical system 2. Why this ideal usually is not met 3. How CORE aims at supporting this ideal development process 2
Statistics: How? • Specify a statistic • Design a process that will produce this statistic • Build a system that will execute this process 3
What is the product? • Define a statistic – What does it say? • Measures, dimensions, explanations… – What does it look like? • Tables, Press release, Analytic paper… – What is the input? • Population, variables, data sources… – What is the relation between input and output? • Methods to apply 4
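The four questions on this slide can be captured as a plain data structure. The following is a minimal sketch, assuming a hypothetical `StatisticSpec` class. The field names and the example values are illustrative and do not come from CORA or CORE.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StatisticSpec:
    """Hypothetical product specification: what the statistic is, not how it is produced."""
    name: str
    measures: Tuple[str, ...]        # what does it say?
    dimensions: Tuple[str, ...]
    presentations: Tuple[str, ...]   # what does it look like?
    population: str                  # what is the input?
    variables: Tuple[str, ...]
    data_sources: Tuple[str, ...]
    methods: Tuple[str, ...]         # relation between input and output


# Illustrative values only; they are not taken from a real specification.
spec = StatisticSpec(
    name="Example statistic",
    measures=("total quantity",),
    dimensions=("month", "region"),
    presentations=("table", "press release"),
    population="all reporting units",
    variables=("quantity", "date", "place"),
    data_sources=("survey",),
    methods=("editing", "aggregation"),
)
```

The specification contains no reference to tools or systems, so it can be reviewed by statisticians alone.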
Statistics: How? • Specify a statistic • Design a process that will produce this statistic • Build a system that will execute this process 5
How to produce the statistic • Model the data – Input, output, intermediary results • Specify process steps to apply the chosen statistical methods • Integrate these steps in a process flow 6
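The three bullets above (data model, process steps, process flow) could be sketched as follows. All names here (`Observation`, `drop_negative`, `scale`, `run_flow`) are hypothetical, and the two steps are stand-ins for real statistical methods.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Data model shared by input, intermediary results and output."""
    unit: str
    value: float


# A process step takes observations and returns observations.
Step = Callable[[List[Observation]], List[Observation]]


def drop_negative(obs: List[Observation]) -> List[Observation]:
    """Stand-in for an editing rule: discard negative values."""
    return [o for o in obs if o.value >= 0]


def scale(factor: float) -> Step:
    """Stand-in for a weighting method: multiply every value by a factor."""
    def step(obs: List[Observation]) -> List[Observation]:
        return [Observation(o.unit, o.value * factor) for o in obs]
    return step


def run_flow(steps: List[Step], data: List[Observation]) -> List[Observation]:
    """Process flow: apply the steps in order."""
    for step in steps:
        data = step(data)
    return data


flow = [drop_negative, scale(2)]
result = run_flow(flow, [Observation("a", 1.0), Observation("b", -3.0)])
```

Because every step has the same signature, the flow is just a list, and steps can be reordered or replaced without touching the others.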
Statistics: How? • Specify a statistic • Design a process that will produce this statistic • Build a system that will execute this process 7
Let the machine do it • Implement the data models • Implement the process steps • Implement the process flow 8
Why is this approach good? (1) • Variability vs. stability – Statistical products are specific • There is a great variety of products • A given product will vary over time – Statistical processes are generic • The same method can be applied to many products • Process steps implementing methods can be reused • A significant change in the product can be implemented with a few simple changes to some process steps 9
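To illustrate the reuse argument, here is a minimal sketch of one generic process step applied to two different products. The function `aggregate` and both data sets are hypothetical examples, not part of CORE.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate(records: Iterable[Mapping], by: str, measure: str) -> Dict[str, float]:
    """Generic step: sum `measure` per value of `by`. It knows nothing about the product."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[by]] += record[measure]
    return dict(totals)


# Two unrelated (made-up) products reuse the same step with different parameters.
transport = [
    {"month": "2011-01", "tonnes": 12.0},
    {"month": "2011-01", "tonnes": 8.0},
    {"month": "2011-02", "tonnes": 5.0},
]
retail = [
    {"region": "north", "turnover": 100.0},
    {"region": "north", "turnover": 50.0},
]

transport_totals = aggregate(transport, by="month", measure="tonnes")
retail_totals = aggregate(retail, by="region", measure="turnover")
```

A change in the product, such as aggregating by a different dimension, becomes a change of parameter and leaves the step itself untouched.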
Why is this approach good? (2) • It allows a clean specification of the product – In terms of what it is – In terms of what is used – In terms of what the relation is between input and output 10
Why is this approach good? (3) • It separates product design from IT – The product is defined in terms of what it is (and not how it is produced) – The process is defined in terms of what it does (and not how it is implemented) – Only the system is defined in technical terms 11
Why is this approach good? (4) • It supports optimisation of process development – Possibility of developing standardised, reusable process steps – Generic process steps are not defined for an actual statistic, but for use in different statistics 12
Main points 1. An ideal development process for a statistical system 2. Why this ideal usually is not met 3. How CORE aims at supporting this ideal development process 13
The usual approach • Statisticians present a project in which product and process are combined • IT people specify and build a system that creates the product by performing the process 14
Why is the usual approach inefficient? • Complexity • Process & product are tightly coupled • Rigidity • Maintenance is labour-intensive • Specificity • It is not easy to devise a generic solution when developing for a specific product 15
Main points 1. An ideal development process for a statistical system 2. Why this ideal usually is not met 3. How CORE aims at supporting this ideal development process 16
Promoting the better approach 1. The CORA and CORE projects (Jenny) 2. Bringing the results into practice (Jean-Pierre) 17
CORA ESSnet • COmmon Reference Architecture (CORA) • Financed by Eurostat under the 2009 Statistical Work Programme • Countries involved: IT (coordinator), CH, DK, LV, NL, NO, SE • Duration: October 2009 - October 2010 18
CORA deliverables • Questionnaire • Set of Requirements • State of the Art • Definition of the Layered Model • Technical Annex • Instruction Manual • Commercial and Legal Foundations for the Exchange of Software between Statistical Offices • Requirements Checklist for CORA Tools • Recommendations for CORA Tools 19
After CORA … CORE! • COmmon Reference Environment (CORE) • Financed by Eurostat under the 2010 Statistical Work Programme • Countries involved: IT (coordinator), FR, NL, NO, PT, SE • Duration: December 2010 - January 2012 20
CORE Workpackages • Design of the information model according to GSBPM and alignment with NSIs' information models • Generic interface design for interconnecting GSBPM sub-processes • Research workflow solutions for process management • Implementation library for generic interface and production chain for .NET • Implementation library for generic interface and production chain for Java 21
Practical usage of CORA / CORE • Modeling a process in terms of services (CORA) • Classifying services (CORA) • Making services platform-independent (CORE) 22
[Figure: grid with the GSBPM phases (1 Specify Needs, 2 Design, 3 Build, 4 Collect, 5 Process, 6 Analyse, 7 Disseminate, 8 Archive, 9 Evaluate) on one axis and the levels Figures, Time series, Statistic, Population, Unit, Variable, Value on the other.]
An example process • A transport statistic – Input: • Loading reports • Unloading reports – Date, time, place, type & quantity of goods, type of vehicle – Output: • Monthly transport data – Same data also used for time series
Modeling approach • Use the CORA space grid
[Figure: the CORA space grid (GSBPM phases 1-9 against the levels Figures, Time series, Statistic, Population, Unit, Variable, Value), annotated with Macrodata and Microdata.]
Modeling approach • Use the CORA space grid • Display statistical services in the appropriate cells
[Figure: the CORA space grid with the services Aggregate and Macroediting placed in cells.]
Modeling approach • Use the CORA space grid • Display statistical services in the appropriate cells • Join services with arrows to show the dependencies
[Figure: the CORA space grid with the services Aggregate and Macroediting.]
[Figure: the example transport process modeled on the CORA space grid. Services shown: Download, Combine, Error detection, Correct variables, Microediting, Compute distance, Outlier detection, Correct outliers, Aggregate, Macroediting, Supply period data, Integrate data, Select period data, Confidentiality control, Monthly Transport Publication, Publication data, Archive obs. vars., Archive Unit data, Archive Statistic data, Archive Time Series data. Two nodes are marked "?".]
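A model of services joined by dependency arrows can be turned into an execution order with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The service names are taken from the slide, but the edges are assumptions made for illustration, since the arrows of the original figure are not available here.

```python
from graphlib import TopologicalSorter

# Maps each service to the services it depends on.
# The edges are assumed for illustration; they are not read from the figure.
dependencies = {
    "Combine": {"Download"},
    "Error detection": {"Combine"},
    "Correct variables": {"Error detection"},
    "Aggregate": {"Correct variables"},
    "Archive Statistic data": {"Aggregate"},
    "Macroediting": {"Aggregate"},
    "Confidentiality control": {"Macroediting"},
    "Monthly Transport Publication": {"Confidentiality control"},
}

# static_order() yields every service after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

Several valid orders exist when the graph branches, so only the relative order of dependent services is guaranteed.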
A traditional service [Diagram: Tool X with Script (X), Input (X), Model (X) and Output (X), all in the tool's own format X.] 32
A CORA service [Diagram: script, input, models and output are exposed in CORA format; convertors (CV) translate them to and from the native formats of Tool X or Tool Y; logging is part of the service.] CV = Convertor 33
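The convertor idea can be sketched as a wrapper around an existing tool. This is a simplified illustration and does not use the CORE libraries: the names `tool_x`, `cora_to_x`, `x_to_cora` and `wrap_as_cora_service` are hypothetical, the "CORA format" is assumed to be a list of records, and only data conversion and logging are shown (script and model conversion are left out).

```python
from typing import Any, Callable, Dict, List, Tuple

CoraData = List[Dict[str, Any]]   # assumed platform-neutral format
XData = List[Tuple[str, int]]     # assumed native format of tool X


def tool_x(rows: XData) -> XData:
    """Stand-in for an existing tool that only understands its own format."""
    return [(unit, value * 2) for unit, value in rows]


def cora_to_x(data: CoraData) -> XData:
    """Input convertor (CV): CORA records to tool X tuples."""
    return [(r["unit"], r["value"]) for r in data]


def x_to_cora(rows: XData) -> CoraData:
    """Output convertor (CV): tool X tuples back to CORA records."""
    return [{"unit": unit, "value": value} for unit, value in rows]


def wrap_as_cora_service(
    tool: Callable[[Any], Any],
    convert_in: Callable[[CoraData], Any],
    convert_out: Callable[[Any], CoraData],
    log: List[str],
) -> Callable[[CoraData], CoraData]:
    """Wrap a tool so that callers only ever see the CORA format."""
    def service(data: CoraData) -> CoraData:
        log.append(f"input records: {len(data)}")
        result = convert_out(tool(convert_in(data)))
        log.append(f"output records: {len(result)}")
        return result
    return service


log: List[str] = []
service = wrap_as_cora_service(tool_x, cora_to_x, x_to_cora, log)
result = service([{"unit": "A", "value": 10}])
```

Replacing Tool X with Tool Y would then only require a new pair of convertors, while the callers of the service stay unchanged.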