TURKISH STATISTICAL INSTITUTE
HARZEMLI, The DDI-Based Statistical Production Platform
İlker GÜVEN, Head of Data Management Group
ilker.guven@tuik.gov.tr
INFORMATION TECHNOLOGIES DEPARTMENT
The QUESTION
• Is there a way to standardize statistics production so that quality increases and workload decreases?
Why should this QUESTION be addressed?
• The role of statistical institutes includes conducting surveys and disseminating data on statistical subjects.
• Statistical institutes need to integrate the data of different subject-matter units with each other.
• Using common, standardized classifications makes it easier to implement GSBPM and GSIM.
Why should this QUESTION be addressed? (cont'd)
• Standardized data entry screens and standardized methods for sending and receiving data during data collection.
• A standardized data model for dissemination.
The SOLUTION
• DDI + Rule Files + Classification Server => for the survey part
• Uniform data model + Classification Server => for the dissemination part
The RESULT
• HARZEMLI, the DDI-based statistical production platform
• Metadata-driven, dynamically created web-based surveys with a consistent look and feel
• Shorter, standardized IT processes built around metadata
• Metadata actively used in the GSBPM phases
• Standard names for all variables, supporting data integrity
• Shorter data collection, leaving more time for analysis
• Easy integration of private-sector data into the related surveys
• Similar database tables whose structures are generated automatically
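The last point, table structures generated automatically from metadata, can be illustrated with a minimal Python sketch. It assumes each variable carries a standard name and a simple type; the `Variable` class, the `numeric`/`text` type vocabulary, the `survey_` table prefix and the `record_id` key are hypothetical choices for illustration, not Harzemli's actual schema.

```python
import re
from dataclasses import dataclass

# Identifiers go straight into SQL, so only plain names are accepted.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Assumed mapping from a simplified metadata type vocabulary to SQL types.
SQL_TYPES = {"numeric": "NUMERIC", "text": "VARCHAR"}


@dataclass
class Variable:
    """Hypothetical, simplified variable metadata (real DDI is much richer)."""
    name: str        # standard variable name shared across surveys
    kind: str        # "numeric" or "text"
    width: int = 50  # maximum length for text variables


def create_table_ddl(survey_id: str, variables: list[Variable]) -> str:
    """Derive a CREATE TABLE statement from variable metadata."""
    for name in [survey_id] + [v.name for v in variables]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    columns = ["record_id INTEGER PRIMARY KEY"]
    for v in variables:
        sql_type = SQL_TYPES[v.kind]
        if v.kind == "text":
            sql_type += f"({v.width})"
        columns.append(f"{v.name.lower()} {sql_type}")
    body = ",\n  ".join(columns)
    return f"CREATE TABLE survey_{survey_id.lower()} (\n  {body}\n);"


print(create_table_ddl("LFS", [Variable("AGE", "numeric"),
                               Variable("NACE_CODE", "text", 4)]))
```

Because every table is derived from metadata by the same rules, tables for different surveys share naming and typing conventions, which is what makes them "similar".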
The RESULT (cont'd)
• MEDAS (Central Dissemination System, a subsystem of the Harzemli Platform)
  • Reduced reporting and interface burden in dissemination
  • Much easier comparison of data
  • Any end user can detect correlations between any subjects
HARZEMLI (al-Khwarizmi)
Harzemli (al-Khwarizmi, c. 780 – c. 850), formerly Latinized as Algoritmi or Algaurizin, was a Persian mathematician, astronomer and geographer. "Algebra" is derived from al-jabr, one of the two operations he used to solve quadratic equations. "Algorism" and "algorithm" stem from Algoritmi, the Latin form of his name. His name is also the origin of the Spanish guarismo and the Portuguese algarismo, both meaning "digit".
BEFORE HARZEMLI
• Developer 1 + Subject matter unit 1 staff
• Developer 2 + Subject matter unit 2 staff
• Developer 3 + Subject matter unit 3 staff
• Developer n + Subject matter unit n staff
BEFORE HARZEMLI
• A different Java application for each survey
• A different data design for each survey
• No metadata conventions across subjects => lack of data integrity
• Long survey completion times because of paper questionnaires
WITH HARZEMLI
• No need to write new Java code
• Similar data design for every subject
• Fully metadata-driven, automatically generated web applications
• Shorter survey completion times, since no paper is involved
HARZEMLI consists of:
• Metadata Editor (Nesstar; we are now developing our own editor)
• Rule Editor (generates Rule XML files)
• Desktop, Mobile, and Web editions
• Management Console with modules (Analysis, Visualization, etc.)
• MEDAS
HARZEMLI (SURVEY PART)
(Diagram) Metadata Editor => DDI Document (Variables); Rule Editor => Rule XML File; Classification Server => Code Lists. All three feed the Harzemli Application, which produces the web-based survey interface.
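The survey-part flow (DDI variables, a rule file and code lists turned into a generated interface) can be sketched as a function that converts metadata into generic form-field specifications that a web front end could render. This is a sketch under stated assumptions: it reads `var`, `labl`, `catgry` and `catValu` elements from a DDI Codebook 2.5 fragment, keeps categories inline instead of fetching them from a classification server, and uses a hypothetical `<rule var="AGE" min="15" max="99"/>` layout because the real Rule XML schema is not described in these slides. `build_form` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# A tiny DDI Codebook 2.5 fragment; categories are inline here, whereas
# Harzemli takes code lists from its classification server.
DDI_XML = """<codeBook xmlns="ddi:codebook:2_5">
  <dataDscr>
    <var name="AGE"><labl>Age of respondent</labl></var>
    <var name="SEX">
      <labl>Sex</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>"""

# Hypothetical rule file layout; not the actual Harzemli Rule XML schema.
RULE_XML = '<rules><rule var="AGE" min="15" max="99"/></rules>'


def build_form(ddi_xml: str, rule_xml: str) -> list[dict]:
    """Turn DDI variables plus range rules into generic form-field specs."""
    rules = {r.get("var"): r.attrib for r in ET.fromstring(rule_xml).iter("rule")}
    fields = []
    # "{*}" matches any namespace (supported by ElementTree since Python 3.8).
    for var in ET.fromstring(ddi_xml).iterfind(".//{*}var"):
        name = var.get("name")
        options = [(c.findtext("{*}catValu"), c.findtext("{*}labl"))
                   for c in var.findall("{*}catgry")]
        field = {
            "name": name,
            "label": var.findtext("{*}labl", default=name),
            "widget": "select" if options else "input",
            "options": options,
        }
        if name in rules:  # attach validation limits from the rule file
            field["min"] = int(rules[name]["min"])
            field["max"] = int(rules[name]["max"])
        fields.append(field)
    return fields


for f in build_form(DDI_XML, RULE_XML):
    print(f)
```

Keeping the form definition as data rather than code is what lets one generic application serve many surveys.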
HARZEMLI MILESTONES
#  Milestone of the project       Development date
1  Harzemli Desktop               2012
2  Harzemli Rule Editor           2012
3  Harzemli Management Console    2012
4  IDM                            2013
5  Harzemli Web                   2013
6  MEDAS                          2014
7  Harzemli Mobile                2014
8  Harzemli Analysis              2014
Survey Migration to HARZEMLI (number of surveys)
Year    Desktop  Web  Mobile  TOTAL
2013    6        26   *       32
2014    4        46   *       50
2015    2        13   7       22
TOTAL   12       85   7       104
PRODUCTIVITY GAINS (Time)
• Less time for software engineers to develop data entry applications: 4 weeks reduced to 1 week
• Less time for data collection:
  • 8% decrease in data collection time
  • 50% increase in the time available to regional offices for analyzing their data, thanks to instant access to the data
  • 12% decrease in the analysis time needed by central office staff
• Less time to prepare press releases. For example, preparing the monthly Labor Statistics press releases now takes 4 fewer days.
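The development-time figure can also be expressed as a relative change. The small helper below (a hypothetical name, for illustration) shows that going from 4 weeks to 1 week is a 75% reduction, equivalent to a four-fold speed-up.

```python
def relative_change(before: float, after: float) -> tuple[float, float]:
    """Return (percent reduction, speed-up factor) for a before/after duration."""
    reduction = 100 * (before - after) / before
    speedup = before / after
    return reduction, speedup


print(relative_change(4, 1))  # (75.0, 4.0)
```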
PRODUCTIVITY GAINS (Quality)
• Software process standardization
• Data integrity
• Common code development
• Common components available
PRODUCTIVITY GAINS (Costs)
• Had the paper editions of the surveys continued, 870 trees would have been cut to produce 10 million A4 sheets in 2015 alone.
• 58% reduction in field interviewers' travel expenses
MEDAS (Central Dissemination Project)
• The dissemination system of the Harzemli Platform (the counterpart of its survey part)
Before MEDAS, the dissemination databases had:
• Different database designs
• Different applications for each statistical subject
• No standard codes
• No data integrity, which made correlations hard to detect
BEFORE MEDAS
• Database specialist 1 + Subject matter unit 1 staff
• Database specialist 2 + Subject matter unit 2 staff
• Database specialist 3 + Subject matter unit 3 staff
• Database specialist n + Subject matter unit n staff
MEDAS
Before MEDAS:
• IT was too heavily involved in data dissemination because the processes were manual
• Our old dissemination technology posed security threats
MEDAS
• Dissemination of different statistical subjects through a generic data model: a single data source and a single application
• Any number of subjects can be compared in the same report => the classification server plays a major role
• Reduced reporting burden thanks to the single application
• Modern pivot tables
(Diagram: Central Dissemination Database; Comparability)
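A generic data model of this kind can be sketched as one long-format table in which every observation records its subject, indicator, classification code, period and value. The table layout, column names, indicator names and region codes below are assumptions for illustration (the actual MEDAS model is not shown in these slides), and the numbers are toy values. Comparing two subjects in one report then becomes a single pivot query keyed on the shared classification code.

```python
import sqlite3

# One generic table holds every statistical subject (assumed, simplified layout).
SCHEMA = """CREATE TABLE observation (
    subject     TEXT NOT NULL,   -- statistical subject, e.g. 'labour'
    indicator   TEXT NOT NULL,   -- indicator name within the subject
    region_code TEXT NOT NULL,   -- code from a shared classification
    period      TEXT NOT NULL,
    value       REAL
)"""

# Toy values for illustration only; these are not real statistics.
ROWS = [
    ("labour", "unemployment_rate", "R1", "2014", 1.0),
    ("labour", "unemployment_rate", "R2", "2014", 2.0),
    ("income", "median_income", "R1", "2014", 10.0),
    ("income", "median_income", "R2", "2014", 20.0),
]


def compare(con: sqlite3.Connection, ind_a: str, ind_b: str) -> list[tuple]:
    """Pivot two indicators, possibly from different subjects, side by side.

    The grouping key is the shared classification code, which is why a
    common classification server matters for cross-subject reports.
    """
    sql = """SELECT region_code, period,
                    MAX(CASE WHEN indicator = ? THEN value END),
                    MAX(CASE WHEN indicator = ? THEN value END)
             FROM observation
             WHERE indicator IN (?, ?)
             GROUP BY region_code, period
             ORDER BY region_code, period"""
    return con.execute(sql, (ind_a, ind_b, ind_a, ind_b)).fetchall()


con = sqlite3.connect(":memory:")
con.execute(SCHEMA)
con.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", ROWS)
for row in compare(con, "unemployment_rate", "median_income"):
    print(row)
```

In this design a new statistical subject adds rows rather than a new schema, which is consistent with the reduced number of schemas claimed for MEDAS.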
MEDAS
• IT only builds the pipeline; the data is filled in by the related units
• Less dependence on individual staff => good for managers
• Easy to develop web services thanks to the generic data model
• Easier database administration, since the number of schemas (workspaces) drops dramatically
MEDAS
• MEDAS Report Screen: see the results for different subjects on the same report (screenshot)
MEDAS
• In use since April 2014
• 26 of 62 statistical subjects have been migrated to MEDAS
• 46 subjects expected by the end of 2015 and all 62 by the end of 2016
FINAL WORDS
• So far, the Harzemli Platform has been very successful and has received strong support from both the presidency and the staff of the units.
• The Harzemli Platform will expand with new modules that are currently under development and will be added in the near future.
Thank you
APPENDIX
HARZEMLI WEB
Mainly used for:
• Business surveys
• Surveys of public bodies and universities
• No paper-based business surveys since 2014
HARZEMLI DESKTOP
• Conducting surveys (e.g., household surveys) without an internet connection
• Designed for netbooks / mini laptops
• Extra control mechanisms for data integrity (we call them edit codes)
• Sends data to and receives data from central databases via web services when connected
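Edit codes can be thought of as named checks run against each record before it is sent. The minimal sketch below assumes that representation; the codes `E01` and `E02`, their rules and the field names are hypothetical examples, not Harzemli's actual edit codes.

```python
from typing import Callable

# An edit code here is (code, check, message); the check returns True when
# the record passes. The codes below are hypothetical examples.
EditCode = tuple[str, Callable[[dict], bool], str]

EDIT_CODES: list[EditCode] = [
    ("E01", lambda r: 0 <= r["age"] <= 120, "Age is outside 0-120"),
    ("E02", lambda r: r["employed_members"] <= r["household_size"],
     "More employed members than household members"),
]


def run_edit_codes(record: dict) -> list[tuple[str, str]]:
    """Return (code, message) for every edit code the record fails."""
    return [(code, msg) for code, check, msg in EDIT_CODES if not check(record)]


print(run_edit_codes({"age": 34, "household_size": 3, "employed_members": 4}))
```

Running the checks on the device lets the interviewer correct inconsistencies while still with the respondent, before the data reaches the central databases.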
HARZEMLI MOBILE
• Android application designed for tablets
• Mechanisms similar to those of Harzemli Desktop
• Takes advantage of mobile operating systems and lightweight devices in the field
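The offline mode shared by the Desktop and Mobile editions, collecting data locally and sending it to the central databases once a connection is available, follows a store-and-forward pattern. Below is a minimal sketch assuming a local SQLite outbox; the `OutboxQueue` class and the `send` callback (standing in for the web-service call) are hypothetical.

```python
import json
import sqlite3
from typing import Callable


class OutboxQueue:
    """Store-and-forward queue: records stay local until a send succeeds."""

    def __init__(self, path: str = ":memory:") -> None:
        self.con = sqlite3.connect(path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)")

    def enqueue(self, record: dict) -> None:
        with self.con:  # commits the insert
            self.con.execute("INSERT INTO outbox (payload) VALUES (?)",
                             (json.dumps(record),))

    def pending(self) -> int:
        return self.con.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Send queued records in order; stop at the first failure."""
        sent = 0
        rows = self.con.execute(
            "SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            if not send(json.loads(payload)):
                break  # offline or server error: keep the rest for later
            with self.con:
                self.con.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

Records are deleted only after a successful send, and sending stops at the first failure, so a dropped connection leaves the unsent records queued in their original order.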
HARZEMLI MANAGEMENT CONSOLE
• Menu for user and form authorization for the surveys carried out on Harzemli Desktop and Mobile
• Enables IT to manage the surveys in the Harzemli system (e.g., creating database tables)
• Menu for authentication for the surveys carried out on Harzemli Desktop and Mobile
• SMS module: menu for informing households and firms about surveys
• Screen reporting the field and desk workload of the surveys conducted by the regional offices
• Menu for creating, adding and displaying current-status and analysis reports for all surveys in Harzemli
• Menu for adding a new survey to the Harzemli system
HARZEMLI ANALYSIS
• Users create and run their own error-finding rules, run special rules, or trigger analyses on streams/files from other statistical systems (SPSS, SAS, R)
• Data mining techniques run in the background => a database of suspicious records
• Smart reports give the users (subject-matter unit staff) the reasons a record is suspicious
• Subject-matter unit staff approve the record if the reason is satisfactory
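The slides do not say which data mining techniques run in the background. As one possible illustration, the sketch below swaps in a common robust outlier rule, the modified z-score based on the median absolute deviation (MAD), and attaches a human-readable reason to each flagged record so that subject-matter staff can review it. The field names, the 3.5 threshold and the `reviewed` flag are assumptions.

```python
from statistics import median


def flag_suspicious(records: list[dict], field: str,
                    threshold: float = 3.5) -> list[dict]:
    """Flag records whose value is far from the median, with a reason.

    Uses the modified z-score 0.6745 * (x - median) / MAD; the 3.5 cut-off
    is a conventional default, not a Harzemli setting.
    """
    values = [r[field] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # no spread: the score is undefined, so flag nothing
        return []
    flagged = []
    for r in records:
        score = 0.6745 * (r[field] - med) / mad
        if abs(score) > threshold:
            flagged.append({
                "id": r["id"],
                "reason": (f"{field}={r[field]} has modified z-score "
                           f"{score:.1f} (median {med}, MAD {mad})"),
                "reviewed": False,  # set once subject-matter staff decide
            })
    return flagged


records = [{"id": i, "turnover": v}
           for i, v in enumerate([100, 110, 95, 105, 1000], start=1)]
for item in flag_suspicious(records, "turnover"):
    print(item)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts the latter, which can hide the very record that should be flagged.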
HARZEMLI ANALYSIS: Data Visualization
The Visual Analysis module, built with R, supports more effective analysis. An SQL statement is generated from the table the user selects, and the column values of that table are sent to the R server.
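Generating an SQL statement from the user's table and column selection needs care, because table and column names cannot be passed as bound parameters. A minimal sketch, assuming SQLite as a stand-in for the production database: names are checked against the catalog before being quoted into the statement, and filter values are bound as parameters. `build_select` and the `lfs` demo table are hypothetical, and the transfer of results to the R server is not shown.

```python
import sqlite3
from typing import Optional


def build_select(con: sqlite3.Connection, table: str, columns: list[str],
                 filters: Optional[dict] = None) -> tuple[str, list]:
    """Generate a SELECT for the user's table and column choices.

    Identifiers are validated against the database catalog; filter
    values are returned separately as bound parameters.
    """
    tables = {r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    known = {r[1] for r in con.execute(f'PRAGMA table_info("{table}")')}
    filters = filters or {}
    for col in [*columns, *filters]:
        if col not in known:
            raise ValueError(f"unknown column: {col!r}")
    cols_sql = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT {cols_sql} FROM "{table}"'
    if filters:
        sql += " WHERE " + " AND ".join(f'"{c}" = ?' for c in filters)
    return sql, list(filters.values())


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lfs (region TEXT, age INTEGER, employed INTEGER)")
con.executemany("INSERT INTO lfs VALUES (?, ?, ?)", [("R1", 30, 1), ("R2", 40, 0)])
sql, params = build_select(con, "lfs", ["region", "age"], {"employed": 1})
rows = con.execute(sql, params).fetchall()
# In the module described above, rows like these would go on to the R server.
```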
Data Process of MEDAS