Dissemination and use of aggregate data: structures and functionality
Andrew Westlake
Survey & Statistical Computing
ssc@count.com
www.sasc.co.uk
11/22/2020 Meta-data & Functionality

Aggregate data: structures and functionality
  • What are the objectives
    - Systems to support the preparation, processing and dissemination of statistics in the form of aggregated data
    - Appropriate tool set
    - Automation of production processes
    - Dynamic access and ‘analysis’
  • Developments on the Database side
    - Statistical Database proposals from Computer Science
    - Commercial development of Data Warehouses (OLAP)
  • Requirements
    - Structure
    - Functionality: Manipulation, Dissemination

Processing Aggregate Data

Aggregated Results, as Multi-way Table
[Figure: a three-dimensional table (data cube) with a set of measures in each cell]
  • Dimension: Period (Year, Month, Week, Day)
  • Dimension: Location (Country, Region, District)
  • Dimension: Disease Classification (ICD) (Major Group, Minor Group, Detail)
  • Measures: Reports received, Population at risk, Estimated Incidence rate, SD of Incidence rate
This example has three dimensions (so that it can be visualised). In reality, for this application, we would need at least two more, Age and Gender.
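
One way to picture the multi-way table is as a mapping from one category per dimension to a bundle of measures. The sketch below is only an illustration under stated assumptions: the names Cell and cube, the sample figures, and the choice of a rate per 100,000 are all invented here and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Measures held in one cell of the table (illustrative values only)."""
    reports: int      # Reports received
    population: int   # Population at risk

    @property
    def incidence_rate(self) -> float:
        # Assumption: the rate is expressed per 100,000 population at risk.
        return 100_000 * self.reports / self.population


# Key = one category from each dimension: (period, location, disease).
cube = {
    ("Week 1", "District A", "Influenza"): Cell(reports=12, population=40_000),
    ("Week 1", "District B", "Influenza"): Cell(reports=5, population=25_000),
    ("Week 2", "District A", "Influenza"): Cell(reports=20, population=40_000),
}

cell = cube[("Week 1", "District A", "Influenza")]
print(cell.incidence_rate)
```

Adding Age and Gender would simply lengthen the key tuple; the measures in each cell stay the same.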

Statistical Databases
  • SSDBM conferences, from early '80s
    - STORM model, Rafanelli & Shoshani, '90
    - Summarizability, Lenz & Shoshani, '97
  • National Statistical Offices
  • Research Projects, particularly Eurostat
    - Idaresa, Addsia, Rainbow, IMIM
  • Concern for concepts, structure, rules, validity
  • No Money

Commercial developments
  • Data Warehouse
    - DB with emphasis on performance with fixed data, no transactional requirements
    - Star schema for multi-way tables, Data Cubes
    - Products from mainstream DB vendors, and specialists
  • OLAP (On-Line Analytical Processing)
    - Term invented by Codd
    - Emphasis on exploration of aggregate structure, selection of subgroups, change of focus between detail and broad groups
  • Lots of Money
    - Products
      • DB Vendors, e.g. Oracle Express, Pivot tables in MS Excel 2000, Informix Red Brick
      • Specialists, e.g. Beyond 20/20, Super-Star
    - Standardisation proposals

Aggregation Functionality
  • Store information with minimal aggregation
    - Maximum detail in classifications
    - Further aggregation (to less detail) on demand (may pre-compute for efficiency)
  • Algebra for aggregating classifications and measures is basically straightforward
  • Aggregation of Measures
    - Everything based on summation can be regrouped (cf. updating algorithms, sufficient statistics)
    - Some others, e.g. Range
    - Special issues for time, aggregate or cross sectional measures
  • All aggregated tables are proper tables
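
The point about summation-based measures can be sketched in code. A mean or SD cannot be regrouped from cell means alone, but it can be rebuilt if each cell stores the count, the sum and the sum of squares; the range regroups through the minimum and maximum. The class name Stats, its fields, and the sample values below are assumptions made for illustration, not part of the original material.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for one cell: enough to rebuild mean, SD and range."""
    n: int
    total: float
    total_sq: float
    lo: float
    hi: float

    @classmethod
    def from_values(cls, values):
        return cls(len(values), sum(values), sum(v * v for v in values),
                   min(values), max(values))

    def merge(self, other):
        # Sums simply add; minimum and maximum combine. No raw data is needed.
        return Stats(self.n + other.n,
                     self.total + other.total,
                     self.total_sq + other.total_sq,
                     min(self.lo, other.lo),
                     max(self.hi, other.hi))

    @property
    def mean(self):
        return self.total / self.n

    @property
    def sd(self):
        # Population SD (divisor n), computed from the summed squares.
        return math.sqrt(self.total_sq / self.n - self.mean ** 2)

    @property
    def value_range(self):
        return self.hi - self.lo


# Two detailed cells (hypothetical districts) aggregated to one region.
district_a = Stats.from_values([2.0, 4.0])
district_b = Stats.from_values([6.0, 8.0])
region = district_a.merge(district_b)

# The same region computed directly from the pooled raw values.
direct = Stats.from_values([2.0, 4.0, 6.0, 8.0])
print(region == direct, region.mean, region.value_range)
```

Because merge only adds and compares stored values, aggregation to less detail can be done on demand or pre-computed, and the result is again a table of the same kind.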

Manipulation Functionality - for Processing
  • Manipulation of Measures
    - Introduce measures from other tables with similar structure
    - Derive measures within cells
    - Not all combinations are meaningful
  • Combination of two tables
    - Find common dimensions and classifications (may require some aggregation or mapping)
    - Choose one table as the detail table
    - Aggregate all non-common dimensions out of the 2nd table
    - Transfer measures from 2nd table, repeating values over missing classifications
  • Meta-data to control validity of operations
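
The steps for combining two tables can be sketched as follows. This is a minimal illustration assuming tables held as Python dictionaries keyed by category tuples; the dimension names, figures and the derived measure name are hypothetical.

```python
from collections import defaultdict

# Detail table: dimensions (region, age_group), measure "cases".
cases = {
    ("North", "0-14"): 3,
    ("North", "15+"): 9,
    ("South", "0-14"): 4,
    ("South", "15+"): 6,
}

# Second table: dimensions (region, sex), measure "population".
population = {
    ("North", "F"): 500, ("North", "M"): 700,
    ("South", "F"): 400, ("South", "M"): 600,
}

# Step 1: region is the only common dimension, so aggregate sex
# out of the second table.
pop_by_region = defaultdict(int)
for (region, _sex), value in population.items():
    pop_by_region[region] += value

# Step 2: transfer the measure into the detail table, repeating the
# regional value over the age groups that the second table lacks.
combined = {
    (region, age): {"cases": n, "region_population": pop_by_region[region]}
    for (region, age), n in cases.items()
}

# Step 3: derive a new measure within each cell.
for cell in combined.values():
    cell["cases_per_1000_region_pop"] = (
        1000 * cell["cases"] / cell["region_population"]
    )

print(combined[("North", "0-14")])
```

The repeated population is a regional figure, not an age-specific one, so summing it over age groups would double-count. That is the kind of invalid operation the meta-data would need to flag.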

Rules for proper table structure
  • Table
    - Well-defined base population from which measures are computed
    - May include a selection rule w.r.t. a wider population
  • Classification
    - Categories must be exclusive and exhaustive w.r.t. the base population
    - Cannot have its own selection rule (but might have a residual category)
  • Measure
    - May have a selection rule (e.g. count with a property)
  • Care is sometimes needed to distinguish between classifications and measures
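
The rule that categories must be exclusive and exhaustive can be checked mechanically when unit-level membership is known. The function below is a sketch under that assumption; check_classification and the sample data are hypothetical names and values chosen for illustration.

```python
def check_classification(categories, population):
    """Return (overlapping, unclassified) unit ids for a classification.

    categories: dict mapping category label -> set of unit ids
    population: set of all unit ids in the table's base population
    """
    seen, overlapping = set(), set()
    for members in categories.values():
        overlapping |= seen & members  # units placed in more than one category
        seen |= members
    unclassified = population - seen   # units placed in no category
    return overlapping, unclassified


base = {1, 2, 3, 4, 5}

# Exclusive and exhaustive: both result sets are empty.
age_group = {"0-14": {1, 2}, "15-64": {3, 4}, "65+": {5}}
proper = check_classification(age_group, base)

# Unit 2 appears twice and units 4 and 5 are missing.
broken = {"0-14": {1, 2}, "15-64": {2, 3}}
improper = check_classification(broken, base)

print(proper, improper)
```

A residual category such as "Other" is one way to make a classification exhaustive without giving it a selection rule of its own.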

Confusion between classification and measure
  • Wrong: Subject classification is not exclusive if students can register for more than one course
  • Correct: Counts selected by subject are different measures
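
A small worked example may make the distinction concrete. The registration data below is invented; the sketch only shows that counts by subject do not sum to the number of students, whereas separate measures on the same population do not claim to.

```python
# Each student may register for several subjects (hypothetical data).
registrations = {
    "s1": {"Maths", "Physics"},
    "s2": {"Maths"},
    "s3": {"History"},
}

n_students = len(registrations)

# Wrong: treating Subject as a classification. The cells overlap,
# so their total exceeds the base population.
by_subject = {}
for subjects in registrations.values():
    for subject in subjects:
        by_subject[subject] = by_subject.get(subject, 0) + 1
total_over_subjects = sum(by_subject.values())

# Correct: each count is its own measure with a selection rule,
# computed on the same base population of students.
measures = {
    "students": n_students,
    "taking_maths": sum("Maths" in s for s in registrations.values()),
    "taking_physics": sum("Physics" in s for s in registrations.values()),
    "taking_history": sum("History" in s for s in registrations.values()),
}

print(n_students, total_over_subjects, measures)
```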

Presentation Functionality
  • Layout
    - Mapping from dimensions to Rows, Columns, Pages
  • Improper table combinations
    - Combination of dissimilar dimensions, e.g. Age groups by (SEG + Housing)
    - Distinction between Classification and Measure is less important for presentation
  • Medium
    - Paper, Web, often with analysis (commentary)
    - Machine readable (take away, not linked)
    - Dynamic, for local or remote manipulation
  • Associated material
    - Generation of descriptions, footnotes, indexes, content lists

Manipulation Functionality - for Exploration
  • Dynamic viewing, linked to source aggregations
  • Selection
    - Subset of classification cells, and of measures
  • Dynamic regrouping
    - Roll up to combine existing groups to next level
    - Drill down to get more detail in groups at lower level
    - Operate independently, i.e. not all parts of a classification at the same level
    - User-defined groupings
  • All derivation and presentation facilities
  • Specialist browsers, available for local data or over the Internet
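
Roll up and drill down, operating independently on different parts of a classification, can be sketched as a view over data stored at the most detailed level. The two-level hierarchy, the counts and the function name view are assumptions made for this illustration.

```python
# Hypothetical two-level classification: district -> region.
parent = {"D1": "North", "D2": "North", "D3": "South", "D4": "South"}

# Counts stored at the most detailed level.
detail = {"D1": 10, "D2": 5, "D3": 7, "D4": 8}


def view(expanded):
    """Return counts with the regions in `expanded` drilled down to
    districts and every other region rolled up to a single total."""
    out = {}
    for district, count in detail.items():
        region = parent[district]
        key = district if region in expanded else region
        out[key] = out.get(key, 0) + count
    return out


rolled_up = view(expanded=set())   # every region rolled up
mixed = view(expanded={"North"})   # North drilled down, South rolled up
print(rolled_up, mixed)
```

Every view partitions the same base population, so the grand total is the same whichever parts are expanded.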

Discovery through Meta-data
  • Generic descriptions
    - Population, Classifications, Measures linked to concept definitions for searching
  • Specific topics
    - Formal definitions of standard components: selection rules, standard classifications, measure types
    - Specific descriptions of substantive content: source variable definitions, questionnaire structure, etc.
  • Accessibility
    - Information must be available to search engines and users

Conclusions
  • Good analysis of structural and functionality requirements can produce good products for automated and individual use
  • Further academic work on structures and functionality needed
  • Commercial products are useful but lack many obvious features - we should demand more
  • Commercially driven standards concentrate on basic functionality and overlook statistical and practical validity - we should get more involved