‘Feature types’ as an integration bridge in the climate sciences • Andrew Woolf (1, *), Bryan Lawrence (2), Jeremy Tandy (3), Keiran Millard (4), Dominic Lowe (2), Sam Pepler (2) • (1) CCLRC e-Science Centre, (2) British Atmospheric Data Centre, (3) Met Office, (4) HR Wallingford • (*) Corresponding author email: A.Woolf@rl.ac.uk • AGU 2006, San Francisco, IN53C-02
Outline • Background – ‘container’ vs ‘content’ – BADC – feature types • The data management pipeline – ingestion – integration – management – use • Examples – CSML – Observations and Measurements
Background: container vs content • Storage-centred data management focuses on container, not content – different stovepipes for different storage – granularity impacts entire pipeline – backend exposed throughout – integration difficult – maintenance complexity
Background: e.g. BADC • British Atmospheric Data Centre – http://badc.nerc.ac.uk – UK NERC designated data centre – ~60 Tb, ~130 datasets – NERC programmes, Met Office, ECMWF, NASA, ... – ground-based observation networks, model output (NWP, climate), satellite data
Background: e.g. BADC • Nearly all the data at the BADC has geospatial information • But it is not represented in a standard way • Lots of types of geospatial and temporal things with no clear categorisation
Background: e.g. BADC • The current way of doing things makes it hard to integrate data from other data repositories… • …or other datasets… • …or even data from within the same dataset sometimes!
Background: ‘feature types’ • Emerging ISO standards – TC 211 – around 40 standards for geographic information – cover the activity spectrum: discovery, access, use • [Figure: ISO 19101 Domain Reference Model] A geospatial dataset… …consists of features and related objects… …in a defined logical structure… …delivered through services… …and described by metadata.
Background: ‘feature types’ • Geographic ‘features’ – “abstraction of real world phenomena” [ISO 19101] – type or instance – encapsulate important semantics in universe of discourse • Application schema – defines semantic content and logical structure of datasets – ISO standards provide toolkit: spatial/temporal referencing; geometry (1-, 2-, 3-D); topology; dictionaries (phenomena, units, etc.) – GML: canonical encoding • [from ISO 19109 “Geographic information – Rules for Application Schema”]
Background: ‘feature types’ • “lifetime of a technical implementation is shorter than the lifetime of the information it handles” (CEN/TR 15449) • Loosens coupling between storage artefacts and data management infrastructure: – breaks the link between storage and discovery/access – front-end can expose information rather than files – entire infrastructure more independent of back-end
Data management pipeline: ingestion • “What’s a dataset?” • BADC currently: “A collection of files with a common theme and administration” • Alternative: “A collection of feature instances with a common theme and administration” – better for integration – more natural granularity for use – independent of physical storage format
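The alternative definition of a dataset can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and archive paths are hypothetical, and the feature type names follow the CSML examples listed later in this deck. The point it tries to show is that selection works on feature type, so the storage format behind each instance does not matter to the user.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureInstance:
    """One feature instance (hypothetical structure, for illustration only)."""
    feature_type: str   # e.g. "ProfileFeature"
    phenomenon: str     # what was measured or simulated
    storage_uri: str    # where the values live; opaque to the user


@dataclass
class Dataset:
    """A collection of feature instances with a common theme and administration."""
    theme: str
    administrator: str
    features: List[FeatureInstance] = field(default_factory=list)

    def select(self, feature_type: str) -> List[FeatureInstance]:
        # Selection is by feature type, not by file name or format.
        return [f for f in self.features if f.feature_type == feature_type]


# Hypothetical content: two profiles held in different formats, plus one grid.
dataset = Dataset(theme="upper-air soundings", administrator="BADC")
dataset.features.append(
    FeatureInstance("ProfileFeature", "air_temperature", "file:///archive/a.nc"))
dataset.features.append(
    FeatureInstance("ProfileFeature", "air_temperature", "file:///archive/b.grib"))
dataset.features.append(
    FeatureInstance("GridFeature", "sea_surface_temperature", "file:///archive/c.nc"))

profiles = dataset.select("ProfileFeature")
```

Both profiles are returned by the same query even though one sits in a netCDF file and the other in a GRIB file.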
Data management pipeline: integration • e.g. UK NERC DataGrid • [Figure labels: British Atmospheric Data Centre; British Oceanographic Data Centre; Simulations; Assimilation]
Data management pipeline: integration • ‘Feature types’ provide integration key – common language across providers/users – e.g. oceanographers / meteorologists share discussion about semantics of data despite format differences • Standard mechanism for ‘relating’ data – ‘association’ is part of General Feature Model – (rather than determined by file/directory structures)
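A minimal sketch of relating data through explicit associations rather than directory layout is given below. It is not the General Feature Model itself: the `Feature` class, the `associate` method, the identifiers, and the role name "assimilates" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    """A feature held by some provider (hypothetical structure)."""
    identifier: str
    feature_type: str
    provider: str
    # Named associations to other features, keyed by role name.
    associations: Dict[str, List["Feature"]] = field(default_factory=dict)

    def associate(self, role: str, other: "Feature") -> None:
        # The relationship is recorded on the feature, not implied by
        # where the underlying files happen to be stored.
        self.associations.setdefault(role, []).append(other)


# Hypothetical features held by two different data centres.
model_run = Feature("atm-001", "GridFeature", "British Atmospheric Data Centre")
ocean_obs = Feature("ocn-042", "ProfileFeature", "British Oceanographic Data Centre")

model_run.associate("assimilates", ocean_obs)
related = [f.identifier for f in model_run.associations["assimilates"]]
```

The link survives any reorganisation of files at either centre, because it refers to feature identifiers rather than paths.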
Data management pipeline: management • How to manage preservation/curation of storage artefacts? cf. OAIS (ISO 14721) • A ‘features view’ redirects the emphasis to preserving the feature rather than the file – e.g. become less hung-up on GRIB/netCDF conversion – object-with-attributes is the curation focus
Data management pipeline: use • Currently, have to ‘back out’ information content – ‘features’ make this explicit – enables standard patterns for ‘context’, e.g. OGC Observations and Measurements • ‘Features’ are closer to applications – can be leveraged for value-added services – General Feature Model/UML ‘operations’ – (Work needed on implementation!)
Data management pipeline: use • Visualisation – generic visualisation capability is fraught! – feature types make this more explicit • Discovery – ‘feature collections’ are a more natural granularity than file/directory collections or database tables
Data management pipeline: use • Mediator architecture – n+m, not n*m! • [Figure label: ‘Feature types’ view]
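The n+m versus n*m count can be made concrete with a small sketch. Everything here is an assumption for illustration: the format names, the service names, and the stub functions, which build placeholder dictionaries instead of reading real files. With a shared feature view in the middle, each storage format needs one reader and each service needs one implementation.

```python
from typing import Callable, Dict

# The common 'feature types' view, reduced to a plain dictionary.
Feature = Dict[str, str]

# One reader per storage format (n of them). Stubs only: no file is opened.
readers: Dict[str, Callable[[str], Feature]] = {
    "netcdf": lambda uri: {"type": "GridFeature", "source": uri},
    "grib": lambda uri: {"type": "GridFeature", "source": uri},
    "hdf": lambda uri: {"type": "GridFeature", "source": uri},
}

# One implementation per service (m of them), written against the feature view.
services: Dict[str, Callable[[Feature], str]] = {
    "visualise": lambda feature: f"plot of {feature['type']}",
    "discover": lambda feature: f"catalogue entry for {feature['type']}",
    "subset": lambda feature: f"subset of {feature['type']}",
}


def serve(fmt: str, uri: str, service: str) -> str:
    """Route any format to any service through the common feature view."""
    feature = readers[fmt](uri)
    return services[service](feature)


n, m = len(readers), len(services)
adapters_with_mediator = n + m      # one component per format plus one per service
adapters_without_mediator = n * m   # one converter per (format, service) pair
```

For three formats and three services this gives six components instead of nine, and the gap widens as either side grows.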
Data management pipeline: use • Integrates climate science data within mainstream ‘spatial data infrastructure’ – e.g. EU INSPIRE Directive – enhances cross-disciplinary use
Examples • Climate Science Modelling Language (CSML) – ProfileFeature – RaggedSectionFeature – ScanningRadarFeature – ProfileSeriesFeature – GridFeature
Examples • OGC ‘Observations and Measurements’ – An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure • [Figure label: CSML]
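The definition of an Observation maps naturally onto a record with one field per term. The sketch below is an assumption-laden illustration, not the OGC schema: the class, its field names, and all the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """An event whose result estimates a property of a feature of interest."""
    feature_of_interest: str   # the feature the observation is about
    observed_property: str     # the property being estimated
    procedure: str             # how the estimate was obtained
    event_time: datetime       # when the observation event happened
    result: float              # the estimated value
    unit: str                  # unit of the result


def describe(obs: Observation) -> str:
    """Render the observation using the terms of the definition."""
    return (f"{obs.observed_property} of {obs.feature_of_interest} = "
            f"{obs.result} {obs.unit} (via {obs.procedure})")


# Hypothetical sample values.
obs = Observation(
    feature_of_interest="station-123 profile",
    observed_property="air_temperature",
    procedure="radiosonde",
    event_time=datetime(2006, 12, 15, 12, 0),
    result=281.5,
    unit="K",
)
summary = describe(obs)
```

Keeping the procedure and feature of interest alongside the result is what lets the context travel with the value instead of being inferred from file names.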
Summary • Data management problems arise from traditional ‘storage-oriented’ view • ‘Feature types’ encapsulate information semantics – provide integration key across granularity range • Potential benefits for entire data management pipeline – ingestion – integration – management – use