Data Standards at the IRI Data Library. M. Benno Blumenthal, Michael Bell, John del Corral, and Emily Grover-Kopec. International Research Institute for Climate and Society, Columbia University. http://iridl.ldeo.columbia.edu/
Current Data Exchange Standards • There are many of them • Some are flexible but semantically weak • Others are semantically specific but not sufficiently flexible We are working on this …
Data Library Overview (diagram labels): Specialized Data Tools; Maproom; Generalized Data Tools; Data Viewer; IRI Data Collection; Data Language; Dataset • Variable • ivar; multidimensional; URL/URI for data, calculations, figs, etc.
IRI Data Collection (diagram labels): Ocean/Atm; “geolocated by lat/lon”; multidimensional; Economics; Public Health; spectral harmonics; “geolocated by entity”; equal-area grids; GRIB grid codes; climate divisions; GIS; “geolocation by vector object or projection metadata”; Dataset • Variable • ivar
IRI Data Collection (diagram labels): GRIB; netCDF; images; binary; spreadsheets; shapefiles; Database Tables; queries; Servers: OpenDAP, THREDDS; images w/proj; Dataset • Variable • ivar
IRI Data Collection (diagram labels): GRIB; netCDF; images; binary; spreadsheets; shapefiles; Database Tables; queries; Servers: OpenDAP, THREDDS; images w/proj; Dataset • Variable • ivar; Calculations (“virtual variables”); Data Files: netcdf, binary, images; graphics; descriptive and navigational pages; Clients: OpenDAP, THREDDS; Tables; OpenGIS WMS v1.3
OpenDAP: very important to us because we can act as both a client and a server, and because it is flexible enough to represent all our calculations (“virtual variables”), i.e., a user can specify an analysis and export it. At the moment we cannot read shapefile data using it (and the serving of shapes over OpenDAP is consequently untested), but hopefully that is temporary. Impedance mismatch is low.
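As a rough illustration of what an OpenDAP client request looks like, the sketch below builds a DAP2-style URL with a constraint expression that selects a sub-array of one variable. The host, dataset path, and variable name are placeholders invented for this example; they are not the Data Library's own URL scheme, which is not shown in these slides.

```python
# Minimal sketch of a DAP2 constraint-expression URL.
# The dataset URL and the variable name "sst" are hypothetical.

def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range as [start:stride:stop] (stop is inclusive)."""
    return f"[{start}:{stride}:{stop}]"

def dap2_url(dataset_url, variable, ranges, response="ascii"):
    """Build <dataset>.<response>?<variable><hyperslabs> for one variable."""
    slabs = "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.{response}?{variable}{slabs}"

# One time step, then index ranges along two further dimensions.
url = dap2_url("http://example.org/dap/sst.nc", "sst",
               [(0, 0), (10, 20), (30, 40)])
print(url)
```

The same dataset URL with other response suffixes (for example `.dds` or `.das`) returns the structure or attribute descriptions rather than data, which is part of what lets one server answer many kinds of client.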
Other Important Standards • netCDF • GRIB • GeoTIFF • Shapefiles vs. PostGIS in Postgres (OGC compliant)
Standards becoming important to us (we think) • OGC: GIS Conceptual Framework • OGC: WMS, WFS, WCS • These are designed to be partial – we will have many datasets/analyses that we cannot transfer using these protocols
Interoperability requires Semantics: currently we have some numeric interoperability, but we have a long way to go for semantic interoperability.
Standard Metadata Schema/Data Services (diagram labels): Tools; Datasets; Users
Many Data Communities (diagram labels): several communities, each with its own Standard Metadata Schema, Tools, Datasets, and Users
Super Schema (diagram labels): Standard metadata schema; each community's Standard Metadata Schema, Tools, Datasets, and Users
Super Schema: direct (diagram labels): Standard metadata schema/data service; each community's Standard Metadata Schema, Tools, Datasets, and Users
Flaws • A lot of work • Super Schema/Service is the Lowest-Common-Denominator • Science keeps evolving, so that standards either fall behind or constantly change
RDF Standard Data Model Exchange (diagram labels): Standard metadata schema; RDF; each community's Standard Metadata Schema, Tools, Datasets, and Users
RDF Data Model Exchange (diagram labels): Standard metadata schema; RDF attached to each community's Standard Metadata Schema, Tools, Datasets, and Users
RDF Architecture (diagram labels): queries; Virtual (derived) RDF; several RDF boxes
Why is this better? • Maps the original dataset metadata into a standard format that can be transported and manipulated • Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but • When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping • Can use enhanced mappings between models that are close • EASIER – these are tools to enhance the mapping process
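The “late semantic binding” point can be sketched in a few lines of plain Python: the original metadata is stored once as subject/predicate/object triples, and the mapping to a standard vocabulary is a separate piece of data applied at query time. Every identifier below (`ds:`, `local:`, `std:` names and both mappings) is made up for illustration; a real system would use an RDF store and OWL/RDFS mappings rather than a dict.

```python
# Minimal sketch of late semantic binding with plain tuples.
# All dataset names, predicates, and mappings here are hypothetical.

# Original, complete-but-nonstandard metadata, stored untouched.
original = {
    ("ds:sst_monthly", "local:varname", "sst"),
    ("ds:sst_monthly", "local:units", "degC"),
    ("ds:sst_monthly", "local:long_name", "sea surface temperature"),
}

# A first mapping to a standard vocabulary, and a richer one added later.
mapping_v1 = {"local:units": "std:units"}
mapping_v2 = {"local:units": "std:units", "local:long_name": "std:title"}

def bind(triples, mapping):
    """Return the original triples plus derived ones; originals are not altered."""
    derived = {(s, mapping[p], o) for (s, p, o) in triples if p in mapping}
    return triples | derived

def query(triples, predicate):
    """List (subject, object) pairs for one predicate."""
    return sorted((s, o) for (s, p, o) in triples if p == predicate)

print(query(bind(original, mapping_v1), "std:title"))  # nothing mapped yet
print(query(bind(original, mapping_v2), "std:title"))  # available after remapping
```

Because `original` is never rewritten, adopting `mapping_v2` needs no re-ingestion of the data; only the mapping changes.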
Key Features of RDF/OWL • Web-based Framework for writing down and interrelating semantic standards • Non-contextual Modeling: data object relationships are stated explicitly, not inferred from context • Late-Semantic-Binding: semantics do not alter transport/storage, semantic mapping can be added later as scientific fields evolve • Not much track record – yet
RDF vs. XML Schema • RDF is usually transported as XML • So it is XML • But it differs from XML Schema in that the Schema is not fixed beforehand • XML Schema – a prearranged exchange • RDF/XML – add to/query an information space
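A small sketch of the “add to/query an information space” idea, using only the Python standard library: two RDF/XML fragments are read into one set of triples, and the second introduces a property from a vocabulary the first never mentioned, with no schema agreed beforehand. The `example.org` URIs and property names are invented, and the parser handles only the simplest RDF/XML form (`rdf:Description` with literal-valued property elements), not the full syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Two hypothetical RDF/XML documents describing the same dataset.
DOC_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/dataset/sst">
    <ex:units>degC</ex:units>
  </rdf:Description>
</rdf:RDF>"""

DOC_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:geo="http://example.org/geo#">
  <rdf:Description rdf:about="http://example.org/dataset/sst">
    <geo:region>Tropical Pacific</geo:region>
  </rdf:Description>
</rdf:RDF>"""

def triples_from_rdfxml(text):
    """Extract (subject, predicate, literal) triples from simple RDF/XML."""
    root = ET.fromstring(text)
    out = set()
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            out.add((subject, predicate, (prop.text or "").strip()))
    return out

# Adding to the information space is just a set union.
space = set()
space |= triples_from_rdfxml(DOC_A)
space |= triples_from_rdfxml(DOC_B)

# Query: everything known about one subject.
about_sst = sorted((p, o) for (s, p, o) in space
                   if s == "http://example.org/dataset/sst")
for predicate, value in about_sst:
    print(predicate, "=", value)
```

Under an XML Schema exchange, the second document would be rejected or would require a schema revision; here it simply contributes another statement.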
Sample Tool: Faceted Search http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...