NetCDF Data Model Details, Russ Rew, UCAR

Russ Rew, UCAR Unidata
NetCDF 2009 Workshop, 2009-08-04

NetCDF and HDF5 Data Models
• The netCDF classic data model: simple and flat
  - Dimensions
  - Variables
  - Attributes
• The netCDF enhanced data model added
  - More primitive types
  - Hierarchical groups
  - User-defined datatypes
  - Multiple unlimited dimensions
• The HDF5 data model also has
  - Hard- and soft-links (providing multiple names for things)
  - User-defined primitive datatypes
  - References (pointers to objects and data regions in a file)
  - Attributes attached to user-defined types
  - A few other miscellaneous features

The Enhanced NetCDF Data Model
• Additions to classic netCDF data model
• Still a subset of HDF5 data model
• Made possible by adding a few things to HDF5 so netCDF could fit within it
• Criteria for additions to classic model: handling identified classic limitations
[Figure: nested regions, with netCDF classic inside netCDF enhanced inside HDF5]

Classic netCDF data model
Variables and attributes use one of six primitive data types.
[UML diagram of the classic data model:
  - File: location: Filename; create( ), open( ), …
  - Dimension: name: String; length: int; isUnlimited( )
  - Variable: name: String; shape: Dimension[ ]; type: DataType; array: read( ), …
  - Attribute: name: String; type: DataType; values: 1D array
  - DataType is a PrimitiveType: char, byte, short, int, float, double]
A file has variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One dimension may be of unlimited length.
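To make the diagram concrete, here is a minimal Python sketch of the classic model's objects and two of its rules: only the six primitive types, and at most one unlimited dimension. The class and method names (`ClassicFile`, `add_dimension`, `add_variable`, `current_shape`) are hypothetical and are not the netCDF library's API. The extra check that the unlimited dimension comes first reflects the classic format's requirement that the record dimension be the outermost dimension of any variable using it.

```python
from dataclasses import dataclass, field

# The six primitive types of the classic model.
CLASSIC_TYPES = {"char", "byte", "short", "int", "float", "double"}


@dataclass
class Dimension:
    name: str
    length: int              # for the unlimited dimension: current number of records
    unlimited: bool = False


@dataclass
class Attribute:
    name: str
    type: str
    values: list             # always a 1-D array of values


@dataclass
class Variable:
    name: str
    type: str
    shape: list              # Dimension objects, possibly shared with other variables
    attributes: dict = field(default_factory=dict)

    def current_shape(self):
        return tuple(d.length for d in self.shape)


@dataclass
class ClassicFile:
    dimensions: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)   # global (file-level) attributes

    def add_dimension(self, name, length, unlimited=False):
        if unlimited and any(d.unlimited for d in self.dimensions.values()):
            raise ValueError("the classic model allows only one unlimited dimension")
        dim = Dimension(name, length, unlimited)
        self.dimensions[name] = dim
        return dim

    def add_variable(self, name, type_, dim_names):
        if type_ not in CLASSIC_TYPES:
            raise TypeError(f"{type_!r} is not a classic primitive type")
        dims = [self.dimensions[n] for n in dim_names]
        if any(d.unlimited for d in dims[1:]):
            raise ValueError("the unlimited dimension must come first")
        var = Variable(name, type_, dims)
        self.variables[name] = var
        return var


# Two variables on a common (time, lat, lon) grid.
f = ClassicFile()
time = f.add_dimension("time", 0, unlimited=True)
f.add_dimension("lat", 3)
f.add_dimension("lon", 4)
temp = f.add_variable("temp", "float", ["time", "lat", "lon"])
pres = f.add_variable("pres", "float", ["time", "lat", "lon"])
time.length = 2              # two records have been written
```

Because `temp` and `pres` hold references to the same `Dimension` objects, growing the unlimited `time` dimension changes both shapes at once, which is what sharing a dimension (a common grid) amounts to.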

Variables versus Attributes
Characteristics of variables:
• For data
• May be too large for memory
• May be multidimensional
• Support partial access
• Individual values may be changed
• More data may be appended
• May have associated attributes
• Shape specified with shared dimensions
Characteristics of attributes:
• Intended for metadata
• For single values, strings, or small 1-D arrays
• Accessed atomically (written or read all at once)
• Typically values don’t change after creation
• May not have attributes
• Length specified when created
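"Partial access" and "more data may be appended" are what separate variables from attributes. A minimal sketch, assuming a variable's values are held as a row-major flat list: `read_slab` reads a hyperslab given per-dimension start indices and counts, and `append_record` grows the leading (unlimited) dimension. Both helpers are hypothetical; in the C library, the `nc_get_vara_*` functions play the reading role and likewise take `start` and `count` arrays. An attribute, by contrast, would simply be read or written as one whole small list.

```python
from itertools import product
from math import prod


def read_slab(flat, shape, start, count):
    """Read a hyperslab (start index and count per dimension) from a
    variable stored as a row-major flat list, touching no other values."""
    # Stride of each dimension = product of the lengths of the faster-varying ones.
    strides = [prod(shape[i + 1:]) for i in range(len(shape))]
    ranges = [range(s, s + c) for s, c in zip(start, count)]
    return [flat[sum(i * st for i, st in zip(idx, strides))]
            for idx in product(*ranges)]


def append_record(flat, shape, record):
    """Append one record along the leading (unlimited) dimension, in place,
    and return the new shape."""
    if len(record) != prod(shape[1:]):
        raise ValueError("record does not match the fixed dimensions")
    flat.extend(record)
    return (shape[0] + 1,) + tuple(shape[1:])


temp = list(range(24))       # a (time=2, lat=3, lon=4) variable
shape = (2, 3, 4)
corner = read_slab(temp, shape, start=(1, 0, 1), count=(1, 2, 2))
```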

Characteristics of the classic data model
• Strengths
  - Simple to explain
  - Good for discussing data representation issues
  - Efficient implementation is possible
  - Writing generic applications is practical
  - For gridded data, good data representations available
  - Shared dimensions are useful
• Weaknesses
  - Multiple variable-length data structures hard to represent
  - Additional conventions required for earth science, e.g. coordinate systems
  - Lacks compound data structures
  - Lacks nested data structures
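One common classic-model workaround for variable-length data, used for example by the CF conventions' "contiguous ragged array" representation, is to store all values along a single dimension plus a separate count variable giving each element's length. The sketch below shows the packing and unpacking in plain Python; the function names are hypothetical. Every additional variable-length structure needs its own count variable and its own convention for linking the two, which is the awkwardness the weakness above refers to.

```python
from itertools import accumulate


def to_contiguous_ragged(profiles):
    """Pack a list of variable-length lists into two fixed-shape arrays a
    classic file can hold: a flat data array and a per-profile count array."""
    data = [x for p in profiles for x in p]
    counts = [len(p) for p in profiles]
    return data, counts


def from_contiguous_ragged(data, counts):
    """Recover the individual variable-length lists from data + counts."""
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [data[s:e] for s, e in zip(starts, ends)]


profiles = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
data, counts = to_contiguous_ragged(profiles)
```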

Enhanced netCDF data model, for netCDF-4
Variables and attributes have one of twelve primitive data types or one of four user-defined types.
[UML diagram of the enhanced data model:
  - File: location: Filename; create( ), open( ), …
  - Group: name: String
  - Dimension: name: String; length: int; isUnlimited( )
  - Variable: name: String; shape: Dimension[ ]; type: DataType; array: read( ), …
  - Attribute: name: String; type: DataType; values: 1D array
  - DataType is a PrimitiveType: char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string
  - or a UserDefinedType (typename: String): Enum, Opaque, Compound, VariableLength]
A file has a top-level unnamed group. Each group may contain one or more named subgroups, user-defined types, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length.
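The structural changes from the classic diagram are the group tree, the user-defined types, and the relaxed rule on unlimited dimensions. A minimal Python sketch of those three points follows; `Group`, `UserDefinedType`, and their methods are hypothetical names, not the netCDF-4 API, and the path syntax (`/` for the unnamed top-level group) is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# The twelve primitive types of the enhanced model (spellings follow the slide).
PRIMITIVE_TYPES = ("char", "byte", "short", "int", "int64", "float", "double",
                   "unsigned byte", "unsigned short", "unsigned int",
                   "unsigned int64", "string")

# The four kinds of user-defined type.
USER_DEFINED_KINDS = ("enum", "opaque", "compound", "variable-length")


@dataclass
class UserDefinedType:
    typename: str
    kind: str

    def __post_init__(self):
        if self.kind not in USER_DEFINED_KINDS:
            raise ValueError(f"unknown user-defined kind {self.kind!r}")


@dataclass(eq=False)
class Group:
    name: str = ""                                   # the top-level group is unnamed
    parent: Optional["Group"] = field(default=None, repr=False)
    subgroups: dict = field(default_factory=dict)
    types: dict = field(default_factory=dict)
    dimensions: dict = field(default_factory=dict)   # name -> (length, is_unlimited)
    variables: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)

    def add_group(self, name):
        child = Group(name, parent=self)
        self.subgroups[name] = child
        return child

    def add_dimension(self, name, length, unlimited=False):
        # No "only one unlimited dimension" check: the enhanced model allows several.
        self.dimensions[name] = (length, unlimited)

    def add_type(self, typename, kind):
        udt = UserDefinedType(typename, kind)
        self.types[typename] = udt
        return udt

    def path(self):
        if self.parent is None:
            return "/"
        return self.parent.path().rstrip("/") + "/" + self.name


root = Group()
surface = root.add_group("observations").add_group("surface")
root.add_dimension("time", 0, unlimited=True)
root.add_dimension("station", 0, unlimited=True)
wind_t = root.add_type("wind_vector_t", "compound")
```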

Characteristics of the enhanced data model
• Strengths
  - Simpler than HDF5, with similar representational power
  - Completely contains and is backward compatible with classic model
  - Efficient implementation available
  - Fixes identified weaknesses of netCDF classic model
  - Incremental adoption of model features possible
• Potential weaknesses
  - Writing generic applications more difficult
  - Types must be defined and named separately from use, even if not shared
  - No attributes allowed on compound members

Some details of the enhanced data model
• No attributes permitted for compound type members (because HDF5 doesn’t allow such attributes):
    compound wind_vector_type {
        float eastward;
        float northward;
    }
• Inclusion of user-defined opaque types (why not just use variable-length array of bytes?)
• Type definitions as first-class objects
• Type containment in groups, but global scope for use
• Inheritance through group hierarchy of only dimensions (why not coordinate variables or attributes?)
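The last point, that only dimensions are inherited through the group hierarchy, amounts to a scoping rule: a dimension lookup walks from the current group up through its ancestors, while an attribute lookup stays in one group. A minimal sketch of that rule as stated on the slide; `Group`, `find_dimension`, and `find_attribute` are hypothetical names, not library calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Group:
    name: str
    parent: Optional["Group"] = field(default=None, repr=False)
    dimensions: dict = field(default_factory=dict)   # dimension name -> length
    attributes: dict = field(default_factory=dict)

    def find_dimension(self, name):
        """Dimensions are inherited: search this group, then each ancestor."""
        group = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        raise KeyError(f"dimension {name!r} is not visible from group {self.name!r}")

    def find_attribute(self, name):
        """Attributes are not inherited: only this group is searched."""
        if name not in self.attributes:
            raise KeyError(f"attribute {name!r} is not defined in group {self.name!r}")
        return self.attributes[name]


root = Group("/", dimensions={"time": 10}, attributes={"title": "demo"})
forecast = Group("forecast", parent=root, dimensions={"lat": 3, "lon": 4})
```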

Natural convention for assigning attributes to members of a compound type:
    compound wind_vector_t {
        float eastward ;
        float northward ;
    }
    compound wind_vector_units_t {
        string eastward ;
        string northward ;
    }
    variables:
        wind_vector_t wind(station) ;
            wind_vector_units_t wind:units = {"m/s", "m/s"} ;
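Because the units type mirrors the value type member for member, a reader can pair each compound member with its unit by name. A minimal sketch of that reader-side pairing, assuming `namedtuple` classes stand in for the two CDL compound types (a real reader would build them from the file's type definitions); `member_units` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins for the CDL types wind_vector_t and wind_vector_units_t (assumption).
WindVector = namedtuple("WindVector", ["eastward", "northward"])
WindVectorUnits = namedtuple("WindVectorUnits", ["eastward", "northward"])


def member_units(value_type, units_type, units_value):
    """Pair each member of a compound type with its unit string.

    The convention only works if the units type mirrors the value type:
    same member names, in the same order."""
    if value_type._fields != units_type._fields:
        raise ValueError("units type does not mirror the compound type")
    return dict(zip(units_type._fields, units_value))


wind_units = WindVectorUnits("m/s", "m/s")              # the wind:units attribute
wind = [WindVector(3.0, -1.5), WindVector(0.5, 2.0)]    # wind(station), two stations

for name, unit in member_units(WindVector, WindVectorUnits, wind_units).items():
    print(name, [getattr(w, name) for w in wind], unit)
```

The mirroring check is the weak point of the convention: nothing in the model itself ties the two types together, so a reader has to verify the match.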

Enhancing a Data Model with Backward Compatibility
• Benefits
  - Data in archives don’t have to change
  - Client program sources don’t have to change
  - Software can access archived data without being aware of format version
• Costs
  - Effort required to support older interfaces and formats
  - Can’t easily fix mistakes in released interfaces
  - Comprehensive compatibility testing needed
• Implementation
  - Evolve data model incrementally
  - Add or grow abstractions, instead of modifying or removing them
  - Ensures previous data model is included in enhanced data model
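Accessing archived data "without being aware of format version" means the library, not the application, has to tell the storage variants apart. One way to do that is sketched below, assuming only the file's leading bytes are examined: classic files begin with `CDF` followed by the byte 1, 64-bit-offset files with `CDF` followed by 2, and netCDF-4 files with the HDF5 signature. `sniff_format` and `sniff_file` are hypothetical helpers, not library functions.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_format(header: bytes) -> str:
    """Classify a file from its first 8 bytes.

    Simplifications: HDF5 allows the signature at offsets 512, 1024, ...
    after a user block, but only offset 0 is checked here; and netCDF-4
    versus netCDF-4 classic model cannot be told apart from the signature
    alone, since both are stored as HDF5."""
    if header[:8] == HDF5_SIGNATURE:
        return "netCDF-4/HDF5"
    if header[:3] == b"CDF":
        return {b"\x01": "classic", b"\x02": "64-bit offset"}.get(header[3:4], "unknown")
    return "unknown"


def sniff_file(path):
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))
```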

NetCDF-4 classic model: a transitional format
netCDF-4
• Not compatible with some existing applications
• Enhanced data model and API, more complex, powerful
netCDF-4 classic model
• Uses classic API for compatibility
• Uses netCDF-4/HDF5 storage for compression, chunking, performance
• To use, just recompile, relink
netCDF-3
• Compatible with existing applications
• Simplest data model and API

Concluding remarks
• Serious use of netCDF enhanced data model just beginning
• Future adjustments to model, if any, will be made by addition, not modification or deletion of existing features
  - Preserves previous programming interfaces
  - Supports access to previous format variants transparently