What is NetCDF? And what are its plans for world domination?
John Caron, Unidata, August 2009

NetCDF is…
• A file format
• A library
• An Application Programmer’s Interface (API)
• A data model
• A dessert topping
• A floor wax

In the beginning
[Diagram: Application → language APIs (Perl, Matlab, Fortran, Ruby, C++, IDL, Python, C) → netCDF C library / API → netCDF file]

Things got more complicated
[Diagram: Application → language APIs (Java, Perl, Matlab, Fortran, Ruby, C++, Python, C) → netCDF C library / API and netCDF Java library / API → netCDF-4 file, netCDF-3 file, OPeNDAP data]

But wait, there’s more!
[Diagram, NetCDF-Java / CDM architecture, top to bottom: Application; Scientific Feature Types; Datatype Adapter; NetcdfDataset with CoordSystem Builder; NetcdfFile; I/O service providers for NetCDF-3, NetCDF-4, HDF4, GRIB, NIDS, GINI, Nexrad, DMSP, …; alongside OPeNDAP, NcML, and THREDDS Catalog.xml]

Netcdf-Java 4.0 File Formats
• General: NetCDF-3, NetCDF-4, HDF5, HDF4, OPeNDAP
• Gridded: GRIB-1, GRIB-2, GEMPAK, McIDAS, UAMIV CAMx
• Point: BUFR, GEMPAK
• Radar: NEXRAD 2&3, DORADE, CINRAD, UF
• Satellite: DMSP, GINI, McIDAS, FYSAT
• Misc: GTOPO, NLDN, USPLN, etc.

What is netCDF?
[Diagram: Application → language APIs (Java, Perl, Matlab, Fortran, Ruby, C++, Python, C) → netCDF C library and netCDF Java library (NetcdfFile, I/O service providers for NetCDF-3, NetCDF-4, HDF4, GRIB, NIDS, GINI, Nexrad, DMSP, …, plus OPeNDAP) → netCDF-4 file, netCDF-3 file, OPeNDAP data]

NetCDF is a…
• File format
  – Stores data model objects
  – Persistence layer
  – NetCDF-3, netCDF-4
• Software library
  – Implements the API
  – C, Java, others
• API
  – An API is the interface to the Data Model for a specific programming language
• An Abstract Data Model describes data objects and what methods you can use on them

NetCDF is a… File format
• Stores the objects in the data model
• Persistence layer
• NetCDF-3, netCDF-4

What you should know about Storage Formats
• Locality, locality
• I/O cost is measured in number of disk accesses
  – An entire block is read at once
  – Sequential access is 100x faster than random
• Many factors affect this
  – Local disk, NFS mounted (shared), server RAID
  – The disk is caching sectors
  – The file system / OS is caching pages
  – The library may be caching data
• Applications can try to optimize file layout
  – Write, read, common access patterns
  – Only matters for large I/O-bound apps
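To make the locality point concrete, here is a minimal Python sketch that counts how many disk blocks a set of reads touches. The 4096-byte block size, the helper name `blocks_touched`, and the two access patterns are illustrative assumptions, not measurements from the talk.

```python
def blocks_touched(offsets, block_size=4096):
    """Count the distinct blocks that contain the given byte offsets.

    block_size is an assumed value; real devices and file systems vary.
    """
    return len({offset // block_size for offset in offsets})


n_values = 1024
value_size = 4  # bytes per float

# 1024 floats stored next to each other
sequential = [i * value_size for i in range(n_values)]
# 1024 floats that are each one full block apart
strided = [i * 4096 for i in range(n_values)]

print(blocks_touched(sequential))
print(blocks_touched(strided))
```

The same 4 KB of useful data costs one block read in the first layout and 1024 in the second, which is why matching file layout to the common access pattern matters for I/O-bound applications.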

NetCDF-3 file format
[File layout, in order]
Header
Non-record variables (row-major order)
  Variable 1: float var1(z, y, x)
  Variable 2
  Variable 3
  …
Record variables
  Record 0: float rvar1(0, z, y, x); float rvar2(0, z, y, x); float rvar3(0, z, y, x)
  Record 1: float rvar1(1, z, y, x); float rvar2(1, z, y, x); float rvar3(1, z, y, x)
  … (unlimited)
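The layout above can be sketched as offset arithmetic. This is a simplified Python illustration, assuming every record variable is a 4-byte float with the same per-record shape and that no padding is needed; the function names and the `data_start` parameter are hypothetical and are not part of any netCDF library.

```python
from math import prod

FLOAT_SIZE = 4  # bytes


def row_major_index(index, shape):
    """Flatten a multidimensional index; the last dimension varies fastest."""
    flat = 0
    for i, n in zip(index, shape):
        flat = flat * n + i
    return flat


def record_var_offset(data_start, record, var_pos, index, shape, n_record_vars):
    """Byte offset of one value of a record variable.

    data_start: assumed byte offset where record 0 begins
    var_pos: position of the variable within each record (0, 1, 2, ...)
    shape: per-record shape, e.g. (z, y, x)
    """
    per_var = FLOAT_SIZE * prod(shape)
    record_size = per_var * n_record_vars  # records interleave all record variables
    return (data_start
            + record * record_size
            + var_pos * per_var
            + FLOAT_SIZE * row_major_index(index, shape))


shape = (2, 3, 4)  # (z, y, x)
print(row_major_index((1, 2, 3), shape))
print(record_var_offset(0, 1, 2, (0, 0, 0), shape, n_record_vars=3))
```

Reading one record variable across all records therefore skips over the other record variables at every step, while a non-record variable is one contiguous run.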

NetCDF-4 file format
• Built on HDF-5
• Much more complicated than netCDF-3
• Storage efficiency
  – Compression: can optimize chunking for common I/O pattern
  – Compound types
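As a rough illustration of why chunk shape should follow the common I/O pattern, the Python sketch below counts the chunks a hyperslab read intersects. The variable shape (time=100, y=200, x=200), both chunk shapes, and the helper name are assumptions chosen for the example.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the chunk indices intersected by a hyperslab (start, count)."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch
        last = (s + c - 1) // ch
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Two candidate chunkings for a variable of shape (time=100, y=200, x=200)
per_time_step = (1, 200, 200)  # one chunk per horizontal slice
per_column = (100, 10, 10)     # full time series for a 10x10 tile

# Read a time series at one grid point
time_series = ((0, 50, 50), (100, 1, 1))
# Read one full horizontal slice
slice_read = ((0, 0, 0), (1, 200, 200))

for name, chunking in [("per_time_step", per_time_step), ("per_column", per_column)]:
    n_series = len(chunks_touched(*time_series, chunking))
    n_slice = len(chunks_touched(*slice_read, chunking))
    print(name, n_series, n_slice)
```

Neither chunking wins for both reads. With compression a whole chunk must be decompressed to get any value in it, so the choice depends on which access pattern dominates.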

Row vs Column storage
• NetCDF-3 is a column store
  – All data for one variable is stored together
• Traditional RDBMS is a row store
  – All fields for one row in a table are stored together
• NetCDF-4 allows both row and column store
  – Row: compound type
  – Column: regular variable
• Recent RDBMS research is focusing on possible advantages of column-oriented storage
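The two layouts can be sketched with Python's `struct` module. The three-field record (id, temperature, pressure) and its values are invented for the illustration; the row layout plays the role of a compound type and the column layout the role of regular variables.

```python
import struct

# Hypothetical observations: (id, temperature, pressure)
rows = [(1, 20.5, 1013.0), (2, 21.0, 1012.5), (3, 19.8, 1014.1)]

# Row store: all fields of one row are adjacent (like a compound type)
row_store = b"".join(struct.pack("<iff", *row) for row in rows)

# Column store: all values of one field are adjacent (like a regular variable)
ids, temps, pressures = zip(*rows)
col_store = (struct.pack("<3i", *ids)
             + struct.pack("<3f", *temps)
             + struct.pack("<3f", *pressures))

# Reading every temperature: one contiguous run in the column store...
temps_from_cols = struct.unpack_from("<3f", col_store, 12)
# ...but one small read per 12-byte row in the row store
temps_from_rows = tuple(
    struct.unpack_from("<f", row_store, 12 * i + 4)[0] for i in range(len(rows))
)

print(temps_from_cols == temps_from_rows)
```

Both stores hold the same 36 bytes. Only the ordering differs, and that ordering decides which queries read contiguously.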

NetCDF is a… Software library
• Implements the API
• C, Java, third-party

NetCDF Libraries
• NetCDF C library
  – Reference implementation
  – Reads/writes netCDF-3 and netCDF-4
  – Reads OPeNDAP (alpha)
• NetCDF Java library
  – Exploratory
  – 100% Java == portable
  – Reads netCDF-3, netCDF-4, OPeNDAP, many others
  – Only writes netCDF-3 (considering a JNI interface to the C library for writing netCDF-4)
  – Thread safe, good for servers, used by the THREDDS Data Server (TDS)

What you should know about Multicore CPUs
• Commodity CPUs won’t get faster
  – Too hot! Lifecycle cost dominated by electricity $$$
• Moore’s Law -> multiple CPUs on a chip
• Multithreaded programs can take advantage of the new multicore computer architecture
• Good for servers; harder for client programs to take advantage of this
• New languages (eventually)

NetCDF is a… API
• An API is the interface to the Data Model for a specific programming language
• An Abstract Data Model describes data objects and what methods you can use on them

NetCDF APIs
• Application Programmer’s Interface
  – It’s what you have to deal with
  – Changing this breaks your code
• Lots of language bindings, same data model
• An API is the interface to the Data Model for a specific programming language

NetCDF-3 data model
• Multidimensional arrays of primitive values
  – byte, char, short, int, float, double
• Key/value attributes
• Shared dimensions
• Fortran 77
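One way to picture this data model is as a few small Python classes. This is a sketch under assumptions: the class and field names are hypothetical and do not mirror the C or Java API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PRIMITIVE_TYPES = {"byte", "char", "short", "int", "float", "double"}


@dataclass
class Dimension:
    name: str
    length: Optional[int]  # None marks the unlimited (record) dimension


@dataclass
class Variable:
    name: str
    dtype: str
    dimensions: List[Dimension]  # references to shared Dimension objects
    attributes: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self):
        if self.dtype not in PRIMITIVE_TYPES:
            raise ValueError(f"not a netCDF-3 primitive type: {self.dtype}")


time = Dimension("time", None)
y = Dimension("y", 3)
x = Dimension("x", 4)

temp = Variable("temp", "float", [time, y, x], {"units": "K"})
lat = Variable("lat", "float", [y, x])

# Both variables refer to the same Dimension objects, not to copies
print(temp.dimensions[1] is lat.dimensions[0])
```

Because `temp` and `lat` share the `y` and `x` objects, software can tell that `lat` describes the same grid points as `temp`, which is what makes coordinate conventions possible.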

NetCDF-4 Data Model

NetCDF, HDF5, OPeNDAP Data Models
[Diagram labels: Shared dimensions; NetCDF (classic); OPeNDAP; NetCDF (extended); HDF5]

Gridded Data
• Cartesian coordinates
• Data is 2D, 3D, 4D
• All dimensions have 1D coordinate variables (separable)
    float gridData(t, z, y, x);
    float t(t);
    float y(y);
    float x(x);
    float z(z);
• netCDF: coordinate variables
• OPeNDAP: grid map variables
• HDF: dimension scales
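Because the coordinates are separable, a coordinate-space lookup on a grid reduces to one independent 1D search per axis. The Python sketch below assumes strictly increasing coordinate values; the helper name and the sample `x` and `y` values are made up.

```python
from bisect import bisect_left


def nearest_index(coord, value):
    """Index of the entry in a strictly increasing 1D coordinate nearest to value."""
    pos = bisect_left(coord, value)
    if pos == 0:
        return 0
    if pos == len(coord):
        return len(coord) - 1
    before, after = coord[pos - 1], coord[pos]
    return pos if after - value < value - before else pos - 1


x = [0.0, 10.0, 20.0, 30.0]  # values of coordinate variable x(x)
y = [100.0, 200.0, 300.0]    # values of coordinate variable y(y)

# Each axis is searched on its own; together they index gridData(..., y, x)
grid_index = (nearest_index(y, 290.0), nearest_index(x, 12.0))
print(grid_index)
```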

Swath
• Two dimensional
• Track and cross-track
• No separate time dimension
• aka curvilinear coordinates
    float swathData(track, xtrack)
    float lat(track, xtrack)
    float lon(track, xtrack)
    float alt(track, xtrack)
    float time(track)

Point Observation Data
• Set of measurements at the same point in space and time = obs
• Collection of obs = dataset
• Sample dimension not connected
    float obs1(sample);
    float obs2(sample);
    float lat(sample);
    float lon(sample);
    float z(sample);
    float time(sample);

Shared Dimensions Status
• netCDF
  – Shared dimensions plus conventions are the general solution for coordinates
  – :coordinates = “lat lon alt time”
• OPeNDAP
  – No shared dimensions in current data model
  – Shared dimensions will be added to DAP-4
• HDF5
  – No shared dimensions in current data model
  – HDF-EOS added shared dimensions in metadata
  – NetCDF-4 adds a workaround
• NetCDF-4 is not a subset of HDF-5
  – NetCDF-4 does not (yet) read all HDF-5 objects
• HDF-5 is not a subset of NetCDF-4
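The "shared dimensions plus conventions" idea can be sketched as a small resolver for the `coordinates` attribute. The dictionary representation of a dataset and the function name are assumptions for illustration; the variables are taken from the swath example in this talk.

```python
def coordinate_variables(variables, data_var):
    """Resolve the 'coordinates' attribute of data_var to variable names.

    variables maps name -> {"dims": tuple of dimension names, "attrs": dict}.
    A coordinate is accepted only if its dimensions are shared with data_var.
    """
    names = variables[data_var]["attrs"].get("coordinates", "").split()
    data_dims = set(variables[data_var]["dims"])
    for name in names:
        if name not in variables:
            raise KeyError(f"coordinate variable not found: {name}")
        if not set(variables[name]["dims"]) <= data_dims:
            raise ValueError(f"{name} uses dimensions not shared with {data_var}")
    return names


swath = {
    "swathData": {"dims": ("track", "xtrack"),
                  "attrs": {"coordinates": "lat lon alt time"}},
    "lat": {"dims": ("track", "xtrack"), "attrs": {}},
    "lon": {"dims": ("track", "xtrack"), "attrs": {}},
    "alt": {"dims": ("track", "xtrack"), "attrs": {}},
    "time": {"dims": ("track",), "attrs": {}},
}

print(coordinate_variables(swath, "swathData"))
```

Without shared dimensions the second check has nothing to compare, which is the gap the workarounds listed above try to fill.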

Back to API / Data Models
[Diagram, NetCDF-Java / CDM architecture, top to bottom: Application; Scientific Feature Types; Datatype Adapter; NetcdfDataset with CoordSystem Builder; NetcdfFile (Data Access); I/O service providers for NetCDF-3, NetCDF-4, HDF4, GRIB, NIDS, GINI, Nexrad, DMSP, …; alongside OPeNDAP, NcML, and THREDDS Catalog.xml]

NetCDF “Index Space” Data Access: OPeNDAP
URL: http://motherlode.ucar.edu:8080/thredds/dodsC/NAM_CONUS_80km_20081028_1200.grib1.ascii?Precipitable_water[5][5:1:30][0:1:77]

“Coordinate Space” Data Access: NCSS
URL: http://motherlode.ucar.edu:8080/thredds/ncss/grid/NAM_CONUS_80km_20081028_1200.grib1?var=Precipitable_water&time=2008-10-28T12:00Z&north=40&south=22&west=-110&east=-80
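The difference between the two request styles shows up in how the URLs are assembled. The Python sketch below only builds the strings and does not contact any server. The host, dataset name, and parameter values are those on the slide (the server may no longer exist); the function names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://motherlode.ucar.edu:8080/thredds"
DATASET = "NAM_CONUS_80km_20081028_1200.grib1"


def opendap_ascii_url(var, slices):
    """Index-space request: each slice is an int or a (start, stride, stop) tuple."""
    parts = []
    for s in slices:
        if isinstance(s, int):
            parts.append(f"[{s}]")
        else:
            start, stride, stop = s
            parts.append(f"[{start}:{stride}:{stop}]")
    constraint = "".join(parts)
    return f"{BASE}/dodsC/{DATASET}.ascii?{var}{constraint}"


def ncss_url(var, time, north, south, west, east):
    """Coordinate-space request: a time and a lat/lon bounding box."""
    query = urlencode(
        {"var": var, "time": time, "north": north,
         "south": south, "west": west, "east": east},
        safe=":",  # keep the colon in the ISO time readable
    )
    return f"{BASE}/ncss/grid/{DATASET}?{query}"


print(opendap_ascii_url("Precipitable_water", [5, (5, 1, 30), (0, 1, 77)]))
print(ncss_url("Precipitable_water", "2008-10-28T12:00Z", 40, 22, -110, -80))
```

The index-space caller has to know the array layout in advance. The coordinate-space caller states what is wanted in time and space and leaves the index arithmetic to the server.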

Scientific Feature Types
[Diagram, NetCDF-Java / CDM architecture, top to bottom: Application; Scientific Feature Types; Datatype Adapter; NetcdfDataset with CoordSystem Builder and NcML (Coordinate Space Access); NetcdfFile (Index Space Access); I/O service providers for NetCDF-3, NetCDF-4, HDF4, GRIB, NIDS, GINI, Nexrad, DMSP, …; alongside OPeNDAP and NcML]

Coordinate System UML

Netcdf-Java Library parses these Conventions
• CF Conventions (preferred)
• COARDS, NCAR-CSM, ATD-Radar, Zebra, GEIF, IRIDL, NUWG, AWIPS, WRF, M3IO, IFPS, ADAS/ARPS, MADIS, Epic, RAF-Nimbus, NSSL National Reflectivity Mosaic, FslWindProfiler, Modis Satellite, Avhrr Satellite, Cosmic, …
• Write your own CoordSysBuilder Java class

Projections (CF)
• albers_conical_equal_area
• lambert_azimuthal_equal_area
• lambert_conformal_conic
• mcidas_area
• mercator
• orthographic
• rotated_pole
• stereographic (including polar)
• transverse_mercator
• UTM (ellipsoidal)
• vertical_perspective

Vertical Transforms (CF)
• atmosphere_sigma
• atmosphere_hybrid_sigma_pressure
• atmosphere_hybrid_height
• ocean_sigma
• existing3DField

Add your own Transform
• Pluggable framework
  – Add at runtime
  – CoordTransBuilder.registerTransform()
• Implement CoordTransBuilderIF
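The pluggable idea, registering a builder under a name at runtime and looking it up later, can be sketched in Python. This is an analogy only: it is not the netCDF-Java API, and `register_transform`, `build_transform`, and the builder function are hypothetical names. The formula used is the CF atmosphere sigma definition, p = ptop + sigma * (ps - ptop).

```python
_TRANSFORMS = {}  # hypothetical registry: transform name -> builder function


def register_transform(name, builder):
    """Make a transform builder available under the given name at runtime."""
    _TRANSFORMS[name] = builder


def build_transform(name, params):
    """Look up a registered builder and construct the transform."""
    if name not in _TRANSFORMS:
        raise ValueError(f"no transform registered for: {name}")
    return _TRANSFORMS[name](params)


def atmosphere_sigma_builder(params):
    """Build a function converting sigma levels to pressure."""
    ptop = params["ptop"]

    def to_pressure(sigma, surface_pressure):
        return ptop + sigma * (surface_pressure - ptop)

    return to_pressure


register_transform("atmosphere_sigma", atmosphere_sigma_builder)

transform = build_transform("atmosphere_sigma", {"ptop": 100.0})
print(transform(0.5, 1000.0))
```

The library code that calls `build_transform` never needs to know which transforms exist, so a user-supplied one is handled the same way as a built-in one.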

Coordinate Systems Summary
• How?
  – Write your own Java code, plug into CDM
  – Write your files using CF Conventions
• Why?
  – Standard visualization, debugging, and data manipulation tools
  – Standard servers to make your data remotely accessible

Payoff

NetCDF-Java library
• Used as a component in other software (partial list)
  – Integrated Data Viewer, ToolsUI (Unidata)
  – Panoply (NASA)
  – ncBrowse (EPIC/NOAA)
  – Java NEXRAD Viewer (NCDC/NOAA)
  – MyWorld GIS (Northwestern)
  – EDC for ArcGIS, ERDDAP (SFSC/NOAA)
  – Live Access Server (PMEL/NOAA)
  – ncWMS (Reading)
  – Matlab plug-in (USGS)

THREDDS Data Server
[Diagram: a Remote Access Client connects to the THREDDS Server (motherlode.ucar.edu) running in a Servlet Container; the server offers WCS, OPeNDAP, HTTPServer, and WMS services on top of the NetCDF-Java library; labels also show catalog.xml, configCatalog.xml, and IDD Datasets]

THREDDS Data Server (TDS)
• Web server for scientific data
• From Unidata
• Provides remote data access
  – OPeNDAP
  – Open Geospatial Consortium (OGC) WMS and WCS
  – HTTP file transfer
  – Experimental data access protocols

OGC Web Map Service
• Jon Blower’s (Reading, UK) ncWMS integrated with TDS
• Coordinate Space subsetting
• Produces JPEG output
• Fast generation of images
• Reprojects images into a large number of coordinate systems

WMS Clients
• NASA World Wind
• Cadcorp SIS
• Google Earth
• 3rd-party clients can’t use the custom WMS extensions

Web Coverage Service
• Coordinate Space subsetting
• Return formats
  – GeoTIFF floating point, grayscale
  – NetCDF/CF
• No reprojections, resamplings
• Restricted to CDM files that have Grid coordinate system
  – Evenly spaced x, y
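The "evenly spaced x, y" restriction can be checked with a few lines of Python. This is a sketch of the idea only: the tolerance, the function name, and the treatment of short axes are assumptions, not the test the TDS actually applies.

```python
import math


def is_evenly_spaced(coord, rel_tol=1e-6):
    """True if consecutive coordinate values differ by a constant nonzero step."""
    if len(coord) < 2:
        return False  # assumption: a single point has no spacing to compare
    step = coord[1] - coord[0]
    if step == 0:
        return False
    return all(
        math.isclose(b - a, step, rel_tol=rel_tol)
        for a, b in zip(coord, coord[1:])
    )


print(is_evenly_spaced([0.0, 10.0, 20.0, 30.0]))
print(is_evenly_spaced([0.0, 10.0, 25.0]))
```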

NetCDF Markup Language (NcML)
• XML representation of netCDF metadata (like ncdump -h)
• Create new netCDF files (like ncgen)
• Modify (“fix”) existing datasets without rewriting them
• Create virtual datasets as aggregations of multiple existing files
• Integrated with the TDS

NcML: Modify and serve through TDS
<dataset name="Polar Orbiter Data" urlPath="idd/sat/PolarData">
  <netcdf location="/data/sat/P02393.hdf">
    <attribute name="Conventions" value="CF-1.4"/>
    <variable name="Reflectivity" orgName="R34768">
      <attribute name="units" value="dBZ"/>
      <attribute name="coordinates" value="time lat lon"/>
    </variable>
  </netcdf>
</dataset>
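To show what such a fragment expresses, the Python sketch below parses the `netcdf` element from this slide with the standard library and lists the changes it requests. It only reads the XML; it does not apply anything to a file, and it assumes the fragment carries no XML namespace, as on the slide. The function name and the tuple format are hypothetical.

```python
import xml.etree.ElementTree as ET

NCML = """<netcdf location="/data/sat/P02393.hdf">
  <attribute name="Conventions" value="CF-1.4"/>
  <variable name="Reflectivity" orgName="R34768">
    <attribute name="units" value="dBZ"/>
    <attribute name="coordinates" value="time lat lon"/>
  </variable>
</netcdf>"""


def describe_modifications(ncml_text):
    """List the attribute additions and variable renames in an NcML fragment."""
    root = ET.fromstring(ncml_text)
    changes = []
    for att in root.findall("attribute"):
        changes.append(("global", att.get("name"), att.get("value")))
    for var in root.findall("variable"):
        name = var.get("name")
        org_name = var.get("orgName")
        if org_name:
            changes.append(("rename", org_name, name))
        for att in var.findall("attribute"):
            changes.append((name, att.get("name"), att.get("value")))
    return changes


for change in describe_modifications(NCML):
    print(change)
```

The original HDF file is never rewritten: the fragment is a thin description layered over it, which is what lets a server present the "fixed" view to clients.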

TDS / NcML: Modify all files in datasetScan
<datasetScan name="Polar Orbiter" path="/data/sat/" location="/data/hdf/polar/">
  <netcdf>
    <attribute name="Conventions" value="CF-1.4"/>
    <variable name="Reflectivity" orgName="R34768">
      <attribute name="units" value="dBZ"/>
      <attribute name="coordinates" value="time lat lon"/>
    </variable>
  </netcdf>
</datasetScan>

TDS / NcML aggregation
<dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WESTCONUS_4km">
  <netcdf>
    <aggregation dimName="time" type="joinExisting">
      <scan location="/data/satellite/WEST-CONUS_4km/" suffix=".gini"/>
    </aggregation>
  </netcdf>
</dataset>
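Conceptually, a joinExisting aggregation concatenates files along a dimension they already have. The Python sketch below imitates that on plain dictionaries. The in-memory representation, the function name, and the rule that other variables come from the first file are simplifying assumptions, not the library's implementation.

```python
def join_existing(files, dim_name):
    """Concatenate variables whose outer dimension is dim_name across files.

    Each file is {"dims": {name: length},
                  "vars": {name: {"dims": tuple, "data": list}}}.
    Variables without dim_name as outer dimension are taken from the first file.
    """
    total = 0
    merged = {}
    for f in files:
        total += f["dims"][dim_name]
        for name, var in f["vars"].items():
            if var["dims"] and var["dims"][0] == dim_name:
                entry = merged.setdefault(name, {"dims": var["dims"], "data": []})
                entry["data"].extend(var["data"])
            else:
                merged.setdefault(name, var)
    dims = dict(files[0]["dims"])
    dims[dim_name] = total
    return {"dims": dims, "vars": merged}


file_a = {"dims": {"time": 2, "x": 2},
          "vars": {"IR": {"dims": ("time", "x"), "data": [[1, 2], [3, 4]]},
                   "x": {"dims": ("x",), "data": [0.0, 4.0]}}}
file_b = {"dims": {"time": 1, "x": 2},
          "vars": {"IR": {"dims": ("time", "x"), "data": [[5, 6]]},
                   "x": {"dims": ("x",), "data": [0.0, 4.0]}}}

virtual = join_existing([file_a, file_b], "time")
print(virtual["dims"]["time"], virtual["vars"]["IR"]["data"])
```

A client sees one dataset with a longer time dimension, while the files on disk stay as they are.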

Conclusions
• NetCDF is a floor wax and a dessert topping
• A data model is a good way to see the forest for the trees
• We now have a usable merger of netCDF, OPeNDAP, HDF5 technologies
• Add Coordinate information to allow “coordinate space subsetting”
  – NcML/TDS can help
  – But the right way to do this is…

Conclusion
• I will use CF Conventions
• I will use CF Conventions
• I will use CF Conventions

NetCDF-Java Common Data Model (Data Access Layer)