NASA HDF/HDF-EOS Data Access Challenges
H. Joe Lee (hyokee@hdfgroup.org)
Kent Yang (myang6@hdfgroup.org)
The HDF Group
July 9, 2013, ESIP 2013 Summer Meeting
www.hdfgroup.org

Hal Varian, Google’s chief economist:
“The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades.”

For Earth Science Data Users
The ability to take NASA HDF/HDF-EOS data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s a hugely important skill right now.

Is it easy to take NASA HDF data?
No, not for the Average Joe data user.

Understand
“I'm new to IDL and HDF; and I'm currently working with MODIS L1B data. I found your examples very helpful. Is it possible to show radiance is calculated?”

Process
“I work in NASA/GSFC GES-DISC on AIRS project. We have new idl version 8.1. But got a core dump error when we run EOS function swath name from a AIRS level 2 product file. Need your EOS_SW_INQSWATH to inqure help. Thanks.”

Extract Values
TRMM data: “Hi, I want to use the following http://mirador.gsfc.nasa.gov/...2A25.... Can you provide me some programs that deal with these datasets so that I can obtain the daily convective precipitation in the region 110-180E, 0-40N during 2006?”

Visualize
“Can you please make the matlab file for reading ozone hdf5 files obtained from mls available to the public. I wanted to obtain ozone distribution over the world and ozone distributions with height etc. thank you :) …. oh can you tell me which function can i use to plot latitude in the x-axis, pressure in the y-axis and a contour plot of ozone over it?”

Communicate
“Your prog is very helpful to verify my process. I have one more doubt. I am trying to convert this hdf to Geotiff using Matlab. Do have any written code to do the same. Doing it with HEG tool given an error specifying that 5D are only supported for SOM projections. Also I am doing all processing with Matlab. So could you pl. help me.”

NASA HDF Users See Challenges
in accessing satellite-product-specific (MODIS, AIRS, MLS), geolocation/time-specific (lat/lon/height/year) data with their favorite software packages (MATLAB/IDL/ArcGIS).

What Makes Access Challenging?
1. Some files use techniques that end users may not be familiar with, even though those techniques help store data efficiently.
2. Information from a source outside the files is required to retrieve the data in a physically meaningful way.
3. Attributes do not comply with widely used conventions.
4. Metadata in the HDF file contains incorrect information.

Converted File Size Comparison
• HDF-EOS2: 72 MB
• netCDF-4: 128 MB
• netCDF-3: 656 MB (9X the HDF-EOS2 size)
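The "9X" figure is simply the ratio of the netCDF-3 size to the HDF-EOS2 size. The gap is presumably due to compression: the netCDF-3 classic format cannot store compressed variables, whereas HDF-EOS2 and netCDF-4 can.

$$
\frac{656}{72} \approx 9.1, \qquad \frac{128}{72} \approx 1.8
$$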

Challenge 1: Unfamiliar Techniques
Users look for latitude/longitude datasets that match variable datasets (e.g., Ozone). Some HDF products have
• lat/lon whose dimension sizes do not match the variable's.
• lat/lon information stored in a metadata attribute rather than in datasets.
• duplicate lat/lon information.
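When lat/lon exist only as metadata, users must compute them. A minimal sketch, assuming a geographic (lat/lon) HDF-EOS2 grid whose corner points appear in StructMetadata as UpperLeftPointMtrs and LowerRightMtrs in packed degrees-minutes-seconds form (DDDMMMSSS.SS). The function names are hypothetical, cell centers are an assumed convention, and parsing the StructMetadata text is left out.

```python
# Sketch (assumptions): geographic-projection grid; corners given in packed
# DMS form (DDDMMMSSS.SS); lat/lon reported at cell centers; rows run north
# to south and columns west to east. StructMetadata parsing is not shown.


def dms_to_degrees(packed: float) -> float:
    """Convert a packed DDDMMMSSS.SS value to decimal degrees."""
    sign = -1.0 if packed < 0 else 1.0
    value = abs(packed)
    degrees = int(value // 1_000_000)
    minutes = int((value - degrees * 1_000_000) // 1_000)
    seconds = value - degrees * 1_000_000 - minutes * 1_000
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


def grid_centers(upper_left, lower_right, xdim, ydim):
    """Return (lats, lons) lists of cell-center coordinates."""
    west, north = dms_to_degrees(upper_left[0]), dms_to_degrees(upper_left[1])
    east, south = dms_to_degrees(lower_right[0]), dms_to_degrees(lower_right[1])
    dx = (east - west) / xdim
    dy = (north - south) / ydim
    lons = [west + (i + 0.5) * dx for i in range(xdim)]
    lats = [north - (j + 0.5) * dy for j in range(ydim)]
    return lats, lons


lats, lons = grid_centers((-180000000.0, 90000000.0),
                          (180000000.0, -90000000.0), 360, 180)
print(lats[0], lons[0])  # 89.5 -179.5
```

For a 180 × 360 global grid this yields one-degree cells centered at ±0.5°; other projections need the GCTP math that the HDF-EOS2 library performs.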

Swath Dimension Map Example
An HDF-EOS swath dimension map allows geolocation and data dimensions to have different sizes:
• Latitude[512]
• Longitude[512]
• Data[1024]
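To pair the Latitude[512] values with Data[1024], geolocation must be expanded to the data dimension. A minimal sketch, assuming the HDF-EOS2 dimension-map relation data_index = offset + increment × geo_index with a positive increment, and using linear interpolation (with linear extrapolation at the edges). The interpolation scheme and function name are assumptions, not the library's own algorithm.

```python
import math

# Sketch (assumption): data_index = offset + increment * geo_index,
# increment > 0. Points between mapped indices are linearly interpolated;
# edges are linearly extrapolated. Longitudes that cross the 180th
# meridian would need unwrapping first.


def expand_dimension_map(geo, data_size, offset, increment):
    """Interpolate a 1-D geolocation array onto the data dimension."""
    if increment <= 0 or len(geo) < 2:
        raise ValueError("sketch supports increment > 0 and >= 2 geo points")
    out = []
    for d in range(data_size):
        g = (d - offset) / increment                    # fractional geo index
        i = min(max(math.floor(g), 0), len(geo) - 2)    # bracketing pair
        frac = g - i
        out.append(geo[i] + frac * (geo[i + 1] - geo[i]))
    return out


latitude = [10.0 + 0.1 * k for k in range(512)]   # stand-in values
lat_full = expand_dimension_map(latitude, 1024, offset=0, increment=2)
print(len(lat_full))  # 1024
```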

NSIDC AMSR_E NCL Example
    ; Read the file as an HDF4 file to obtain dataset attributes.
    hdf4_file = addfile("AMSR_E_L3_WeeklyOcean_V03_20020616.hdf", "r")
    ; Read the same file as an HDF-EOS2 file (".he2" suffix) to obtain lat and lon.
    hdfeos2_file = addfile("AMSR_E_L3_WeeklyOcean_V03_20020616.hdf.he2", "r")
Users should call both the HDF4 and HDF-EOS2 APIs:
• The HDF4 API alone cannot resolve lat/lon.
• The HDF-EOS2 API alone cannot retrieve some attributes that were added later through HDF4 APIs.

Challenge 2: Information Outside HDF
Users must read the data product manual to find
• fill value / valid ranges
• units or discrete key values
• the scale / offset equation
• the physical description of the data
Some products are not self-describing!
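Once the manual supplies the missing numbers, applying them is mechanical. A minimal sketch, assuming the valid range is expressed in raw (stored) units and the product uses the CF-style rule physical = raw × scale + offset; the fill value, range, scale, and offset below are hypothetical manual entries.

```python
# Sketch: convert raw stored integers to physical values using parameters
# taken from a product manual. Assumptions: valid range is in raw units and
# physical = raw * scale + offset. Fill or out-of-range points become None.


def to_physical(raw, fill_value, valid_min, valid_max, scale, offset):
    physical = []
    for value in raw:
        if value == fill_value or not valid_min <= value <= valid_max:
            physical.append(None)              # fill or out of range
        else:
            physical.append(value * scale + offset)
    return physical


# Hypothetical manual entries: fill 255, valid 0..100, scale 0.5, offset -10.
print(to_physical([0, 100, 255, 150], 255, 0, 100, 0.5, -10.0))
# [-10.0, 40.0, None, None]
```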

Without Information Outside HDF [figure]

With Information Outside HDF [figure]

Challenge 3: The CF Conventions
Following the widely accepted CF conventions is important for interoperability, but some HDF products
• use non-alphanumeric characters in names.
• use non-CF attribute names and values.
• use non-CF scale / offset rules.
• use a different data type for an attribute (e.g., _FillValue) than for its variable.
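The scale/offset point matters because the two common rules give different numbers for the same attributes. CF defines unpacked = packed × scale_factor + add_offset, while some products subtract the offset first; MODIS Level 1B radiances, for example, are computed as radiance_scales × (scaled integer − radiance_offsets). A minimal sketch with hypothetical attribute values, showing how a CF-aware tool misreads the second form and how it can be rewritten into CF terms:

```python
# CF rule:              unpacked = packed * scale_factor + add_offset
# Subtract-first rule:  physical = scale * (packed - offset)
# Applying the CF rule to subtract-first attributes gives wrong values.


def unpack_cf(packed, scale_factor, add_offset):
    return packed * scale_factor + add_offset


def unpack_subtract_first(packed, scale, offset):
    return scale * (packed - offset)


def to_cf_attributes(scale, offset):
    """Rewrite scale * (packed - offset) as CF (scale_factor, add_offset)."""
    return scale, -scale * offset


scale, offset = 0.5, 10.0            # hypothetical attribute values
print(unpack_subtract_first(30, scale, offset))          # 10.0 (intended)
print(unpack_cf(30, scale, offset))                      # 25.0 (wrong rule)
print(unpack_cf(30, *to_cf_attributes(scale, offset)))   # 10.0
```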

Attribute Type Mismatch Example
Wrong:
    Int16  data[180][360]       // Variable
    String valid_range "0, 100"  // Attribute (wrong)
    Byte   _FillValue 255        // Attribute (wrong)
Correct:
    Int16  data[180][360]       // Variable
    Int16  valid_range 0, 100    // Attribute (correct)
    Int16  _FillValue 255        // Attribute (correct)
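A checker for this kind of mismatch only needs the variable's type and each attribute's type. A minimal sketch in which types are plain strings and the Attribute class is hypothetical; a real tool would read both from the file. CF requires _FillValue to match its variable's type, and valid_range, valid_min, and valid_max are normally expected to match it as well (CF/NUG allow some exceptions for packed data, which this sketch ignores).

```python
from dataclasses import dataclass

# Sketch: flag attributes whose type differs from the variable's type.
# Type names are plain strings; exceptions for packed data are ignored.

SAME_TYPE_ATTRS = {"_FillValue", "valid_range", "valid_min", "valid_max"}


@dataclass
class Attribute:
    name: str
    dtype: str
    value: object


def type_mismatches(var_dtype, attributes):
    return [f"{a.name}: {a.dtype} should be {var_dtype}"
            for a in attributes
            if a.name in SAME_TYPE_ATTRS and a.dtype != var_dtype]


wrong = [Attribute("valid_range", "String", "0, 100"),
         Attribute("_FillValue", "Byte", 255)]
right = [Attribute("valid_range", "Int16", (0, 100)),
         Attribute("_FillValue", "Int16", 255)]
print(type_mismatches("Int16", wrong))
# ['valid_range: String should be Int16', '_FillValue: Byte should be Int16']
print(type_mismatches("Int16", right))  # []
```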

Challenge 4: Incorrect Information
Sometimes metadata contains incorrect information. This is rare, and data producers usually correct such information promptly.

Incorrect Information Example
An NCL user reported that the same code doesn't work for an older MOP02 HDF-EOS5 file.
• In the 2008/01/01 file, StructMetadata has the wrong value: nTime = 250841130416
• In the 2008/12/31 file, StructMetadata has the correct value: nTime = 2
LaRC ASDC fixed this already!
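Errors like the nTime value can be caught by comparing the sizes declared in StructMetadata with the sizes the datasets actually have. A minimal sketch, assuming the usual ODL layout in which each dimension object carries DimensionName="..." followed by Size=<integer>; the function names are hypothetical, the observed size of 2 is a hypothetical input, and obtaining observed sizes from the file is left out.

```python
import re

# Sketch: compare dimension sizes declared in StructMetadata (ODL text)
# with sizes observed in the datasets. Assumes DimensionName="..." is
# followed by Size=<integer> within each dimension object.

DIM_PATTERN = re.compile(r'DimensionName="([^"]+)"\s+Size=(-?\d+)')


def declared_dimensions(struct_metadata):
    return {name: int(size) for name, size in DIM_PATTERN.findall(struct_metadata)}


def size_conflicts(struct_metadata, observed):
    declared = declared_dimensions(struct_metadata)
    return {name: (size, observed[name])
            for name, size in declared.items()
            if name in observed and size != observed[name]}


old_file = '''OBJECT=Dimension_1
    DimensionName="nTime"
    Size=250841130416
END_OBJECT=Dimension_1'''
print(size_conflicts(old_file, {"nTime": 2}))  # {'nTime': (250841130416, 2)}
```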

Good News
Recent efforts from The HDF Group overcome many of these challenges:
• HDF4/HDF5 OPeNDAP handlers with the EnableCF option
• H4 CF Conversion Toolkit with NcML / NCO examples
• HDF-EOS5 Augmentation Tool
• HDF-EOS2 Dumper tool with comprehensive examples for MATLAB/IDL/NCL
The above tools and their examples are available at HDFEOS.org.

Challenge 1: Unfamiliar Techniques
HDF OPeNDAP handlers & H4 CF Conversion Toolkit
• provide full geolocation information as explicit datasets.
HDF-EOS5 Augmentation Tool
• provides ways to associate geolocation information with existing datasets or to supply new ones.
HDF-EOS2 Dumper Tool
• prints out geolocation information in ASCII, because MATLAB/IDL/NCL can read ASCII text data.
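The idea behind an ASCII dump is that a whitespace-separated numeric table is the lowest common denominator every analysis package can load. A minimal sketch of that round trip; the "lat lon value" column layout and function names are assumptions for illustration, not the HDF-EOS2 Dumper's actual output format.

```python
import io

# Sketch (assumed layout, not the Dumper's real format): one
# whitespace-separated "lat lon value" row per point.


def dump_ascii(lats, lons, values, stream):
    for lat, lon, value in zip(lats, lons, values):
        stream.write(f"{lat:.4f} {lon:.4f} {value}\n")


def load_ascii(stream):
    return [tuple(float(x) for x in line.split()) for line in stream if line.strip()]


buf = io.StringIO()
dump_ascii([37.5, 38.0], [127.0, 127.5], [300, 301], buf)
print(buf.getvalue(), end="")
# 37.5000 127.0000 300
# 38.0000 127.5000 301
buf.seek(0)
print(load_ascii(buf))  # [(37.5, 127.0, 300.0), (38.0, 127.5, 301.0)]
```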

Challenge 2: Information Outside HDF
OPeNDAP handlers
• provide fill value / valid range information.
• apply the CF scale / offset rule.
• calculate latitude and longitude values for some NASA non-EOS products.
• are tested against the ncml_handler so that data centers can add information using NcML.
H4 CF Conversion Toolkit (h4tonccf)
• provides NcML and NCO examples to add or edit attributes in converted netCDF files.

Challenge 3: The CF Conventions
HDF OPeNDAP handlers & H4 CF Conversion Toolkit
• flatten group hierarchies.
• change variable & attribute types, names, and values.
• add named dimensions.
• add coordinate information.
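Flattening groups and fixing names go together: a full HDF5 path becomes one variable name that CF-aware tools accept (CF recommends names that start with a letter and contain only letters, digits, and underscores). A minimal sketch of one plausible scheme, not necessarily the mapping the HDF OPeNDAP handlers or h4tonccf produce; the "v" prefix and the numeric collision suffix are arbitrary choices, and the example paths are illustrative.

```python
import re

# Sketch of one plausible scheme: join the group path with "_", replace
# characters outside [A-Za-z0-9_] with "_", ensure a leading letter, and
# append a counter when two paths map to the same name.


def cf_name(path):
    name = re.sub(r"[^A-Za-z0-9_]", "_", path.strip("/"))
    if not name or not name[0].isalpha():
        name = "v" + name          # arbitrary prefix for a leading digit/_
    return name


def flatten(paths):
    mapping, used = {}, set()
    for path in paths:
        base = name = cf_name(path)
        n = 1
        while name in used:
            name = f"{base}_{n}"
            n += 1
        used.add(name)
        mapping[path] = name
    return mapping


print(flatten(["/HDFEOS/SWATHS/O3/Data Fields/L2gpValue",
               "/Data Fields/2A25 rain", "/Data_Fields/2A25-rain"]))
```

The last two paths collide after sanitizing, so the second one receives a "_1" suffix.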

Challenge 4: Incorrect Information
HDF OPeNDAP handlers & H4 CF Conversion Toolkit
• temporarily correct errors in old products.
• catch errors in new products.

Better News
We see fewer and fewer challenges in newer HDF products thanks to open communication and standardization efforts among Earth Science communities through meetings, telecons, and mailing lists:
• HDF – DAACs telecons
• ESDSWG – H5 CF Conventions
• ESIP
• CF (satellite) conventions mailing lists

Future Challenges
• Data Discovery
• Subsetting and Aggregation
• Sharing Research Data

Data Discovery
Some users still don’t know how to search for data or where to download it. A spatial search in Reverb doesn’t guarantee that the matched HDF data files contain valid values at the specific location the user is looking for. Browse images are helpful, but users don’t want to examine them one by one.

Reverb Browse Image for O3 at Seoul
The returned HDF file has no value at Seoul.
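A search service could avoid returning such files by checking the data itself, not just the granule footprint. A minimal sketch that asks whether any non-fill pixel lies within a given distance of a point, using the haversine distance on a 6371 km sphere; Seoul's coordinates are approximate, and the pixel values, fill value, and 50 km radius are hypothetical.

```python
import math

SEOUL = (37.57, 126.98)  # approximate latitude, longitude in degrees


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def has_valid_value_near(lats, lons, values, fill, point, max_km):
    """True if any non-fill pixel lies within max_km of point."""
    lat0, lon0 = point
    return any(v != fill and great_circle_km(lat0, lon0, la, lo) <= max_km
               for la, lo, v in zip(lats, lons, values))


# Hypothetical granule: the pixel over Seoul is fill, a distant one is valid.
lats, lons = [37.6, 37.6], [127.0, 130.0]
print(has_valid_value_near(lats, lons, [-999.0, 300.0], -999.0, SEOUL, 50.0))  # False
```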

Subsetting and Aggregation
Users want customized, on-demand HDF product generation based on their queries. For example: “Give me all L2 Ozone data at Seoul from 2002 to 2013 and allow me to download it as a single HDF file.” Most HDF data products are packaged as daily granules covering a large region, so a search returns thousands of HDF files, which users cannot download one by one.
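The selection and merging behind such a request can be sketched independently of any file format. A minimal sketch, assuming a hypothetical Granule record whose bounding box and value at the point of interest were extracted beforehand: it keeps granules inside the time range whose box covers the point and that hold a valid value, then merges them into one date-sorted series. Writing the result as a single HDF file is left out, and bounding boxes crossing the 180th meridian are not handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Granule:                       # hypothetical granule record
    day: date
    west: float
    south: float
    east: float
    north: float
    value_at_point: Optional[float]  # extracted beforehand; None = no data


def covers(g: Granule, lat: float, lon: float) -> bool:
    # Simple bounding-box test; ignores boxes crossing the 180th meridian.
    return g.south <= lat <= g.north and g.west <= lon <= g.east


def point_time_series(granules: List[Granule], lat: float, lon: float,
                      start: date, end: date) -> List[Tuple[date, float]]:
    """Subset granules by time and location, then merge into one series."""
    return sorted((g.day, g.value_at_point) for g in granules
                  if start <= g.day <= end and covers(g, lat, lon)
                  and g.value_at_point is not None)
```

Usage: pass the granule records returned by a search together with Seoul's approximate coordinates and the 2002 to 2013 date range.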

Reverb Query Result for AIRS at Seoul
Showing 1 to 9 of 5,047 granules

Sharing Research Data
How can users easily compose and publish new research data from different NASA data product sources? “I’d like to combine AIRS Ozone and OMI Ozone data at Seoul from 2002-2013 and share it with journal editors.” Can this be shared as a single URL query to the NASA data cloud?

Thanks!
Questions / Comments? eoshelp@hdfgroup.org

Acknowledgements
This work was supported by Subcontract number 114820 under Raytheon Contract number NNG10HP02C, funded by the National Aeronautics and Space Administration (NASA), and by cooperative agreement number NNX08AO77A from NASA. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Raytheon or the National Aeronautics and Space Administration.