Data Pipeline & Workflow
Ed Chapman, OOI Chief Systems Engineer
Steve Gaul, OOI Systems Engineer/Architect
4/27/2014

Goal
Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms”
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”

By the Numbers
• Number of Deployed Platforms = 89***
• Number of Deployed Instruments = 814***
• Number of Instrument Types = 47
• Number of Instrument Models = 77
• Number of Uncabled dataset agent drivers = 71
• Number of Cabled instrument agent drivers = 42
• Number of Algorithms = 89 (52 L1 and 37 L2)
• Number of Data Product Types = 203
• Number of Unique products = 3928 (L0, L1, L2)
  – Number of Unique L0 products = 1640
  – Number of Unique L1 products = 1533
  – Number of Unique L2 products = 755
***As of ECR 1300-00419
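As a quick consistency check on the counts above, the per-level figures can be summed and compared with the stated totals. The dictionary names are illustrative only; the numbers are copied from the slide.

```python
# Counts copied from the "By the Numbers" slide.
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}
algorithms = {"L1": 52, "L2": 37}

# The slide states 3928 unique products and 89 algorithms in total.
total_products = sum(unique_products.values())
total_algorithms = sum(algorithms.values())

print(total_products)    # 3928
print(total_algorithms)  # 89
```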

Sense and Ingest Data
• Several classes of data
  – Instrument samples/profiles
  – Platform engineering
  – Specialized data streams; video, tier 1
  – Physical samples
  – Logs, photos
  – Metadata; calibration sheets, as-built lists
• Several data acquisition paths
  – Live streaming data to shore processing (RSN)
  – Remote automated collection; data telemetered to shore (CG/EA)
    • Generally sub-sampled or otherwise simplified
  – Post recovery data collection from recovered platforms
  – Physical samples processed post cruise/recovery
  – Manual collection of logs, photos, etc.; associated via metadata

Physical Flow Sensing and Ingest

System Hardware Summary

Generic Data Flows

Functional Flow Sensing and Ingest

Diagram labels: Instrument; Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop

Ingest for Instruments Including Tier 1 and HD video
Diagram labels: Instrument; Driver and Agent; Permanent storage; User

Ingest for Engineering data
Diagram labels: Engineering System; Driver and Agent; Permanent storage; User

Ingest for other items
• Cruise documents
• Algorithms

Calibration

Uncalibrated Raw Instrument Data
Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; User

Internally Calibrated Raw Instrument Data
Diagram labels: Instrument; Driver and Agent; Permanent storage; L1a; Calibration Values; User

Primary Calibration of Uncalibrated data
Diagram labels: Instrument; Driver and Agent; Calibration Values; L0; Permanent storage; L1a; Data Product Algorithm; User

Secondary calibration
Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; User; L1b (Post Deployment)

Secondary calibration
Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; L1b (PD); User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b (PR); L1b (Intrp); Interpolation
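The secondary-calibration chain can be sketched in Python. This is only an illustration under stated assumptions, not the OOI implementation: the slides do not define POLYVAL, so it is assumed here to be ordinary polynomial evaluation with coefficients given highest order first; the interpolated product is assumed to be a time-weighted linear blend of the post-deployment and post-recovery results. All function names, coefficients, and times are hypothetical.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial by Horner's method (coeffs highest order first).

    Assumption: the slides' POLYVAL algorithm is plain polynomial evaluation.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def l1b_products(l1a, t, t_deploy, t_recover, pd_coeffs, pr_coeffs):
    """Return hypothetical L1b (PD), L1b (PR), and L1b (Intrp) values for one sample."""
    pd = polyval(pd_coeffs, l1a)  # post-deployment secondary calibration
    pr = polyval(pr_coeffs, l1a)  # post-recovery secondary calibration
    # Assumed interpolation: weight moves linearly from PD at deployment
    # to PR at recovery, clamped to the deployment interval.
    w = (t - t_deploy) / (t_recover - t_deploy)
    w = min(max(w, 0.0), 1.0)
    return {"L1b_PD": pd, "L1b_PR": pr, "L1b_Intrp": (1.0 - w) * pd + w * pr}


# Made-up sample halfway through a deployment.
products = l1b_products(l1a=10.0, t=50.0, t_deploy=0.0, t_recover=100.0,
                        pd_coeffs=[1.0, 0.0], pr_coeffs=[1.0, 0.5])
print(products)
```

With these made-up inputs the interpolated value falls midway between the post-deployment and post-recovery results, because the sample time is halfway through the deployment.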

Calibration actions
• PS or Marine Operator creates Primary Calibration Values
• PS or Marine Operator creates Secondary Post-Deployment Calibration Values
• PS or Marine Operator creates Secondary Post-Recovery Calibration Values
• Values are uploaded through the UI as csv files
Calibration Values are associated with a specific instrument for a specific period of time
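A minimal sketch of how uploaded calibration values might be tied to an instrument and a validity period. The slides only state that values arrive as csv files; the column names, the semicolon-separated coefficient field, the instrument ID, and both functions below are hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV layout; not taken from the OOI system.
SAMPLE_CSV = """instrument_id,kind,start,end,coefficients
CTD-001,primary,2014-01-01,2014-06-30,0.0;1.0
CTD-001,post_deployment,2014-01-01,2014-06-30,0.01;1.002
"""


def load_calibrations(text):
    """Parse calibration rows into dictionaries with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "instrument_id": row["instrument_id"],
            "kind": row["kind"],
            "start": datetime.strptime(row["start"], "%Y-%m-%d"),
            "end": datetime.strptime(row["end"], "%Y-%m-%d"),
            "coefficients": [float(c) for c in row["coefficients"].split(";")],
        })
    return rows


def find_calibration(rows, instrument_id, kind, when):
    """Return the coefficients valid for this instrument, kind, and time, or None."""
    for r in rows:
        if (r["instrument_id"] == instrument_id and r["kind"] == kind
                and r["start"] <= when <= r["end"]):
            return r["coefficients"]
    return None


cals = load_calibrations(SAMPLE_CSV)
print(find_calibration(cals, "CTD-001", "primary", datetime(2014, 3, 1)))  # [0.0, 1.0]
```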

Calibration Updates
• If new values are uploaded for any of the three sets of calibration values (primary, secondary post-deployment, secondary post-recovery), the new values overwrite the prior values.
• The assumption is that we will only upload new values if there was a mistake with the old ones. We don’t want to allow errors to propagate, so we delete the old values.
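The overwrite behavior described above can be illustrated with a small in-memory store. This is a sketch only: the store, its key of (instrument, calibration kind), and the function name are hypothetical and say nothing about how OOI actually persists the values.

```python
# Hypothetical in-memory store keyed by (instrument_id, calibration kind).
calibration_store = {}


def upload_calibration(instrument_id, kind, coefficients):
    """Store new values, discarding whatever was there before.

    Returns True if an earlier upload was overwritten.
    """
    key = (instrument_id, kind)
    replaced = key in calibration_store
    calibration_store[key] = list(coefficients)  # old values are not kept
    return replaced


first = upload_calibration("CTD-001", "primary", [0.0, 1.0])
second = upload_calibration("CTD-001", "primary", [0.0, 1.01])
print(first, second, calibration_store[("CTD-001", "primary")])
```

After the second upload only the corrected coefficients remain; the store holds no record of the superseded values.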

Ingest for “etc”
Is there anything you want to know about?

Versioning

Diagram labels: Instrument; Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop

Storage

Storage
Intent: for most instruments, all science and engineering data is retained in OOI storage for the life of the program (external archiving will be covered in a later presentation).
Planned on an order of magnitude difference between:
1. Video camera (1)
2. Hydrophones (11), still cameras (10), seismometers (13)
3. Everything else (779)

Data Volume Per Year
[Chart not recovered; surviving labels: 138, “And Seismometer”, “HD Video Kept for”, 60]

Balancing intent & cost
HD Video Camera: L2-SR-RQ-3402 – “Buffering for not less than six months of all video imagery shall be provided”
NSF approved Data Use Policy (DCN 1102-00010):
Instrument Type and Information: Minimum period of time for OOI storage before data and data products are moved to long term storage/national archives
• HD Cameras: 30 days
• Broadband Hydrophones, Still Cameras, Seismometers, Low Frequency Hydrophones: 60 days
• All other Instruments: 90 days
• Documentation and algorithms: Life of program
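The policy table can be expressed as a simple lookup, which also makes the gap between the video requirement and the policy minimum explicit. The category keys and function name are hypothetical, and "six months" is approximated here as 183 days for the comparison.

```python
# Minimum OOI storage periods (days) from the Data Use Policy table on the slide.
# None stands for "life of program". Category keys are hypothetical.
MIN_STORAGE_DAYS = {
    "hd_camera": 30,
    "broadband_hydrophone": 60,
    "still_camera": 60,
    "seismometer": 60,
    "low_frequency_hydrophone": 60,
    "documentation_and_algorithms": None,
}
DEFAULT_DAYS = 90  # "All other Instruments"

# Requirement L2-SR-RQ-3402: "not less than six months" of video buffering.
# Assumption: six months is approximated as 183 days.
VIDEO_BUFFER_REQUIREMENT_DAYS = 183


def min_storage_days(category):
    """Return the policy minimum in days, or None for life of program."""
    return MIN_STORAGE_DAYS.get(category, DEFAULT_DAYS)


print(min_storage_days("hd_camera"))   # 30
print(min_storage_days("ctd"))         # 90 (falls under "all other instruments")
print(min_storage_days("hd_camera") < VIDEO_BUFFER_REQUIREMENT_DAYS)  # True
```

The last line shows the tension the slide title points at: the policy minimum for HD cameras is shorter than the buffering period named in the requirement.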

Data Product Delivery

Diagram labels: Instrument; Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop

Diagram labels: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; L1a; Secondary Calibration Functions; L2b; L1b; QC Algorithms; Human In The Loop; L1a; L1b and QC flags; L1c; Human In The Loop; L0; GUI; User; L2c; QC flags; L2b

Output Data Product Variables
• Single L1 data product, with the following variables (i.e., columns in the time series):
  – <measurement>_L1a (e.g., Conductivity_L1a)
  – <measurement>_L1b_Post_Deployment_Cal
  – <measurement>_L1b_Post_Recovery_Cal
  – <measurement>_L1b_Interpolated
  – <measurement>_L1c
  – QC_Flag_GlobalRange
  – QC_Flag_LocalRange
  – <additional QC flags>
• Single L2 data product, similar to above
• Single “Parsed” (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and other variables.
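To make the column layout concrete, the sketch below builds the L1 variable names for one measurement and fills a global-range QC flag column. It rests on assumptions not stated in the slides: the flag name is written here as QC_Flag_GlobalRange, the range test is assumed to pass when a value lies within fixed limits, the flag convention (1 = pass, 0 = fail) is invented for illustration, and the limits and sample values are made up.

```python
def l1_variable_names(measurement):
    """Build the per-measurement L1 column names listed on the slide."""
    suffixes = ["L1a", "L1b_Post_Deployment_Cal", "L1b_Post_Recovery_Cal",
                "L1b_Interpolated", "L1c"]
    return [f"{measurement}_{s}" for s in suffixes]


def global_range_flags(values, lower, upper):
    """Return 1 where lower <= value <= upper, else 0 (flag convention assumed)."""
    return [1 if lower <= v <= upper else 0 for v in values]


# Made-up conductivity samples; 42.0 is an obvious outlier.
conductivity_l1a = [3.2, 3.3, 42.0, 3.1]
record = {
    "Conductivity_L1a": conductivity_l1a,
    "QC_Flag_GlobalRange": global_range_flags(conductivity_l1a, 0.0, 9.0),
}

print(l1_variable_names("Conductivity")[0])  # Conductivity_L1a
print(record["QC_Flag_GlobalRange"])         # [1, 1, 0, 1]
```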

Output Data Product Metadata
• In the metadata (i.e., ‘Metadata’ link from ERDDAP page, AND metadata on Data Product facepage on OOINet UI):
  – Calibration coefficients (as a comma separated list)
  – QC Look Up Table (as a url, or possibly as values in a TBD format)
  – Data Product Algorithm (as a url)
  – DPS for Data Product Algorithm (as a url)
  – QC Algorithms (as urls)
  – DPS’s for QC Algorithms (as urls)
  – POLYVAL Algorithm (as a url)

OOI is about getting data to the users!
Must maintain a balance between data quality, data quantity, and budget.

Questions?
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”