Data Pipeline & Workflow
Ed Chapman, OOI Chief Systems Engineer
Steve Gaul, OOI Systems Engineer/Architect
4/25/2014

Goal
Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms”
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”

By the Numbers
• Number of Deployed Platforms = 89***
• Number of Deployed Instruments = 814***
• Number of Instrument types = 47
• Number of Instrument Models = 77
• Number of Uncabled dataset agent drivers = 71
• Number of Cabled instrument agent drivers = 42
• Number of Algorithms = 89 (52 L1 and 37 L2)
• Number of Data Product Types = 203
• Number of Unique products = 3928 (L0, L1, L2)
• Number of Unique L0 products = 1640
• Number of Unique L1 products = 1533
• Number of Unique L2 products = 755
***As of ECR 1300-00419

Sense and Ingest Data
• Several classes of data
  – Instrument samples/profiles
  – Platform engineering
  – Specialized data streams; video, tier 1
  – Physical samples
  – Logs, photos
  – Metadata; calibration sheets, as-built lists
• Several data acquisition paths
  – Live streaming data to shore processing (RSN)
  – Remote automated collection; data telemetered to shore (CG/EA)
    • Generally sub-sampled or otherwise simplified
  – Post recovery data collection from recovered platforms
  – Physical samples processed post cruise/recovery
  – Manual collection of logs, photos, etc.; associated via metadata

Physical Flow Sensing and Ingest

System Hardware Summary

Generic Data Flows

Functional Flow Sensing and Ingest

[Diagram labels: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]

Ingest for Instruments Including Tier 1 and HD video
[Diagram labels: Instrument Driver and Agent; Permanent storage; User]

Ingest for Engineering data
[Diagram labels: Engineering System Driver and Agent; Permanent storage; User]

Ingest for other items
• Cruise documents
• Algorithms

Ingest flow for Calibration

[Diagram labels: Instrument Driver and Agent; Raw; Permanent storage; L0; Permanent storage; User]

[Diagram labels: Instrument Driver and Agent; Permanent storage; L1a; Calibration Values; User]

[Diagram labels: Instrument Driver and Agent; Calibration Values; L0; Permanent storage; L1a; Data Product Algorithm; User]

[Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; User; L1b (Post Deployment)]

[Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; L1b (PD); User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b (PR); L1b (Intrp); Interpolation]
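The secondary calibration chain on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not define POLYVAL or the interpolation step, so the sketch assumes POLYVAL is plain polynomial evaluation (coefficients ordered highest degree first, the same convention as numpy.polyval) and that the interpolated L1b value is a linear blend in time between the post-deployment and post-recovery results. The function names and parameters are hypothetical.

```python
from typing import Sequence


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial with Horner's rule; coeffs are highest degree first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def interpolated_l1b(l1a: float, t: float, t_deploy: float, t_recover: float,
                     pd_coeffs: Sequence[float], pr_coeffs: Sequence[float]) -> float:
    """Blend the L1b (PD) and L1b (PR) values linearly in time (an assumption)."""
    l1b_pd = polyval(pd_coeffs, l1a)  # post-deployment correction
    l1b_pr = polyval(pr_coeffs, l1a)  # post-recovery correction
    # Weight runs from 0 at deployment to 1 at recovery, clamped outside that span.
    w = (t - t_deploy) / (t_recover - t_deploy)
    w = min(1.0, max(0.0, w))
    return (1.0 - w) * l1b_pd + w * l1b_pr
```

With linear coefficients [1.0, 0.5] at deployment and [1.0, 1.5] at recovery, a sample taken halfway through the deployment receives the average of the two corrections.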

When?
• Primary Calibration is applied whenever someone or something asks for something that requires it.
• Secondary Calibration is applied whenever someone or something asks for something that requires it.
• L1 and L2 products are produced on demand.
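One way to picture the on-demand behavior is a product object that stores only L0 and runs the calibration function when a calibrated level is requested. This is a minimal sketch, not the OOI implementation; the class name, the level strings, and the request counter are all hypothetical.

```python
from typing import Callable, List


class OnDemandProducts:
    """Holds stored L0 values and derives L1a only when it is requested."""

    def __init__(self, l0: List[float], calibrate: Callable[[float], float]):
        self._l0 = l0
        self._calibrate = calibrate
        self.calibration_calls = 0  # counts how many L1a requests were served

    def get(self, level: str) -> List[float]:
        if level == "L0":
            return list(self._l0)  # stored data, no computation needed
        if level == "L1a":
            self.calibration_calls += 1
            # Calibration is applied at request time, never ahead of it.
            return [self._calibrate(v) for v in self._l0]
        raise KeyError(level)
```

Because nothing derived is stored in this sketch, a later change to the calibration function would be reflected in the next request without any reprocessing step.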

Calibration actions
• Someone (PS or Marine Operator) creates Primary Calibration Values
• Someone (PS or Marine Operator) creates Secondary Post-Deployment Calibration Values
• Someone (PS or Marine Operator) creates Secondary Post-Recovery Calibration Values
• Values are uploaded through the UI as CSV files (exact format, content, and UI dialog are TBD)
• Calibration Values are associated with a specific instrument for a specific period of time
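Since the CSV format is still TBD, the following is only a sketch of what associating values with an instrument and a time period could look like. The column names, the semicolon-separated coefficient field, and the instrument ID are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical file layout; the real format is TBD per the slide.
SAMPLE_CSV = """instrument_id,kind,start,end,coefficients
CTD-001,primary,2014-01-01T00:00:00,2014-07-01T00:00:00,1.02;0.1
CTD-001,primary,2014-07-01T00:00:00,2015-01-01T00:00:00,1.05;0.0
"""


def load_calibrations(text):
    """Parse uploaded CSV text into a list of calibration records."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "instrument_id": rec["instrument_id"],
            "kind": rec["kind"],
            "start": datetime.fromisoformat(rec["start"]),
            "end": datetime.fromisoformat(rec["end"]),
            "coefficients": [float(c) for c in rec["coefficients"].split(";")],
        })
    return rows


def find_calibration(rows, instrument_id, kind, when):
    """Return the coefficients valid for this instrument at time `when`, or None."""
    for row in rows:
        if (row["instrument_id"] == instrument_id and row["kind"] == kind
                and row["start"] <= when < row["end"]):
            return row["coefficients"]
    return None
```

Treating each validity period as half-open (start inclusive, end exclusive) keeps adjacent periods from both matching a sample that falls exactly on the boundary.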

Calibration Updates
• If new values are uploaded for any of the three, the new values overwrite the prior values.
• Assumption is we will only upload new values if there was a mistake with the old ones. We don’t want to allow errors to propagate so we delete the old values.
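The overwrite rule amounts to keeping exactly one set of values per instrument and calibration kind, with no history. A minimal sketch, assuming a hypothetical in-memory store keyed by instrument and kind (the time period is left out for brevity):

```python
class CalibrationStore:
    """Keeps one set of values per (instrument, kind); uploads replace, never append."""

    def __init__(self):
        self._values = {}

    def upload(self, instrument_id, kind, coefficients):
        # Assigning to the same key discards the prior values entirely,
        # so a corrected upload cannot coexist with the mistaken one.
        self._values[(instrument_id, kind)] = list(coefficients)

    def get(self, instrument_id, kind):
        return self._values.get((instrument_id, kind))
```

The trade-off of this design is that products computed with the earlier values cannot be reproduced afterward, which matches the stated intent of not letting errors propagate.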

Ingest for “etc”
Anything you want to know about?

Versioning

[Diagram labels: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]

Storage

Data Volume Per Year

Data Product Delivery

[Diagram labels: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]

[Diagram labels: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; L1a; Secondary Calibration Functions; L2b; L1b; QC Algorithms; Human In The Loop; L1a; L1b and QC flags; L1c; Human In The Loop; L0; GUI; User; L2c; QC flags; L2b]
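The QC Algorithms stage takes calibrated values and emits flags alongside them. The slides name the tests (range, spike, stuck, gradient, trend, combined) but not their definitions, so the sketch below is illustrative only: it assumes a flag of 1 means pass and 0 means fail, and the thresholds, parameter names, and the rule for combining flags are hypothetical.

```python
def range_flags(values, lo, hi):
    """Flag 1 where lo <= value <= hi, else 0."""
    return [1 if lo <= v <= hi else 0 for v in values]


def stuck_flags(values, resolution, run_length):
    """Flag 0 for every value in a run of run_length or more near-identical samples."""
    flags = [1] * len(values)
    start = 0  # index where the current run of unchanged values began
    for i in range(1, len(values) + 1):
        run_ended = i == len(values) or abs(values[i] - values[start]) >= resolution
        if run_ended:
            if i - start >= run_length:
                for j in range(start, i):
                    flags[j] = 0
            start = i
    return flags


def combined_flags(*flag_lists):
    """A sample passes the combined test only if it passes every individual test."""
    return [min(fs) for fs in zip(*flag_lists)]
```

For example, stuck_flags([1.0, 2.0, 2.0, 2.0, 3.0], 0.01, 3) fails the three repeated samples and passes the first and last.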

Output Data Product Variables
• Single L1 data product, with the following variables (i.e., columns in the time series):
  – <measurement>_L1a (e.g., Conductivity_L1a)
  – <measurement>_L1b_Post_Deployment_Cal
  – <measurement>_L1b_Post_Recovery_Cal
  – <measurement>_L1b_Interpolated
  – <measurement>_L1c
  – QC_Flag_GlobalRange
  – QC_Flag_LocalRange
  – <additional QC flags>
• Single L2 data product, similar to above
• Single “Parsed” (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and maybe other stuff.
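The column naming scheme for a single L1 product can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and it assumes the two QC flag columns are spelled QC_Flag_GlobalRange and QC_Flag_LocalRange.

```python
def l1_variable_names(measurement, qc_tests=("GlobalRange", "LocalRange")):
    """List the time-series columns for one L1 product, following the slide's pattern."""
    names = [
        f"{measurement}_L1a",
        f"{measurement}_L1b_Post_Deployment_Cal",
        f"{measurement}_L1b_Post_Recovery_Cal",
        f"{measurement}_L1b_Interpolated",
        f"{measurement}_L1c",
    ]
    # One flag column per QC test; further tests extend the tuple.
    names += [f"QC_Flag_{test}" for test in qc_tests]
    return names
```

Calling l1_variable_names("Conductivity") yields seven column names, starting with Conductivity_L1a and ending with QC_Flag_LocalRange.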

Output Data Product Metadata
• In the metadata (i.e., ‘Metadata’ link from ERDDAP page, AND metadata on Data Product facepage on OOINet UI):
  – Calibration coefficients (as a comma separated list)
  – QC Look Up Table (as a url, or possibly as values in a TBD format)
  – Data Product Algorithm (as a url)
  – DPS for Data Product Algorithm (as a url)
  – QC Algorithms (as urls)
  – DPS’s for QC Algorithms (as urls)
  – POLYVAL Algorithm (as a url)

Questions?
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”