Data Pipeline & Workflow
Ed Chapman, OOI Chief Systems Engineer
Steve Gaul, OOI Systems Engineer/Architect
4/25/2014
Goal
Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms”.
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”
By the Numbers
• Number of Deployed Platforms = 89***
• Number of Deployed Instruments = 814***
• Number of Instrument types = 47
• Number of Instrument Models = 77
• Number of Uncabled dataset agent drivers = 71
• Number of Cabled instrument agent drivers = 42
• Number of Algorithms = 89 (52 L1 and 37 L2)
• Number of Data Product Types = 203
• Number of Unique products = 3928 (L0, L1, L2)
  – Number of Unique L0 products = 1640
  – Number of Unique L1 products = 1533
  – Number of Unique L2 products = 755
***As of ECR 1300-00419
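As a quick consistency check on the counts above, the per-level figures can be summed and compared against the stated totals. The snippet below only restates numbers from the slide; the variable names are illustrative.

```python
# Counts copied from the "By the Numbers" slide.
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}
algorithms = {"L1": 52, "L2": 37}

total_products = sum(unique_products.values())
total_algorithms = sum(algorithms.values())

print(total_products)    # 3928, matches "Number of Unique products"
print(total_algorithms)  # 89, matches "Number of Algorithms"
```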
Sense and Ingest Data
• Several classes of data
  – Instrument samples/profiles
  – Platform engineering
  – Specialized data streams: video, tier 1
  – Physical samples
  – Logs, photos
  – Metadata: calibration sheets, as-built lists
• Several data acquisition paths
  – Live streaming data to shore processing (RSN)
  – Remote automated collection; data telemetered to shore (CG/EA)
    • Generally sub-sampled or otherwise simplified
  – Post-recovery data collection from recovered platforms
  – Physical samples processed post cruise/recovery
  – Manual collection of logs, photos, etc.; associated via metadata
Physical Flow Sensing and Ingest
System Hardware Summary
Generic Data Flows
Functional Flow Sensing and Ingest
[Diagram labels: Instrument; Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]
Ingest for Instruments Including Tier 1 and HD video
[Diagram labels: Instrument; Driver and Agent; Permanent storage; User]
Ingest for Engineering data
[Diagram labels: Engineering System; Driver and Agent; Permanent storage; User]
Ingest for other items
• Cruise documents
• Algorithms
Ingest flow for Calibration
[Diagram labels: Instrument; Driver and Agent; Raw; Permanent storage; L0; Permanent storage; User]
[Diagram labels: Instrument; Driver and Agent; Permanent storage; L1a; Calibration Values; User]
[Diagram labels: Instrument; Driver and Agent; Calibration Values; L0; Permanent storage; L1a; Data Product Algorithm; User]
[Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; User; L1b (Post Deployment)]
[Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; L1b (PD); User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b (PR); L1b (Intrp); Interpolation]
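The diagram build-up suggests a chain in which secondary calibration values are applied to L1a through the POLYVAL algorithm to give L1b (PD) and L1b (PR), which are then combined into L1b (Intrp). A minimal sketch of that chain follows. It assumes POLYVAL is a plain polynomial evaluation with coefficients ordered highest power first, and that the interpolation is a linear blend in time between the post-deployment and post-recovery results. Neither assumption is confirmed by the slides, and the function and parameter names are hypothetical.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial at x with Horner's rule.

    Assumption: coefficients are ordered highest power first.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def l1b_interpolated(l1a, t, t_deploy, t_recover, pd_coeffs, pr_coeffs):
    """Blend the PD and PR corrected values linearly in time (assumed scheme)."""
    l1b_pd = polyval(pd_coeffs, l1a)  # L1b (Post Deployment)
    l1b_pr = polyval(pr_coeffs, l1a)  # L1b (Post Recovery)
    # Weight runs from 0 at deployment to 1 at recovery, clamped outside that span.
    w = (t - t_deploy) / (t_recover - t_deploy)
    w = min(max(w, 0.0), 1.0)
    return (1.0 - w) * l1b_pd + w * l1b_pr


# Halfway through a deployment, with an identity PD correction and a
# +0.5 offset PR correction, the blended value sits midway between them.
value = l1b_interpolated(10.0, 50.0, 0.0, 100.0, [1.0, 0.0], [1.0, 0.5])
print(value)  # 10.25
```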
When?
• Primary Calibration is applied whenever a user or process requests a product that requires it.
• Secondary Calibration is applied whenever a user or process requests a product that requires it.
• L1 and L2 products are produced on demand.
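One way to picture on-demand production: only L0 and the calibration values are stored, and L1 is computed each time it is requested, so a request reflects whatever calibration values are current at that moment. The sketch below assumes a simple linear calibration (gain and offset) purely for illustration; the class and field names are hypothetical and do not come from the OOI software.

```python
class OnDemandL1:
    """Compute L1 from stored L0 at request time instead of storing L1."""

    def __init__(self, l0_samples, calibration):
        self.l0_samples = l0_samples    # stored L0 values
        self.calibration = calibration  # hypothetical {"gain": ..., "offset": ...}

    def get_l1(self):
        # Calibration is read at request time, not at ingest time.
        gain = self.calibration["gain"]
        offset = self.calibration["offset"]
        return [gain * x + offset for x in self.l0_samples]


store = OnDemandL1([1.0, 2.0], {"gain": 2.0, "offset": 1.0})
first = store.get_l1()   # [3.0, 5.0]

# After the calibration values change, the next request uses the new ones.
store.calibration["offset"] = 0.0
second = store.get_l1()  # [2.0, 4.0]
```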
Calibration actions
• Someone (PS or Marine Operator) creates Primary Calibration Values
• Someone (PS or Marine Operator) creates Secondary Post-Deployment Calibration Values
• Someone (PS or Marine Operator) creates Secondary Post-Recovery Calibration Values
• Values are uploaded through the UI as CSV files (exact format, content, and UI dialog are TBD)
Calibration Values are associated with a specific instrument for a specific period of time.
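Since the slide marks the CSV format as TBD, the layout below is invented for illustration only. The sketch shows how calibration values tied to an instrument and a time period might be loaded and looked up; the column names, the instrument identifier `CTD-001`, and both function names are hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV layout; the real format is TBD per the slide.
SAMPLE_CSV = """instrument_id,start,end,coeff_0,coeff_1
CTD-001,2014-01-01T00:00:00,2014-06-01T00:00:00,1.02,0.10
CTD-001,2014-06-01T00:00:00,2015-01-01T00:00:00,1.03,0.12
"""


def load_calibrations(text):
    """Parse CSV text into a list of calibration records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "instrument_id": row["instrument_id"],
            "start": datetime.fromisoformat(row["start"]),
            "end": datetime.fromisoformat(row["end"]),
            "coeffs": [float(row["coeff_0"]), float(row["coeff_1"])],
        })
    return rows


def find_calibration(rows, instrument_id, when):
    """Return the coefficients valid for an instrument at a given time, or None."""
    for r in rows:
        if r["instrument_id"] == instrument_id and r["start"] <= when < r["end"]:
            return r["coeffs"]
    return None


rows = load_calibrations(SAMPLE_CSV)
print(find_calibration(rows, "CTD-001", datetime(2014, 3, 1)))  # [1.02, 0.1]
```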
Calibration Updates
• If new values are uploaded for any of the three, the new values overwrite the prior values.
• The assumption is that we will only upload new values if there was a mistake with the old ones. We don’t want to allow errors to propagate, so we delete the old values.
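The overwrite rule can be sketched as a store keyed by instrument and calibration kind, in which a new upload replaces the previous entry and no history is kept. The key structure and names here are assumptions made for illustration, not the actual storage design.

```python
# Hypothetical in-memory store: one entry per (instrument, calibration kind).
calibration_store = {}


def upload_calibration(instrument_id, kind, coeffs):
    """Store calibration values, replacing any prior values for the same key.

    kind is one of "primary", "post_deployment", "post_recovery".
    """
    calibration_store[(instrument_id, kind)] = list(coeffs)


upload_calibration("CTD-001", "primary", [1.02, 0.10])
upload_calibration("CTD-001", "primary", [1.03, 0.12])  # corrects a mistake

# Only the corrected values remain; the earlier upload is gone.
print(calibration_store[("CTD-001", "primary")])  # [1.03, 0.12]
```

Combined with on-demand production, this means any product requested after the upload would be computed from the corrected values.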
Ingest for “etc”
Anything you want to know about?
Versioning
[Diagram: the overall functional flow diagram from the Sensing and Ingest section, shown again for Versioning]
Storage
Data Volume Per Year
Data Product Delivery
[Diagram: the overall functional flow diagram from the Sensing and Ingest section, shown again for Data Product Delivery]
[Diagram labels: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; L1a; Secondary Calibration Functions; L2b; L1b; QC Algorithms; Human In The Loop; L1a; L1b and QC flags; L1c; Human In The Loop; L0; GUI; User; L2c; QC flags; L2b]
Output Data Product Variables
• Single L1 data product, with the following variables (i.e., columns in the time series):
  – <measurement>_L1a (e.g., Conductivity_L1a)
  – <measurement>_L1b_Post_Deployment_Cal
  – <measurement>_L1b_Post_Recovery_Cal
  – <measurement>_L1b_Interpolated
  – <measurement>_L1c
  – QC_Flag_GlobalRange
  – QC_Flag_LocalRange
  – <additional QC flags>
• Single L2 data product, similar to above
• Single “Parsed” (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and maybe other stuff.
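The naming pattern for the L1 columns can be sketched as a small helper, together with an illustrative global range check that fills one of the QC flag columns. The flag convention (1 = pass, 0 = fail), the limits, and the function names are assumptions; the slides do not define them.

```python
def l1_column_names(measurement, qc_tests=("GlobalRange", "LocalRange")):
    """Build the L1 column names for one measurement, following the slide's pattern."""
    columns = [
        f"{measurement}_L1a",
        f"{measurement}_L1b_Post_Deployment_Cal",
        f"{measurement}_L1b_Post_Recovery_Cal",
        f"{measurement}_L1b_Interpolated",
        f"{measurement}_L1c",
    ]
    columns += [f"QC_Flag_{test}" for test in qc_tests]
    return columns


def global_range_flags(values, low, high):
    """Flag each value: 1 if within [low, high], else 0 (assumed convention)."""
    return [1 if low <= v <= high else 0 for v in values]


columns = l1_column_names("Conductivity")
flags = global_range_flags([3.1, 9.9, -0.2], low=0.0, high=9.0)
print(columns[0])  # Conductivity_L1a
print(flags)       # [1, 0, 0]
```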
Output Data Product Metadata
• In the metadata (i.e., ‘Metadata’ link from ERDDAP page, AND metadata on Data Product facepage on OOINet UI):
  – Calibration coefficients (as a comma-separated list)
  – QC Look Up Table (as a URL, or possibly as values in a TBD format)
  – Data Product Algorithm (as a URL)
  – DPS for Data Product Algorithm (as a URL)
  – QC Algorithms (as URLs)
  – DPSs for QC Algorithms (as URLs)
  – POLYVAL Algorithm (as a URL)
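For the calibration coefficients entry, a comma-separated list is easy to write and to read back. The sketch below shows one possible round trip; the function names are hypothetical, and the exact number formatting used in the real metadata is not specified in the slides.

```python
def coeffs_to_metadata(coeffs):
    """Serialize coefficients as a comma-separated string."""
    return ",".join(repr(c) for c in coeffs)


def coeffs_from_metadata(text):
    """Parse a comma-separated string back into a list of floats."""
    return [float(part) for part in text.split(",")]


encoded = coeffs_to_metadata([1.02, 0.1])
decoded = coeffs_from_metadata(encoded)
print(encoded)  # 1.02,0.1
```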
Questions?
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”