Data Pipeline & Workflow
Ed Chapman, OOI Chief Systems Engineer
Steve Gaul, OOI Systems Engineer/Architect
4/27/2014
Goal
Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms”
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”
By the Numbers
• Number of Deployed Platforms = 89***
• Number of Deployed Instruments = 814***
• Number of Instrument types = 47
• Number of Instrument Models = 77
• Number of Uncabled dataset agent drivers = 71
• Number of Cabled instrument agent drivers = 42
• Number of Algorithms = 89 (52 L1 and 37 L2)
• Number of Data Product Types = 203
• Number of Unique products = 3928 (L0, L1, L2)
– Number of Unique L0 products = 1640
– Number of Unique L1 products = 1533
– Number of Unique L2 products = 755
***As of ECR 1300-00419
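The per-level counts above can be checked against the stated totals with a few lines of arithmetic. This is only a consistency check on the numbers from the slide; the variable names are illustrative.

```python
# Counts copied from the "By the Numbers" slide.
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}
algorithms = {"L1": 52, "L2": 37}

total_products = sum(unique_products.values())
total_algorithms = sum(algorithms.values())

print(total_products)    # 3928, matching "Number of Unique products"
print(total_algorithms)  # 89, matching "Number of Algorithms"
```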
Sense and Ingest Data
• Several classes of data
– Instrument samples/profiles
– Platform engineering
– Specialized data streams: video, tier 1
– Physical samples
– Logs, photos
– Metadata: calibration sheets, as-built lists
• Several data acquisition paths
– Live streaming data to shore processing (RSN)
– Remote automated collection; data telemetered to shore (CG/EA)
  • Generally sub-sampled or otherwise simplified
– Post recovery data collection from recovered platforms
– Physical samples processed post cruise/recovery
– Manual collection of logs, photos, etc.; associated via metadata
Physical Flow: Sensing and Ingest
System Hardware Summary
Generic Data Flows
Functional Flow: Sensing and Ingest
[Diagram: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]
Ingest for Instruments, Including Tier 1 and HD video
[Diagram: Instrument Driver and Agent; Permanent storage; User]
Ingest for Engineering data
[Diagram: Engineering System Driver and Agent; Permanent storage; User]
Ingest for other items
• Cruise documents
• Algorithms
Calibration
Uncalibrated Raw Instrument Data
[Diagram: Instrument Driver and Agent; L0; Permanent storage; User]
Internally Calibrated Raw Instrument Data
[Diagram: Instrument Driver and Agent; Permanent storage; L1a; Calibration Values; User]
Primary Calibration of Uncalibrated data
[Diagram: Instrument Driver and Agent; Calibration Values; L0; Permanent storage; L1a; Data Product Algorithm; User]
Secondary calibration
[Diagram: Instrument Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; User; L1b (Post Deployment)]
Secondary calibration
[Diagram: Instrument Driver and Agent; L0; Permanent storage; Calibration Values; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1a; L1b (PD); User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b (PR); L1b (Intrp); Interpolation]
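The calibration chain in these diagrams (L0 to L1a, then L1b with post-deployment values, L1b with post-recovery values, and an interpolated L1b) might be sketched as follows. This is a minimal illustration under several assumptions the slides do not confirm: that POLYVAL evaluates a polynomial with the highest-order coefficient first, that the primary calibration can also be written as a polynomial, and that the interpolation is linear in time between deployment and recovery. All function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial at x using Horner's method.

    Coefficients are ordered highest power first (an assumption).
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def primary_calibration(l0: float, coeffs: Sequence[float]) -> float:
    """L0 -> L1a. Assumed here to be a polynomial in the raw value."""
    return polyval(coeffs, l0)


def secondary_calibration(
    l1a: float,
    post_deploy: Sequence[float],
    post_recover: Sequence[float],
    t: float,
    t_deploy: float,
    t_recover: float,
) -> Tuple[float, float, float]:
    """L1a -> (L1b post-deployment, L1b post-recovery, L1b interpolated)."""
    l1b_pd = polyval(post_deploy, l1a)
    l1b_pr = polyval(post_recover, l1a)
    # Weight is 0 at deployment and 1 at recovery (assumed linear in time).
    w = (t - t_deploy) / (t_recover - t_deploy)
    l1b_interp = (1.0 - w) * l1b_pd + w * l1b_pr
    return l1b_pd, l1b_pr, l1b_interp


# Example with made-up coefficients and times.
l1a = primary_calibration(100.0, [0.5, 2.0])
result = secondary_calibration(l1a, [1.0, 0.0], [1.0, 1.0],
                               t=5.0, t_deploy=0.0, t_recover=10.0)
print(l1a)     # 52.0
print(result)  # (52.0, 53.0, 52.5)
```

Keeping all three L1b variants, rather than only the interpolated one, mirrors the diagram, where each is a separate output.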
Calibration actions
• PS or Marine Operator creates Primary Calibration Values
• PS or Marine Operator creates Secondary Post-Deployment Calibration Values
• PS or Marine Operator creates Secondary Post-Recovery Calibration Values
• Values are uploaded through the UI as csv files
Calibration Values are associated with a specific instrument for a specific period of time
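Associating uploaded csv values with a specific instrument and time period could look something like the sketch below. The column names, the semicolon-separated coefficient field, and the instrument ID `CTD-001` are all hypothetical; the slides do not describe the actual csv layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical csv layout; the real upload format is not given in the slides.
SAMPLE_CSV = """instrument_id,kind,valid_from,valid_to,coefficients
CTD-001,primary,2014-01-01,2014-06-30,0.5;2.0
CTD-001,primary,2014-07-01,2014-12-31,0.6;1.5
"""


def load_calibrations(text):
    """Parse csv text into a list of calibration records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "instrument_id": row["instrument_id"],
            "kind": row["kind"],
            "valid_from": datetime.strptime(row["valid_from"], "%Y-%m-%d"),
            "valid_to": datetime.strptime(row["valid_to"], "%Y-%m-%d"),
            "coefficients": [float(c) for c in row["coefficients"].split(";")],
        })
    return rows


def find_calibration(rows, instrument_id, kind, when):
    """Return the coefficients valid for this instrument at this time, or None."""
    for row in rows:
        if (row["instrument_id"] == instrument_id
                and row["kind"] == kind
                and row["valid_from"] <= when <= row["valid_to"]):
            return row["coefficients"]
    return None


rows = load_calibrations(SAMPLE_CSV)
coeffs = find_calibration(rows, "CTD-001", "primary", datetime(2014, 8, 15))
print(coeffs)  # [0.6, 1.5]
```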
Calibration Updates
• If new values are uploaded for any of the three calibration sets, the new values overwrite the prior values.
• The assumption is that new values will only be uploaded if there was a mistake in the old ones. We don’t want errors to propagate, so the old values are deleted.
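The overwrite behavior described here can be illustrated with a small in-memory store. This is a sketch of the stated policy only, not the OOI implementation; the class, method names, and instrument ID are hypothetical.

```python
class CalibrationStore:
    """Keeps exactly one set of values per (instrument, calibration kind)."""

    def __init__(self):
        self._values = {}

    def upload(self, instrument_id, kind, coefficients):
        # A new upload replaces the prior values; nothing old is retained.
        self._values[(instrument_id, kind)] = list(coefficients)

    def get(self, instrument_id, kind):
        return self._values.get((instrument_id, kind))


store = CalibrationStore()
store.upload("CTD-001", "post_deployment", [1.0, 0.0])
store.upload("CTD-001", "post_deployment", [1.0, 0.2])  # corrected values
print(store.get("CTD-001", "post_deployment"))  # [1.0, 0.2]
```

One consequence of this design is that the store alone cannot tell which values produced an earlier data product, since the superseded values are gone.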
Ingest for “etc”
Is there anything you want to know about?
Versioning
[Diagram: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]
Storage
Storage
Intent: for most instruments, all science and engineering data is retained in OOI storage for the life of the program (external archiving will be covered in a later presentation).
Planned on an order of magnitude difference between:
1. Video camera (1)
2. Hydrophones (11), still cameras (10), seismometers (13)
3. Everything else (779)
Data Volume Per Year
[Chart; surviving labels: “138”, “And Seismometer”, “HD Video”, “Kept for”, “60”]
Balancing intent & cost
HD Video Camera: L2-SR-RQ-3402, “Buffering for not less than six months of all video imagery shall be provided”
NSF approved Data Use Policy (DCN 1102-00010)
Minimum period of time for OOI storage before data and data products are moved to long term storage/national archives, by instrument type and information:
• HD Cameras: 30 days
• Broadband Hydrophones, Still Cameras, Seismometers, Low Frequency Hydrophones: 60 days
• All other Instruments: 90 days
• Documentation and algorithms: Life of program
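The minimum storage periods listed for instruments can be expressed as a simple lookup. This is an illustrative sketch; the dictionary keys reuse the instrument type names from the slide, while the function and constant names are hypothetical. Documentation and algorithms, which are kept for the life of the program, are left out because they have no day count.

```python
# Minimum days in OOI storage before data may move to long term storage,
# as listed in the Data Use Policy table on the slide.
MIN_RETENTION_DAYS = {
    "HD Cameras": 30,
    "Broadband Hydrophones": 60,
    "Still Cameras": 60,
    "Seismometers": 60,
    "Low Frequency Hydrophones": 60,
}
DEFAULT_RETENTION_DAYS = 90  # "All other Instruments"


def min_retention_days(instrument_type: str) -> int:
    """Return the minimum OOI storage period, in days, for an instrument type."""
    return MIN_RETENTION_DAYS.get(instrument_type, DEFAULT_RETENTION_DAYS)


print(min_retention_days("HD Cameras"))    # 30
print(min_retention_days("Seismometers"))  # 60
print(min_retention_days("CTD"))           # 90
```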
Data Product Delivery
[Diagram: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop]
[Diagram: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; L1a; Secondary Calibration Functions; L2b; L1b; QC Algorithms; Human In The Loop; L1a, L1b and QC flags; L1c; Human In The Loop; L0; GUI; User; L2c; QC flags; L2b]
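To make the QC step in this flow concrete, here is a simplified sketch of three of the named test types (range, stuck, combined). These are generic illustrations, not the OOI QC algorithms; the flag convention (1 = pass, 0 = fail), the thresholds, and the function names are assumptions.

```python
def range_flags(values, lo, hi):
    """Flag each value: 1 if lo <= value <= hi, else 0."""
    return [1 if lo <= v <= hi else 0 for v in values]


def stuck_flags(values, min_repeats=3):
    """Flag runs of at least min_repeats identical consecutive values as 0."""
    flags = [1] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or when the value changes.
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start >= min_repeats:
                for j in range(run_start, i):
                    flags[j] = 0
            run_start = i
    return flags


def combined_flags(*flag_lists):
    """A sample passes the combined test only if it passes every test."""
    return [min(flags) for flags in zip(*flag_lists)]


samples = [10.0, 10.5, 99.0, 11.0, 11.0, 11.0, 10.8]
in_range = range_flags(samples, lo=0.0, hi=50.0)
not_stuck = stuck_flags(samples, min_repeats=3)
overall = combined_flags(in_range, not_stuck)
print(in_range)   # [1, 1, 0, 1, 1, 1, 1]
print(not_stuck)  # [1, 1, 1, 0, 0, 0, 1]
print(overall)    # [1, 1, 0, 0, 0, 0, 1]
```

In the flow shown in the diagram, flags like these accompany the data rather than removing samples, leaving the final judgment to the human-in-the-loop step.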
Output Data Product Variables
• Single L1 data product, with the following variables (i.e., columns in the time series):
– <measurement>_L1a (e.g., Conductivity_L1a)
– <measurement>_L1b_Post_Deployment_Cal
– <measurement>_L1b_Post_Recovery_Cal
– <measurement>_L1b_Interpolated
– <measurement>_L1c
– QC_Flag_Global.Range
– QC_Flag_Local.Range
– <additional QC flags>
• Single L2 data product, similar to above
• Single “Parsed” (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and other variables.
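The naming pattern for the measurement columns of an L1 product can be generated mechanically. The sketch below covers only the `<measurement>_...` columns; QC flag columns are left to the caller, since their exact spelling is not assumed here. The function and constant names are hypothetical.

```python
# Suffixes for the measurement columns of a single L1 data product.
L1_SUFFIXES = [
    "L1a",
    "L1b_Post_Deployment_Cal",
    "L1b_Post_Recovery_Cal",
    "L1b_Interpolated",
    "L1c",
]


def l1_measurement_columns(measurement):
    """Return the L1 measurement column names for one measurement."""
    return [f"{measurement}_{suffix}" for suffix in L1_SUFFIXES]


columns = l1_measurement_columns("Conductivity")
for name in columns:
    print(name)
```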
Output Data Product Metadata
• In the metadata (i.e., ‘Metadata’ link from ERDDAP page, AND metadata on Data Product facepage on OOINet UI):
– Calibration coefficients (as a comma separated list)
– QC Look Up Table (as a url, or possibly as values in a TBD format)
– Data Product Algorithm (as a url)
– DPS for Data Product Algorithm (as a url)
– QC Algorithms (as urls)
– DPSs for QC Algorithms (as urls)
– POLYVAL Algorithm (as a url)
OOI is about getting data to the users!
Must maintain a balance between data quality, data quantity, and budget.
Questions?
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc), data versioning, data schema/storage, and data product delivery”