Data Ingest In ERDDAP
Sensor data → ERDDAP → data → Your Favorite Client Software
Bob Simons
DOC / NOAA / NMFS / SWFSC / ERD
Monterey, CA
bob.simons@noaa.gov

ERDDAP's web applications are built on ERDDAP's web services.
• Web Applications: a web page with a form (for humans)
• RESTful Web Services: a single request URL specifies an entire request (for computers)

Requesting Gridded Data
(OPeN)DAP standard, plus extensions

Requesting Tabular (In-situ) Data
(OPeN)DAP standard, plus optional extensions. Very much like SQL's WHERE.

Tabular Data Request URLs
A RESTful URL specifies an entire request: dataset, response file type, subset:
  http://coastwatch.pfeg.noaa.gov/erddap/tabledap/pmelTaoDySst.html?longitude,latitude,T_25,time&time=2011-08-10T12:00Z
• Special file types: .html (Data Access Form), .graph (graph form), .fgdc, .iso19115, .das, .dds, .subset
• Data file types: .asc, .csv, .esriCsv, .htmlTable, .geoJson, .json, .mat, .ncCF, .ncCFMA, .odvTxt, .tsv, .xhtml, ...
• Image file types: .geotif, .kml, .pdf, .png, .transparentPng
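
The three parts of a tabular request URL (dataset, response file type, subset) can be assembled mechanically. The following is a minimal Python sketch, not ERDDAP code: the function name build_tabledap_url and its parameters are made up for illustration, and the choice to percent-encode everything in a constraint except "=" and ":" is an assumption.

```python
from urllib.parse import quote


def build_tabledap_url(base_url, dataset_id, file_type, variables, constraints=()):
    """Assemble a tabledap request URL: dataset, response file type, subset."""
    # The requested variables are a comma-separated list right after the "?".
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as < and >, but leave = and : readable.
        query += "&" + quote(constraint, safe="=:")
    return f"{base_url}/tabledap/{dataset_id}{file_type}?{query}"


# Same dataset and subset as the slide, but asking for .csv instead of .html.
url = build_tabledap_url(
    "http://coastwatch.pfeg.noaa.gov/erddap",
    "pmelTaoDySst",
    ".csv",
    ["longitude", "latitude", "T_25", "time"],
    ["time=2011-08-10T12:00Z"],
)
print(url)
```

Changing only the file_type argument (for example to .json or .png) asks for the same subset in a different response format.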

Data Ingest in ERDDAP
● Data Input (new) + Data Output (existing)
● For tabular/in situ data only
● Since ERDDAP v2.00 (2019-06)
Goal: As fast as possible (<1 s from sensor to ERDDAP to user)
Unique feature: reproducible data

Create the Dataset (Done by the ERDDAP administrator)
1. Make a 1-line JSON Lines CSV* sample data file (a line of column names plus one line of data):
     ["stationID", "time", "latitude", "longitude", "windSpeed", "windDir", "timestamp", "author", "command"]
     ["sg028", "2011-08-10T18:15:00Z", 0.0, 0.0, 0.0, 0, 0.0, "SomeBody", 0]
2. As usual, use GenerateDatasetsXml to create the chunk of XML that describes the EDDTableFromHttpGet dataset. Special attributes:
     • <att name="httpGetRequiredVariables">stationID, time</att>
     • <att name="httpGetDirectoryStructure">stationID/1month</att>
     • <att name="httpGetKeys">WindLogger_password, QAScript_password, JohnSmith_password</att>
3. After real data is in the dataset, delete the sample file.
*See http://jsonlines.org/examples/ ("Better than CSV").
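
A sample file like the one in step 1 can be produced with any JSON library. The sketch below uses Python's json module; the helper name jsonl_csv_lines is made up for illustration, and the zeros in the data row are placeholder values, not real measurements.

```python
import json

# Column names from the slide; the last three are the bookkeeping columns.
COLUMNS = ["stationID", "time", "latitude", "longitude",
           "windSpeed", "windDir", "timestamp", "author", "command"]

# Placeholder values only: one value per column.
SAMPLE_ROW = ["sg028", "2011-08-10T18:15:00Z", 0.0, 0.0, 0.0, 0, 0.0, "SomeBody", 0]


def jsonl_csv_lines(columns, rows):
    """Return the file's lines: one JSON array of names, then one JSON array per row."""
    lines = [json.dumps(columns)]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        lines.append(json.dumps(row))
    return lines


sample_text = "\n".join(jsonl_csv_lines(COLUMNS, [SAMPLE_ROW])) + "\n"
print(sample_text, end="")
```

Writing sample_text to a .jsonl file in the dataset's directory gives GenerateDatasetsXml something to inspect.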

Data Ingest URLs: .insert
A single URL specifies the entire change request for one row of data*:
  https://coastwatch.pfeg.noaa.gov/erddap/tabledap/myDatasetID.insert?stationID=sg028&time=2011-08-10T18:15:00Z&latitude=32.4886&longitude=-120.4135&windSpeed=12.0&windDir=271&author=WindLogger_password
• .insert: becomes command=0 (insert) or 1 (delete) in the data file.
• Row identifiers: "required variables", e.g., stationID, time.
• The data: e.g., latitude, longitude, windSpeed, windDir.
• The author=author_key: only author is saved in the data file.
  • You can specify multiple author_key combinations in datasets.xml.
  • Thus author can be, e.g., the sensor, a QA script, the PI, a grad student, ...
• Submit via HTTPS GET or POST (more secure).
• ERDDAP adds timestamp (when ERDDAP processed the change).
*This is the same format that a form on a web page uses to submit data. Thanks to the EarthCube/Unidata CHORDS project for the basic idea.
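
Because the request uses the ordinary web-form format, a sensor-side script can build it with a standard URL library. This is a minimal sketch using Python's urllib; the function name build_insert_url is hypothetical, and placing author last simply mirrors the slide's example.

```python
from urllib.parse import urlencode


def build_insert_url(base_url, dataset_id, row, author_key):
    """Build a one-row .insert URL from a dict of column values."""
    # Drop any stray "author" entry so the key is added exactly once, at the end.
    params = {name: value for name, value in row.items() if name != "author"}
    params["author"] = author_key
    # safe=":" keeps the ISO 8601 time readable; everything else is form-encoded.
    return f"{base_url}/tabledap/{dataset_id}.insert?{urlencode(params, safe=':')}"


url = build_insert_url(
    "https://coastwatch.pfeg.noaa.gov/erddap",
    "myDatasetID",
    {
        "stationID": "sg028",
        "time": "2011-08-10T18:15:00Z",
        "latitude": 32.4886,
        "longitude": -120.4135,
        "windSpeed": 12.0,
        "windDir": 271,
    },
    "WindLogger_password",
)
print(url)
```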

Data Ingest URLs: Alternative Format for .insert
Or, a single URL can specify multiple rows of data via arrays:
  https://coastwatch.pfeg.noaa.gov/erddap/tabledap/myDatasetID.insert?stationID=sg028&time=[2011-08-10T18:15:00Z,2011-08-10T18:20:00Z,2011-08-10T18:25:00Z]&latitude=32.4886&longitude=-120.4135&windSpeed=[8.9,10.2,9.5]&windDir=[271,277,302]&author=WindLogger_password
• The number of values must be the same in all arrays.
• Single values are treated as constants.
• This is a more efficient way to transfer lots of data.
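
The array rules above (equal lengths, single values as constants) are easy to check on the client before sending. The sketch below is illustrative only: the function names are made up, and it percent-encodes the square brackets (%5B, %5D), which is an assumption about what the receiving server prefers; the decoded request is the same as the slide's.

```python
from urllib.parse import quote


def format_value(value):
    """Render a list as [v1,v2,...]; render anything else as a single value."""
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(str(v) for v in value) + "]"
    return str(value)


def build_multirow_insert_url(base_url, dataset_id, columns, author_key):
    """Build a multi-row .insert URL; lists become arrays, scalars stay constant."""
    array_lengths = {len(v) for v in columns.values() if isinstance(v, (list, tuple))}
    if len(array_lengths) > 1:
        raise ValueError(f"arrays must all have the same length, got {sorted(array_lengths)}")
    parts = [f"{name}={quote(format_value(value), safe=':,')}"
             for name, value in columns.items()]
    parts.append(f"author={quote(author_key)}")
    return f"{base_url}/tabledap/{dataset_id}.insert?" + "&".join(parts)


url = build_multirow_insert_url(
    "https://coastwatch.pfeg.noaa.gov/erddap",
    "myDatasetID",
    {
        "stationID": "sg028",
        "time": ["2011-08-10T18:15:00Z", "2011-08-10T18:20:00Z", "2011-08-10T18:25:00Z"],
        "latitude": 32.4886,
        "longitude": -120.4135,
        "windSpeed": [8.9, 10.2, 9.5],
        "windDir": [271, 277, 302],
    },
    "WindLogger_password",
)
print(url)
```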

Changing the Data: .insert or .delete
A single URL specifies the entire change request for one row of data (same format as when filling out a form on a web page):
  https://coastwatch.pfeg.noaa.gov/erddap/tabledap/myDatasetID.insert?stationID=sg028&time=2011-08-10T18:15:00Z&latitude=32.4886&longitude=-120.4135&windSpeed=12.0&windDir=273&author=QAScript_password
• .insert or .delete: CRUD (Create, Read, Update, Delete)
• The row identifiers: "required variables", e.g., stationID, time. If the values are the same, the new data "overwrites" the previous data.
• The data: e.g., latitude, longitude, windSpeed, windDir.
• The author=author_key: only author is saved in the data file.
• Author can be, e.g., the sensor, a QA script, the PI, a grad student, ...
• .delete just needs the row identifiers and author.
• Submit via HTTPS GET or POST (more secure).
• Data is written to a log file. Previous rows are not changed.
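
The difference between the two commands can be captured in one small helper: both need the row identifiers and the author, and .delete needs nothing else. This is a hedged sketch; build_change_url and the REQUIRED_VARIABLES constant are hypothetical names, with the constant standing in for the dataset's httpGetRequiredVariables setting.

```python
from urllib.parse import urlencode

# Stand-in for the dataset's httpGetRequiredVariables attribute.
REQUIRED_VARIABLES = ["stationID", "time"]


def build_change_url(base_url, dataset_id, command, row, author_key):
    """Build an .insert or .delete URL for one row."""
    if command not in ("insert", "delete"):
        raise ValueError(f"unknown command: {command}")
    missing = [name for name in REQUIRED_VARIABLES if name not in row]
    if missing:
        raise ValueError(f"missing row identifiers: {missing}")
    if command == "delete":
        # A delete only needs the row identifiers, so drop the data columns.
        row = {name: row[name] for name in REQUIRED_VARIABLES}
    params = {**row, "author": author_key}
    return f"{base_url}/tabledap/{dataset_id}.{command}?{urlencode(params, safe=':')}"


delete_url = build_change_url(
    "https://coastwatch.pfeg.noaa.gov/erddap",
    "myDatasetID",
    "delete",
    {"stationID": "sg028", "time": "2011-08-10T18:15:00Z", "windDir": 273},
    "QAScript_password",
)
print(delete_url)
```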

ERDDAP Response
• Success: a JSON response, e.g.,
    {
      "status": "success",
      "nRowsReceived": 1,
      "stringTimestamp": "2018-11-05T22:12:19.517Z",
      "numericTimestamp": 1.541455939517E9
    }
• Failure: an HTTP error code
  • E.g., 403 Forbidden: e.g., incorrect author_key
  • Why? Because there can be errors anywhere, e.g., in the network.
Robust System
• If failure, the sensor must wait and try again.
• A standard computer science solution: build a robust system on top of an error-prone system.
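
The "wait and try again" rule can be sketched as a small retry loop on the sensor side. Everything here is an assumption made for illustration: send is a hypothetical callable that performs the HTTPS request and returns (status_code, body), and the doubling delay is one common choice, not something ERDDAP prescribes.

```python
import json
import time


def submit_with_retry(send, url, max_attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call send(url) until the reply is a JSON object with status "success"."""
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        try:
            status_code, body = send(url)
            if status_code == 200:
                reply = json.loads(body)
                if isinstance(reply, dict) and reply.get("status") == "success":
                    return reply
        except (OSError, ValueError):
            pass  # network trouble or an unparsable reply counts as a failed attempt
        if attempt < max_attempts:
            sleep(delay)  # wait before trying again
            delay *= 2    # wait longer after each failure
    raise RuntimeError(f"no success after {max_attempts} attempts")
```

Passing send and sleep as arguments keeps the loop testable without a network. A permanent failure such as 403 Forbidden will not be fixed by retrying, so a real sensor script would probably also log the final error and alert someone.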

Data Storage
• Write to data files which are essentially log files:
  • JSON Lines CSV (so the backend is like EDDTableFromJsonlCSVFiles): standard, Unicode support, easy to read, edit (if corrupted), and back up.
  • Efficient: ERDDAP just appends new data to the end of the file.
  • No information is ever discarded*, even if QC'd or human edited.
  • ERDDAP adds the timestamp variable: when that info was added to the log file.
• Store the dataset in chunked files (from httpGetDirectoryStructure)*:
  • stationID/1month: e.g., .../sg028_2018-03.jsonl
  • Or more levels, or bigger or smaller time chunks.
  • So one dataset can handle 1000's of stations for 100's of years of high resolution data, e.g., platformID/10years/1day.
  • Efficient retrieval: chunked
• Query the dataset with timestamp<=someTime*: "Reproducible Data"
• List changes made to a row*: who made what change, when.
*Unique and useful!
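
The append-only log is what makes both "reproducible data" and per-row change lists possible. The sketch below illustrates the idea in Python on in-memory rows; it is not how ERDDAP is implemented, and the function names and the use of plain numbers for timestamp are assumptions.

```python
def rows_as_of(log_rows, key_names, as_of=None):
    """Replay the log in timestamp order and return the rows visible at as_of."""
    current = {}
    for row in sorted(log_rows, key=lambda r: r["timestamp"]):
        if as_of is not None and row["timestamp"] > as_of:
            break  # later changes are ignored: this is the timestamp<=someTime query
        key = tuple(row[name] for name in key_names)
        if row["command"] == 1:
            current.pop(key, None)  # command=1 is a delete
        else:
            current[key] = row      # command=0 inserts or overwrites
    return list(current.values())


def row_history(log_rows, key_names, key):
    """List (timestamp, author, command) for every change made to one row."""
    return [(r["timestamp"], r["author"], r["command"])
            for r in sorted(log_rows, key=lambda r: r["timestamp"])
            if tuple(r[name] for name in key_names) == key]


KEYS = ["stationID", "time"]
LOG = [
    {"stationID": "sg028", "time": "2011-08-10T18:15:00Z", "windDir": 271,
     "timestamp": 100.0, "author": "WindLogger", "command": 0},
    {"stationID": "sg028", "time": "2011-08-10T18:15:00Z", "windDir": 273,
     "timestamp": 200.0, "author": "QAScript", "command": 0},
    {"stationID": "sg028", "time": "2011-08-10T18:15:00Z",
     "timestamp": 300.0, "author": "JohnSmith", "command": 1},
]

print(rows_as_of(LOG, KEYS, as_of=150.0))  # the sensor's original value
print(rows_as_of(LOG, KEYS, as_of=250.0))  # the QA script's correction
print(rows_as_of(LOG, KEYS))               # the row has since been deleted
print(row_history(LOG, KEYS, ("sg028", "2011-08-10T18:15:00Z")))
```

Because earlier log lines are never rewritten, the same as_of value always yields the same answer, however many corrections arrive later.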

Documentation: https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDTableFromHttpGet
Ask questions, give me feedback or suggestions: bob.simons@noaa.gov
Read about ERDDAP and try it out: http://coastwatch.pfeg.noaa.gov/erddap/
Download and install ERDDAP: http://coastwatch.pfeg.noaa.gov/erddap/download/setup.html
Thank you!