Environmental Data Archival: Practices and Benefits
Graham Parton
graham.parton@stfc.ac.uk
Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data
VO Sandpit, November 2009

Overview
• What is data archival?
• Why do it?
• How do we do it within CEDA?

What do we call “data archival”?
Placing data into a repository which is:
• Backed up
• Robust (identify data corruptions)
• Catalogued
• A recognised repository

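One common way to make a repository "robust" in the sense above is to record a checksum for every file at ingest and re-check it later. The sketch below is a minimal illustration using only the Python standard library; the function names and the choice of SHA-256 are assumptions for this example, not a description of how CEDA does it.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest):
    """Return paths from the manifest that are missing or whose checksum changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_checksum(target) != expected:
            bad.append(rel)
    return bad
```

In use, `build_manifest` would be run once when data arrive and the result stored alongside the archive; `find_corrupted` can then be run periodically against the archive or its backup.
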
Why archive data
• Making data public - openness of the result and repeatability are essential for scientific rigour
• Place to share data with project participants
• Re-purposing data
• Additional services (often for free!)
• Maybe required for legal reasons
• Secure
• Get credit
And because if you don’t…

Why archive data

Scale of CEDA operations
• >100,000 files holding ~1 PB of data
• ~38,000 files downloaded since October 2010
• 19,000+ registered users, of which ~3600 are currently ‘active’ users
• 250+ datasets
• 26 staff
• Responsible for + other services and projects (e.g. UKCIP, CMIP5 partner) …
i.e. we are highly reliant on scripted systems and a well-structured archive

[Diagram: CEDA data flow. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive, Backup, metadata, Catalogue, External discovery service, Web service (discovery, view, download), External Users]

Data Preparation
[Diagram: data flow, preparation stage. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive]

Data Preparation
• Data Management Plans including delivery schedules
• Conditions of Use/Licensing
• Support suppliers in data preparation
• Capture supporting documentation (formats, calibration information, flight logs, etc.)
• File naming and archive structure
• Set up ingest routes

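Agreeing a file naming convention up front is what lets ingest be scripted. As a rough sketch, the check below parses file names against a purely hypothetical pattern, `<instrument>_<site>_<YYYYMMDD>_<product>.<ext>`, and derives an archive directory from the parsed fields. The pattern, field names and directory layout are all assumptions made for illustration; they are not CEDA's actual conventions.

```python
import re
from datetime import datetime

# Hypothetical convention: <instrument>_<site>_<YYYYMMDD>_<product>.<ext>
PATTERN = re.compile(
    r"^(?P<instrument>[a-z0-9-]+)_(?P<site>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_(?P<product>[a-z0-9-]+)\.(?P<ext>nc|csv)$"
)


def parse_filename(name):
    """Return the fields encoded in a file name, or None if it does not conform."""
    match = PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject impossible dates such as month 13.
        fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return fields


def archive_path(fields):
    """Build a hypothetical archive directory: site/instrument/YYYY/MM/DD."""
    date_part = fields["date"].strftime("%Y/%m/%d")
    return "/".join([fields["site"], fields["instrument"], date_part])
```

A name such as `sw010203` fails this check immediately, which is the point: a file that cannot be placed automatically needs a conversation with the supplier before it enters the archive.
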
Data Preparation - File structure
Take the bad data challenge…
File “sw010203”
4.31 4.35 4.31 5.42 4.65 4.79 4.42 5.19 5.38 4.27
155.3 146.5 143.3 148.5 152.3 144.1 142.5 141.6 141.5 150.8
3.92 4.58 4.92 4.31 4.60 4.58 4.40 5.88 4.69
136.1 138.0 157.0 140.4 168.8 147.5 133.4 142.4 144.8 138.8
5.15 4.83 4.94 4.04 3.79 5.33 4.35 4.10 6.00 5.71
140.2 153.7 141.7 146.7 145.3 150.1 150.5 152.6 140.1 144.0
4.23 5.40 4.65 3.92 5.92 4.81 4.96 5.02 4.75 5.21
137.1 145.8 143.1 151.5 152.9 141.0 149.8 134.0 158.3 138.8
4.75 4.63 5.02 6.02 5.56 4.94 5.08 5.00
150.2 141.0 143.0 135.3 145.8 146.9 143.4 142.9 148.1 132.4
4.71 4.90 4.88 5.06 4.77 4.38 5.08 5.27 5.46 5.06
137.3 149.5 151.6 161.6 149.0 148.5 144.4 163.5 144.4
...
1440 pairs of data in a file
• What are these data? Guess surface winds, but on what day?
• What are the units?
• Any convention?
• How do we read the file?
• Is this spatial or temporal data?

Supported Formats
• Highly structured metadata
• Standard Names

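To make "highly structured metadata" concrete, the sketch below describes two variables with CF-convention-style attributes (`standard_name`, `units`, `long_name`) and checks that none are missing. It assumes the slide's "Standard Names" refers to CF standard names, and it borrows the earlier guess that the mystery file holds surface winds; neither is confirmed by the slides. The function name and the list of required attributes are illustrative choices.

```python
# Attributes every variable is expected to carry (an illustrative choice).
REQUIRED_ATTRIBUTES = ("standard_name", "units", "long_name")


def missing_metadata(variables):
    """Return {variable: [missing attribute names]} for incomplete variables."""
    problems = {}
    for name, attrs in variables.items():
        missing = [key for key in REQUIRED_ATTRIBUTES if not attrs.get(key)]
        if missing:
            problems[name] = missing
    return problems


# A self-describing version of the data (interpretation is an assumption).
described = {
    "wind_speed": {
        "standard_name": "wind_speed",
        "units": "m s-1",
        "long_name": "Wind speed",
    },
    "wind_direction": {
        "standard_name": "wind_from_direction",
        "units": "degree",
        "long_name": "Wind direction",
    },
}

# What the bare columns of numbers actually tell us: nothing.
undescribed = {"column_1": {}, "column_2": {"units": ""}}
```

Running `missing_metadata(undescribed)` flags every attribute on both columns, while `missing_metadata(described)` returns an empty dictionary. Self-describing formats such as NetCDF carry this kind of attribute set inside the file itself.
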
Data Discovery
[Diagram: data flow, discovery stage. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive, metadata, Catalogue, External discovery service, Web service (discovery), External Users]

CEDA Catalogue

NERC Data Discovery Service: data-search.nerc.ac.uk

CEDA Document Repository: cedadocs.badc.rl.ac.uk

Citations for Data Creators: DOIs
• Citation (and DOI)
• Data Citation and DOI… but only if in a recognised repository

Data Services
[Diagram: data flow, services stage. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive, metadata, Catalogue, External discovery service, Web service (discovery, view, download), External Users]

Visualisation Services

Visualisation Services: ISIC Video Wall

Visualisation Services

Processing Services
CEDA WPS: ceda-wps2.badc.rl.ac.uk/ui/home
• Chain services together
• Job either run straight away or sent to run on backend service
• Download result

Processing Services: Trajectory Service

OPeNDAP Service with security layer
• Navigable and scriptable interface to archive
• CEDA has applied a security shell using “OpenID” technology
• Gives powerful sub-setting service for large datasets

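The "scriptable" and "sub-setting" points come from the way OPeNDAP (DAP2) encodes a request in the URL: a response suffix such as `.ascii` or `.dods`, followed by a constraint expression of the form `variable[start:stride:stop]` per dimension. The helper below only builds such a URL; it makes no network call and does not handle the security layer. The function name, host and variable name are hypothetical.

```python
def opendap_subset_url(dataset_url, variable, index_ranges, response="ascii"):
    """Build a DAP2 URL selecting a hyperslab of one array variable.

    index_ranges holds one tuple per dimension, either (start, stop) or
    (start, stride, stop); indices are zero-based and stop is inclusive.
    """
    parts = []
    for rng in index_ranges:
        if len(rng) == 2:
            start, stop = rng
            stride = 1
        elif len(rng) == 3:
            start, stride, stop = rng
        else:
            raise ValueError("each range needs 2 or 3 integers")
        if start < 0 or stop < start or stride < 1:
            raise ValueError(f"invalid range: {rng!r}")
        parts.append(f"[{start}:{stride}:{stop}]")
    hyperslab = "".join(parts)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset: first 10 time steps, every 2nd point of 21 in space.
url = opendap_subset_url(
    "http://example.org/dap/file.nc", "wind_speed", [(0, 9), (0, 2, 20)]
)
```

Only the requested slice travels over the network, which is what makes the approach attractive for large datasets. Some HTTP clients require the square brackets to be percent-encoded.
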
What’s on the horizon?
• Continue to develop visualisation and data processing services
• Increasing data volumes becoming too large to move around
• Hosting services – provide virtual environments for people to work on the data without downloading
• From Petascale to Exascale
But all this NEEDS well-structured data that uses standards-driven metadata and formats

Take Home Messages
• Plan for data management
• Tap into standards when preparing data
• Get data catalogued for data discovery
• Data in supported repositories leads to recognition for efforts preparing data
• A suite of additional services adds value to existing data
Team Digital Preservation Video
