ESO Archive
Adam Dobrzycki (“Door-Bridge-Ski” or “Dope-risky”) for the ESO Archive Team

Plan
Who? Teams behind the Archive
Why? What the Archive is for
What? Archive content
How? Ingest, retrieval, data standards
Odds and ends
Future
ESO Archive 6 November 2017

Administrative stuff
ESO Archive is overseen by the Archive Content Handling Group (ACH) in the Back-End Operations Department (BOD), in the Data Management and Operations Division (DMO), in the Directorate of Operations (DoO).
ACH (head: Nathalie Fourniol)
Ø Archive Operations (AOG), aog@eso.org: Nathalie Fourniol, Michael Boelter, Nicolas Rosse
Ø Database Content Management (DBCM), dbcm@eso.org: Adam Dobrzycki, Ignacio Vera Sequeiros, Cristiano Da Rocha, My-Ha Debreuck Vuong

Why does the Archive exist?
Because the ESO Council says so… but primarily because it makes sense, from both technical and scientific points of view.
Ø Enables reproduction/verification of scientific claims, the cornerstone of the Scientific Method
Ø Plenty of science left in the data:
• Increasing samples (no. of objects, temporal coverage)
• Serendipitous sources (“one’s garbage is someone else’s food”)
• Something exploded - what was it?
• ≳ 25% of papers use archival data; ~15% are archive-only
Ø Allows experimenting
Ø Helps planning future observations
Ø And so on…

A few numbers
Volume (as of 30 Sep):
Ø 1.01 PB
Ø 45 Mfiles (39 Mfiles FITS)
Ingest:
Ø 130 TB/year (of which raw: 75 TB/year)
Ø 2.5 Mfiles/year
Deliveries:
Ø PIs: 65 TB/year
Ø Archive users: 135 TB/year

Archive content - files
FITS files
Ø Raw data from all LPO sites (incl. APEX)
Ø Master calibrations generated by the QC group (not made public directly)
Ø IDPs: pipeline-processed data from several instruments: XSHOOTER, UVES, PIONIER, MUSE, HAWKI, HARPS, GIRAFFE, FEROS. Next in line: KMOS
Ø EDPs: processed data delivered to the Archive by PIs (Public Surveys, GOODS, zCOSMOS), APEX. Incl. mosaics, catalogues, etc.
Opslog files, PAFs
Other stuff

Archive content - databases
Full FITS header database (32.5 billion rows)
Back-end databases
Ø Raw files, master calibrations (not visible externally), products (all flavours)
Access database
Ø Only files listed there can be queried and/or delivered using archive services
ASM database
Ø Meteo data, some items from radiometer
Full history of opslogs
Ø Accessible via AutRep

From the Andes to the Prealps
How are the data transferred and published?
Ø FITS header dumps and FITS files are transferred (separately) to the staging area in Garching. From there:
• FITS headers are stored in the keywords repository DB
• FITS files are stored in Garching NGAS
Ø Every five minutes:
• “Interesting” metadata from newly arrived headers are put in the archive query DBs and in the access DB
• When both header and file are accounted for, the frame is published. Note: science, calibrations and acquisitions only!
Ø It is !@#$%^&* quick!

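The publication rule on this slide can be sketched in a few lines. This is only an illustration, assuming frames are keyed by their archive ID and that the frame category is the DPR.CATG value; the function name and the in-memory dict and set are hypothetical stand-ins for the keywords repository and NGAS, not the actual ingest code.

```python
# Hypothetical sketch of the "publish when header and file are both in" rule.
# Category names follow the DPR.CATG values quoted later in the talk.
PUBLISHABLE_CATEGORIES = {"SCIENCE", "CALIB", "ACQUISITION"}


def frames_to_publish(header_categories, files_in_ngas):
    """Return archive IDs that may be published in this five-minute cycle.

    header_categories: dict mapping archive ID -> DPR.CATG taken from the header dump
    files_in_ngas:     set of archive IDs whose FITS file has arrived
    """
    return sorted(
        dp_id
        for dp_id, category in header_categories.items()
        if dp_id in files_in_ngas and category in PUBLISHABLE_CATEGORIES
    )
```

A frame whose header has arrived but whose file has not (or the other way round) simply waits for a later cycle; a frame of any other category is never returned.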
Retrieving data 1
It’s one thing to put stuff into the Archive. It’s another thing to get it out.
Querying:
Ø Science users: numerous web forms to search for sci, acq and cal data (general, instrument-specific, data products, comm/SV, etc.), archive.eso.org
Ø ESO staff, instrument teams, etc. can access proprietary/test/technical data via: archive.eso.org/wdb/forms/cas/eso_archive_main.html
Instructions/procedures: check the SciOps wiki page
This works for FITS and associated files only! No logs, etc.

Retrieving data 2
Downloading:
Ø Query forms allow requesting selected data
Ø Direct retrieval: if the archive IDs are known, enter them in: archive.eso.org/cms/eso-data-direct-retrieval.html
In all cases access credentials will be verified and download will be allowed or denied accordingly
Again: FITS (and associated files) only!
Other types of data: ask archive@eso.org. Provide all details: which data, why you need it, who you are, etc.

Retrieving data 3
“Programmatic” access, i.e. querying/downloading with scripts:
Ø Possible, but not openly advertised
Ø If interested, check the FAQ or contact archive@eso.org
Ø Usual access restrictions apply
A beefed-up version is one of the new “Services” in the Archive Services Project (more later).

Out-of-ordinary 1
Something happened. What to do?
Ø If this is about calibration files:
• Mark them as BAD QUALITY on the respective calChecker_<instrument> page. If not viable…
• …open a PROP ticket with qc_<instrument>.
Ø Otherwise, open a PROP ticket with DBCM, as described in the SciOps wiki page
Ø What can be done in the Archive?
• If the data are already in the archive:
– Metadata (headers) can be modified/updated to reflect reality; modifications will be propagated to delivered data
– Frames can be hidden, i.e. made invisible to queries
– Data can be embargoed (and then usually released)
• If the data are not in the archive:
– No rule; by its very nature this is handled case by case

Out-of-ordinary 2
Important things to keep in mind when dealing with out-of-ordinary stuff:
Ø Archiving support in Garching is limited to business hours, Mon-Fri
Ø Header keyword manipulations (esp. adding keywords) are costly, especially for large volumes.
Ø There are only superficial structure/content checks on the mountain; a file has to be really corrupted to not make it to the archive.
Ø But: only data that comply with DICB rules (next slide) will be made visible.

Data Interface Control (DIC)
DIC Board (DICB) defines data/metadata standards
Ø LPO Representatives in the Board: Alain and Pedro
DIC Document: ESO-044156
Ø Current: Version 6, released 21 June 2016
Ø Mandatory FITS keywords, allowed values
Ø Hierarchical keyword categories
Ø DID, PAF and log file formats
Ø Syntax for compound units, date/time
Ø File name and identifier conventions

Minimum DICB standard
In principle:
Ø For technical access:
• meaningful OLAS_ID (env. variable on IWS)
• INS.ID
• MJD-OBS
Ø For scientific access:
• meaningful OLAS_ID (env. variable on IWS)
• INS.ID
• MJD-OBS
• DPR.CATG one of SCIENCE, CALIB, ACQUISITION
• Legal OBS.PROG.ID (i.e. known to front-end DB)
BUT:
Ø Good luck finding those files if that’s all they have!

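The two lists above can be read as a simple checklist. The sketch below is an illustration only: it assumes the relevant values have already been collected into a plain dict keyed by the names used on the slide, and that the set of programme IDs known to the front-end DB is passed in by the caller. Both function names are hypothetical, and "meaningful" is reduced here to "present and non-empty".

```python
# Hypothetical checklist for the minimum standard; keys are the slide's names.
SCIENCE_CATEGORIES = {"SCIENCE", "CALIB", "ACQUISITION"}
TECHNICAL_KEYS = ("OLAS_ID", "INS.ID", "MJD-OBS")


def missing_for_technical(meta):
    """Return the technical-access items that are absent or empty."""
    return [key for key in TECHNICAL_KEYS if meta.get(key) in (None, "")]


def problems_for_science(meta, known_prog_ids):
    """Return every item that blocks scientific access (empty list means OK)."""
    problems = missing_for_technical(meta)
    if meta.get("DPR.CATG") not in SCIENCE_CATEGORIES:
        problems.append("DPR.CATG")
    if meta.get("OBS.PROG.ID") not in known_prog_ids:
        problems.append("OBS.PROG.ID")
    return problems
```

Returning the list of offending items, rather than a plain yes/no, makes it easier to say what has to be fixed before a frame can become visible.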
Odds and ends 1
Searching based on keywords:
Ø On query forms, the keyword name is usually displayed. If it is not there, the keyword was probably not harvested
Ø One can display/download just the header (if you know the identifier): archive.eso.org/hdr?DpId=<dp_id>, e.g. archive.eso.org/hdr?DpId=HAWKI.2017-08-18T00:29:14.920
Ø A neat trick for using this to search for specific keyword(s) is described in the SciOps wiki page

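For scripted use, the header address can be assembled from the identifier. The sketch below only builds the URL shown on the slide; the "http://" scheme and the function name are assumptions, and the usual access restrictions still apply when the address is actually fetched.

```python
from urllib.parse import urlencode


def header_url(dp_id, base="http://archive.eso.org/hdr"):
    """Build the header-only URL for one archive identifier.

    The scheme in `base` is an assumption; the slide gives only host and path.
    Colons are kept unescaped so the result reads like the slide's example.
    """
    return base + "?" + urlencode({"DpId": dp_id}, safe=":")


print(header_url("HAWKI.2017-08-18T00:29:14.920"))
```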
Odds and ends 2
FITS problems seen in the past (and present...):
Ø PCOUNT+GCOUNT in primary HDU
Ø Data-describing keywords (BUNIT, BSCALE, BZERO, DATAMIN, DATAMAX) no longer valid or applicable, sometimes even illegal
Ø WCS keywords in empty primary HDU of multi-extension files
Ø CDn_m and CDELTi keywords in the same file, sometimes conflicting
Ø NaN or Inf: legal as data values, but illegal as keyword values
Ø Some keywords must be “fixed format” (i.e. flushed right to the 30th column)
Ø Lowercase “e” in exponents (1.0e-10 is a FITS error)

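The last three items can be checked on the raw 80-character header card. The following is a minimal sketch, not a validator: it looks only at non-string values, the function name and problem labels are made up for illustration, and it applies the fixed-format test to whatever card it is given, whereas only some keywords are actually required to be fixed format.

```python
import re

# A number written with a lowercase exponent letter, e.g. 1.0e-10.
LOWERCASE_EXP = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)e[+-]?\d+")


def check_card(card):
    """Return a list of problem labels for one header card (non-string values only)."""
    card = card.ljust(80)                      # cards are 80 characters wide
    problems = []
    if card[8:10] != "= ":
        return problems                        # no value indicator: nothing to check
    value = card[10:].split("/", 1)[0].strip() # drop the optional comment
    if not value or value.startswith("'"):
        return problems                        # empty or string value: not examined
    if value.lstrip("+-").upper() in {"NAN", "INF", "INFINITY"}:
        problems.append("nan-or-inf")
    if LOWERCASE_EXP.fullmatch(value):
        problems.append("lowercase-exponent")
    # Fixed format: the value ends exactly in column 30 (index 29).
    if card[10:30].strip() != value or card[29] == " ":
        problems.append("not-fixed-format")
    return problems
```

For example, a card built as `"EXPTIME = " + "1.0e-10".rjust(20)` is flagged for the lowercase exponent only, while `"NAXIS   = 2"` is flagged as not fixed format because the value is not flushed right.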
Future
More data, more space, more bandwidth, etc.
Archive Services Project
Ø Alberto Micol’s talk a week ago
Ø Interactive user experience:
• Blending of query, result and download
• Previews, zooming in and out
Ø Programmatic access
• Enables complex queries
• Connecting raw data, products, ambient info
Ø First release estimate: Q1 2018

Thank you!
adobrzyc@eso.org
dicb@eso.org