SOFIA Archiving Requirements for the SOFIA Data Cycle

SOFIA Archiving Requirements for the SOFIA Data Cycle System
Mark Morris, UCLA
Joe Mazzarella & Steve Lord, IPAC
John Milburn & Jochen Horn, UCLA
March 7-8, 2000, DCS Preliminary Design Review

SOFIA Data Archives
• Purposes:
  – Maximize the scientific productivity of SOFIA
  – Provide public information about existing data
  – Provide data backups
  – Verification of data products
  – Informs related science projects
  – Archival research
    • e.g., context of the Astrophysics Data Program
    • Supplement to other funded or unfunded research
  – Motivates publication

SOFIA Data Archives
• SOFIA observations to be archived in three forms:
  – SUMMARY ARCHIVE: data headers and logs
  – WORKING ARCHIVE: raw data from all instruments
  – PUBLIC ARCHIVE: reduced data from facility instruments

SOFIA Databases

SOFIA Data Archives
• SUMMARY ARCHIVE
  – on-line, equipped with search tool
  – maintained at the SSMOC
  – headers of each observation, giving:
    • source names, positions
    • instrument & its parameter settings, integration times
    • important environmental & aircraft parameters
  – links to the flight and observing logs
  – identities of P.I. & observer (if different)
  – includes proposal abstract

SOFIA Data Archives
• LOGS
  – Highly automated
  – Can be annotated
  – Flight log
    • Details of observatory functions, flight parameters
  – Observing log
    • Fundamental set of observing parameters (e.g., observer ID, source ID, position, instrument mode, frequency, filters, bandwidth, start & stop times, integration times, chop/nod configuration, water vapor index, etc.)
    • Optional set of custom parameters
    • Includes wrap-up commentary (exit interview)

SOFIA Data Archives
• WORKING ARCHIVE
  – purposes:
    • Fundamental repository of all untreated SOFIA data
    • Backup
    • Resource for archival research
  – maintained by SSMOC staff
  – includes:
    • Contents of summary archive
    • Environmental and housekeeping data
    • Raw science data from all instruments
  – made available upon request by qualified individuals with a web-based request form on a SOFIA archive page having links to the data reduction tools. Access requests subject to approval by a person with designated authority.
  – access subject to validation period for all but the proposing PI.

SOFIA Data Archives
• PUBLIC ARCHIVE
  – created from Working Archive data from (at least) the facility instruments which have been carried through a standard data reduction pipeline
  – fully accessible on the web, following validation period
  – maintained at the SSMOC, mirrored at IPAC
  – consistent in form and function with other mission archives embedded within IRSA at IPAC
  – accompanying tools to examine, and extract quantitative information from, the archived images and spectra. Where feasible, existing IRSA tools will be adapted to this end.

SOFIA Data Archives
• METADATA ARCHIVES [recognizing evolution of both software and instrumentation]:
  – Pipeline components - version tracking
  – Pipelines
  – Documentation
    • manuals
    • tutorials

SOFIA Assumptions (1)
The SOFIA Archive Requirements and Design are being developed with the following assumptions:
• The primary use is by scientists using the Web.
• All components of the archive will reside at NASA Ames, with "mirrors" of the Public Archive placed at remote data centers such as IPAC.
• SOFIA Facility Instruments will support General Investigators (GIs) using the concept of Astronomical Observation Templates (AOTs) and Astronomical Observation Requests (AORs).
• The Public Archive will support a well-defined set of FI AOTs, each of which will be reduced by software module pipelines delivered to the SOFIA Data Cycle System (DCS).

SOFIA Assumptions (2)
• PI Instrument data will be supported at the Working Archive level.
• The Archive will consist of science, calibration, and laboratory test data from the Facility Instruments, plus SSMOC Housekeeping data.
• SOFIA archive data are for public use after a reasonable validation period for proper reduction, calibration, and science validation by observing teams with support from the SSMOC.
• The requirements are aimed at the SSMOC, the Facility Instrument teams, and the DCS software developers for the archive system.
• Observations will be tracked through their complete lifecycle from the AOR through the raw, reduced, and final calibrated science data products using a unique Observation Identification number (OBSID).

SOFIA Archive Interactions with DCS Components

SOFIA High Level Archive Requirements
• The archive shall simplify use and reuse of SOFIA data during reduction, analysis, interpretation and publication.
• The archive shall enable the DCS to store and retrieve uniform data products.
• The archive will adhere to existing (FITS) and emerging (XML) standards for data storage and interchange between software modules.
• The archive shall support continuous improvement of data reduction pipelines and improvements in calibration procedures.
• The archive shall support online data access for humans (Web interfaces) and remote software clients (e.g., via XML-based "server mode") from other astronomical data centers.
• The archive shall provide services for archival research, including search tools and quantitative measurement tools.

SOFIA Functional Requirements (1)
The Archive software shall support efficient and reliable data insertion functions and procedures.
• Functions shall be provided to insert raw data from all instruments into the Working Archive and update a registry (index) of data files. This shall be done routinely after each flight.
• Insertion of intermediate and reduced data files resulting from pipeline processing will be handled by pipeline modules, and these modules shall adhere to the file and directory naming conventions outlined in the Directory and File Naming Conventions.
• Functions shall be provided to insert into the proper Archive level (Working, Summary, Public) FITS tables, catalogs, or text files, including but not limited to:
  – Calibration sources
  – Source lists (targets) for observing programs
  – Observing Logs
  – Flight Plans
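
A minimal sketch of the "insert and update a registry (index)" requirement, assuming a SQLite table as the registry and a SHA-256 digest as the integrity record. The table layout, the function names, and the `flight_id` and `archive_level` fields are hypothetical; the slides do not specify a DBMS or a checksum scheme.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_registry(db_path=":memory:"):
    """Create the (hypothetical) registry table if needed and return a connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS registry (
               path TEXT PRIMARY KEY,
               flight_id TEXT NOT NULL,
               archive_level TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               sha256 TEXT NOT NULL
           )"""
    )
    return conn


def register_file(conn, path, flight_id, archive_level="working"):
    """Record one data file in the registry and return its SHA-256 digest."""
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO registry VALUES (?, ?, ?, ?, ?)",
        (str(path), flight_id, archive_level, len(data), digest),
    )
    conn.commit()
    return digest


def verify_file(conn, path):
    """Return True if the file on disk still matches its registered digest."""
    row = conn.execute(
        "SELECT sha256 FROM registry WHERE path = ?", (str(path),)
    ).fetchone()
    if row is None:
        return False  # file was never registered
    return hashlib.sha256(Path(path).read_bytes()).hexdigest() == row[0]
```

In this sketch a post-flight ingest job would call `register_file` once per raw file, and a later integrity check would call `verify_file` on the same paths.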

SOFIA Functional Requirements (2)
Data for Facility and PI instruments shall be stored and maintained in the Working, Summary and Public Archive levels as follows:

SOFIA Functional Requirements (3)
Data in the Working, Summary and Public Archive levels shall be publicly available online through a Web interface for the different instrument types as follows:

SOFIA Functional Requirements (4)
• Functions shall be provided to verify the integrity and validity of the data products.
• Functions shall be provided to copy and track (version control) validated data products from the Working Archive into the Public Archive.
• The Archive software shall provide functions to extract metadata to populate the Summary Archive.
  – The software shall extract header records from FITS data in the Working Archive and insert metadata into DBMS tables to support queries of the Summary Archive.
  – The software shall convert (or "wrap") FITS to provide an API to the emerging Astronomical XML (AML) format for data and summary (metadata) interchange with other data and information systems, for example IRSA, STScI, HEASARC, NED, and others.
  – The software shall automatically create links between data products, calibration files, and documentation as described in the Summary Archive Contents Requirements.
• Functions shall be provided to cross-reference flight video and audio recordings with the Working Archive and other relevant FITS data products.
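
The header-extraction requirement above can be sketched as follows. This is a rough illustration, not the DCS design: it reads the fixed-width 80-character FITS header cards with the standard library only, handles just simple `KEYWORD = value / comment` cards, and stores them in a hypothetical `summary` table in SQLite. The table layout and the `OBS-0001` identifier format are assumptions.

```python
import sqlite3

CARD_LEN = 80  # FITS header cards are fixed-width 80-character records


def parse_header(raw: bytes) -> dict:
    """Parse simple 'KEYWORD = value / comment' cards up to the END card."""
    text = raw.decode("ascii")
    header = {}
    for i in range(0, len(text), CARD_LEN):
        card = text[i:i + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # skip COMMENT, HISTORY and blank cards
        field = card[10:].strip()
        if field.startswith("'"):
            # Quoted string: take the text up to the closing quote.
            # Doubled quotes inside strings are not handled in this sketch.
            value = field[1:field.index("'", 1)].rstrip()
        else:
            value = field.split("/", 1)[0].strip()
        header[keyword] = value
    return header


def store_header(conn, obsid, header):
    """Insert one row per keyword into a hypothetical summary table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS summary (
               obsid TEXT, keyword TEXT, value TEXT,
               PRIMARY KEY (obsid, keyword)
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO summary VALUES (?, ?, ?)",
        [(obsid, key, value) for key, value in header.items()],
    )
    conn.commit()
```

Storing one row per keyword keeps the schema independent of any particular instrument's keyword set, at the cost of keeping all values as text; a production system would more likely read headers with an established FITS library.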

SOFIA Functional Requirements (5)
Queries:
• The Archive software shall support queries for data sets meeting selection criteria meaningful to astronomers.
• Queries shall allow location of raw data products in the Working Archive.
• Queries shall allow location of reduced and calibrated data in the Public Archive.
• After searching based on query constraints as described above, the user shall have the ability to select one or more returned data set "handles", which are based on well-documented Observation ID (OBSID) numbers, to download the data immediately to his or her local computer via HTTP or FTP.
• A Web query form shall be provided which allows users to input a known Observation ID (OBSID) number to directly return the data products and optionally a subset of its associated Housekeeping Data and Documentation.

SOFIA Functional Requirements (6)
• The Archive software shall support queries involving astronomical positions in standard coordinate systems.
• The Archive software shall recognize queries on astronomical sky regions using cone searches and ranges expressed in standard coordinate systems.
• The Archive software shall support queries on:
  – Astronomical object names
  – SOFIA instrument names
  – AOT names and AOT parameters such as instrumental passbands, filter names, etc.
  – Wavelength ranges using standard astronomical conventions
  – Time intervals
  – SOFIA Observation Identifiers (OBSIDs)
  – Observer names (PIs, Co-Is)
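
As an illustration of the cone-search requirement, the sketch below filters a list of observation records by great-circle distance from a search position. It assumes positions are given as right ascension and declination in decimal degrees; the record fields (`obsid`, `ra`, `dec`) and function names are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(observations, ra, dec, radius_deg):
    """Return the observations lying within radius_deg of (ra, dec)."""
    return [
        obs for obs in observations
        if angular_separation_deg(ra, dec, obs["ra"], obs["dec"]) <= radius_deg
    ]
```

The haversine form is used because it stays numerically stable for the small separations typical of cone searches and handles the wrap at 0/360 degrees of right ascension without special cases. A linear scan is only reasonable for small tables; a real archive would push the cut into the DBMS with a spatial index.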

SOFIA Functional Requirements (7)
Document Tracking:
• The Archive shall support tracking of data products.
• The Archive shall support tracking of data reduction software modules and pipeline sequences.
• The Archive shall support registration and tracking of documentation.

SOFIA Functional Requirements (8)
The DCS User Interface shall support modes of interaction with human users and software components:
• Command-line user interfaces to each component.
• Standard Uniform Resource Locators (URLs) accessible through Web-based forms and remote client software.
• A "server-mode" for use by client software within the DCS and from remote sites.
• Graphical user interface (GUI) "widgets" for access to the archive integrated into the SOFIA Observation Planning and Flight Planning tools.
• Results from archive queries shall be returned in well-defined and clearly documented data structures. Ideally these data structures will be in a self-documenting, object-oriented format using XML.

SOFIA Data Content Requirements: Summary Archive
The Summary Archive shall:
• store observation FITS header keywords and values extracted from the data products in a format that efficiently supports user queries.
• contain Project Status information.
• contain links to abstracts of Observing Proposals.
• contain PI Observing Run Abstracts & Detailed Observing Logs.
• contain links to the executed Flight Plans.
• contain Flight Director Logs.
• contain links to the Working and Public Archives, Pipeline Software Archive, Documentation Library, and Bibliography.

SOFIA Data Content Requirements: Working Archive (1)
The Working Archive shall:
• store raw data (science & calibration) acquired from all SOFIA instruments. The raw data and related Housekeeping data shall be deposited into the Working Archive immediately after a successful SOFIA flight, ideally within a few hours after landing.
• serve as the primary data repository. Data reduction pipelines will read raw data and write intermediate data produced by the Standard Data Product pipelines into the Working Archive.
• serve as a data backup for General Investigators.
• be housed at the SSMOC and made available to eligible PIs and Co-Is as soon as it enters the archive, and to the public after the requisite validation period. The Working Archive will be available online, but Working datasets will be transferred onto a Web-accessible (FTP) area with password protection.
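
The access rule in the last bullet could be expressed roughly as below. The slides do not state the length of the validation period, so it is passed in as a parameter; the observation record layout (`team`, `archived_on`) and the function name are hypothetical.

```python
from datetime import date, timedelta


def can_access(user, obs, today, validation_days):
    """Sketch of the Working Archive access rule described above."""
    if user in obs["team"]:
        # Eligible PIs and Co-Is: access as soon as the data enter the archive.
        return True
    # Everyone else: access only once the validation period has elapsed.
    release = obs["archived_on"] + timedelta(days=validation_days)
    return today >= release
```

Keeping the period as an argument rather than a constant reflects that the requirement only says "requisite validation period" and leaves the value to policy.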

SOFIA Data Content Requirements: Working Archive (2)
The Working Archive shall:
• track the processing history of science data products and instrument calibration files, notably for intermediate and reduced data products which are preliminary or unvalidated, and thus not yet copied to the Public Archive.
• contain Housekeeping data pertaining to the state or status of the instruments, the aircraft, the telescope, and observing conditions (environment) while observations were made and data were collected.
• contain FITS data files of Housekeeping & instrument calibration data stored either as header keywords and values, or pointers to more extensive data in auxiliary files which are required for data reduction and calibration by the pipelines.
• serve as a resource for archival research, especially for people who wish to develop improvements to the data reduction algorithms to push the limits of the observations to make new scientific discoveries or improvements to previous interpretations.

SOFIA Data Content Requirements: Working Archive (3)
Data in the Working Archive shall be linked to other Archive components:
• Summary Archive Metadata
• Actual Flight Plans
• Project Status
• Flight Logs
• appropriate versions of Pipeline Data Reduction Software Archive and supporting documentation
• Reduced data in the Public Archive
• Documentation Library
• Video and audio recordings

SOFIA Data Content Requirements: Public Archive (1)
• The SOFIA Facility Instruments will each have a Standard Pipeline that will produce reduced, calibrated images, photometric measurements, or spectra for standard modes, or AOTs. Data products resulting from filled-in AOTs, which are called Astronomical Observation Requests (AORs), comprise the Public Archive.
• The Public Archive shall be accessible by GIs and the general public through Web-based query and request forms.
• The Public Archive shall serve network-based requests for data from remote archive system software.
• The Public Archive data shall be mirrored at the Infrared Science Archive (IRSA) at IPAC, where interfaces and query engines will be developed and maintained in coordination with similar software used to support community access for data from NASA's other infrared missions.

SOFIA Data Content Requirements: Public Archive (2)
Data in the Public Archive shall be linked to other Archive components:
• Summary Archive Metadata
• Actual Flight Plans
• Project Status
• Flight Logs
• Pipeline Data Reduction Software Archive
• the user interfaces for access to the raw data and housekeeping data in the Working Archive, and
• the Documentation Library

SOFIA Data Format and Transport Standards
• The SOFIA instruments shall produce files in FITS format as their primary raw data products. These will be transferred to the Archive team at the SSMOC and comprise the bulk of the Working Archive.
• The DCS shall support archiving of FITS images and spectra using the Binary Table (BINTABLE) Extension Standard.
• SOFIA data shall follow a standard Dictionary for FITS Keyword Types.
• Both FITS and XML formats will be supported for data interchange.
• The "Observation Sequence Numbers" (OSNs) in a flight will be cross-referenced to the OBSID (Observation Identification) numbers in each PI's observing program using XML documents and/or database tables maintained in the Archive.
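
One way the OSN-to-OBSID cross-reference might look as an XML document is sketched below, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`flight`, `observation`, `osn`, `obsid`) and the identifier formats are invented for illustration; the slides do not define a schema.

```python
import xml.etree.ElementTree as ET


def build_crossref(flight_id, pairs):
    """Serialize (OSN, OBSID) pairs for one flight as an XML string."""
    root = ET.Element("flight", id=flight_id)
    for osn, obsid in pairs:
        ET.SubElement(root, "observation", osn=str(osn), obsid=obsid)
    return ET.tostring(root, encoding="unicode")


def load_crossref(xml_text):
    """Parse the XML back into a dict mapping OSN (int) to OBSID (str)."""
    root = ET.fromstring(xml_text)
    return {
        int(el.get("osn")): el.get("obsid")
        for el in root.iter("observation")
    }
```

For example, `build_crossref("F001", [(1, "OBS-0007")])` produces a single `flight` element with one `observation` child, and `load_crossref` recovers the mapping from it.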

SOFIA Pipeline Software Archive
• Pipeline software: a well-defined, documented, automated, scientifically validated, ordered sequence of data reduction module operations designed for a specific set of AOTs supported by the SOFIA DCS.
• The data reduction modules shall be delivered to the DCS by the Facility Instrument Teams, along with the validated pipelines that support the chosen AOTs. The general pipeline architecture, maintenance and version control will subsequently be SSMOC and DCS responsibilities, initially in close collaboration with the instrument teams.
• An official pipeline version is associated with an approved scientific validation procedure defined by the SOFIA Science Center.
• Since data reduction software will evolve during the lifecycle of SOFIA, and data storage or transfer formats may change slightly as knowledge of calibration and reductions improves, all modules related to data reduction and calibration shall be archived and downloadable from the SOFIA DCS Web site.

SOFIA Pipeline Software Archive
• Reduced intermediate and calibrated data products which are the result of pipeline data reduction software shall contain FITS keywords that record the pipeline version that produced them.
• The Web interface for the Software Archive shall indicate which versions of the pipeline software produced each AOR on a given date. There shall also be links to documentation of each data reduction software module.
• NOTE: Flight Planning software and Proposal Preparation software are not included in the Software Archive because they are not directly related to the science data archive itself.

SOFIA Documentation Library
NOTE: There is currently no centralized Documentation Library that satisfies the needs of all aspects of the DCS and the SOFIA project. Although there is a clear need for the Archive to have strong ties to the Documentation Library and SOFIA Bibliography, it will not formally be considered part of the SOFIA archive, which concentrates on the science data. These requirements which are related to the Archive are included here for completeness, and they should be considered in the design of the SOFIA Documentation Library.
The Documentation Library shall:
• contain Users Manuals for the Facility Instruments, with version control.
• contain data reduction and pipeline software descriptions and manuals, with version control.
• contain the Observer's Guide to Aircraft Procedures, current version.
• contain the Flight Planning Software Manual, current version.
• contain the Calls for Proposals, both for observing and instrument development, with version control.
• maintain a SOFIA Bibliography to support the project in tracking the productivity of each observing program. Its contents shall be cross-referenced to the Project Status information for each proposal.
• be located at the SOFIA Science Center and closely linked to the SOFIA Web site and the data archives.

SOFIA Implementation
• Software Work Products:
  – Data Inventory Generator
    • Facility Science Data Capture Tool
    • Housekeeping Data Capture Tool
    • Ancillary Data Capture Tool
  – Summary Generator
    • Header Consolidation
    • Link Generator
    • Validation of Required Files Present
    • Populates the Summary Archive

SOFIA Implementation
• Software Work Products (continued):
  – Archive Management Tool
    • Archive Integrity Checking Tool
    • Backup Tools
    • DBMS management GUI interface
    • Expert (Internal) Pipeline Interface
    • Pipeline Evocation Module
  – Query Tools
    • Web-based Query interface
    • Interface to Commercial DBMS system
    • Report Generation Modules
    • Query Logging

SOFIA Implementation
• Protocol Documents
  – Format Documents
    • Facility Instruments Science Data Format Document
    • Flight Log Format Document
    • Observer Log Format Document
    • Housekeeping Data Format Document
    • Archive Directory Structure Document
  – Design Documents
    • Conceptual Archive Design Document
    • Archive Implementation Design Document

SOFIA Implementation
• Archive Test Results
  – Archive Testing Plan
  – Archive Testing Reports
  – Archive Performance Verification Reports