LSST Data Management Data Products and Software Stack

LSST Data Management: Data Products and Software Stack Overview
Mario Juric, LSST Data Management Project Scientist
Robyn Allsman, Yusra AlSayyad, Tim Axelrod, Jacek Becla, Andrew Becker, Steve Bickerton, Jim Bosch, Bill Chickering, Andy Connolly, Greg Daues, Gregory Dubois-Felsmann, Mike Freemon, Andy Hanushevsky, Fabrice Jammes, Lynne Jones, Jeff Kantor, Kian-Tat Lim, Dustin Lang, Ron Lambert, Robert Lupton (the Good), Simon Krughoff, Serge Monkewitz, Jon Myers, Russell Owen, Steve Pietrowicz, Ray Plante, Paul Price, Andrei Salnikov, Dick Shaw, Schuyler Van Dyk, Daniel Wang, and the LSST Project Team
JOINT DES-LSST WORKSHOP | FERMILAB | MARCH 25, 2014

A Dedicated Survey Telescope
− A wide (half the sky), deep (24.5/27.5 mag), fast (image the sky once every 3 days) survey telescope. Beginning in 2022, it will repeatedly image the sky for 10 years.
− The LSST is an integrated survey system. The Observatory, Telescope, Camera and Data Management system are all built to support the LSST survey. There’s no PI mode, proposals, or time.
− The ultimate deliverable of LSST is not the telescope, nor the instruments; it is the fully reduced data.
  • All science will come from survey catalogs and images
Diagram labels: Telescope, Images, Catalogs

Open Data, Open Source: A Community Resource
− LSST data, including images and catalogs, will be available with no proprietary period to the astronomical community of the United States, Chile, and International Partners
− Alerts to variable sources (“transient alerts”) will be available world-wide within 60 seconds, using standard protocols
− LSST data processing stack will be free software (licensed under the GPL, v3-or-later)
− All science will be done by the community (not the Project!), using LSST’s data products

LSST From the Astronomer’s Perspective
Level 1
− A stream of ~10 million time-domain events per night, detected and transmitted to event distribution networks within 60 seconds of observation.
− A catalog of orbits for ~6 million bodies in the Solar System.
Level 2
− A catalog of ~37 billion objects (20 B galaxies, 17 B stars), ~7 trillion observations (“sources”), and ~30 trillion measurements (“forced sources”), produced annually, accessible through online databases.
− Deep co-added images.
Level 3
− Services and computing resources at the Data Access Centers to enable user-specified custom processing and analysis.
− Software and APIs enabling development of analysis codes.

Level 1: Transient Alerts
− LSST computing is sized for 10 M alerts/night (average), 10 k/visit (average), 40 k/visit (peak)
  • Dedicated networking for moving data from Chile to the US
  • New image differencing pipelines with improved algorithms
− Will measure and transmit with each alert:
  • position
  • flux, size, and shape
  • light curves in all bands (up to a ~year; stretch: all)
  • variability characterization (e.g., low-order light-curve moments, probability the object is variable)
  • cut-outs centered on the object (template, difference image)
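
The sizing figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the assumption that one visit's alerts are spread evenly over the 60-second latency window is ours, made purely to get an order-of-magnitude rate, and the function names are illustrative.

```python
# Figures quoted on the slide.
ALERTS_PER_NIGHT_AVG = 10_000_000   # 10 M alerts/night (average)
ALERTS_PER_VISIT_AVG = 10_000       # 10 k alerts/visit (average)
ALERTS_PER_VISIT_PEAK = 40_000      # 40 k alerts/visit (peak)
ALERT_LATENCY_S = 60                # alerts are available within 60 seconds


def implied_visits_per_night() -> float:
    """Number of visits per night implied by the two average figures."""
    return ALERTS_PER_NIGHT_AVG / ALERTS_PER_VISIT_AVG


def alert_rate_per_second(alerts_per_visit: int,
                          window_s: float = ALERT_LATENCY_S) -> float:
    """Alerts per second, ASSUMING one visit's alerts are emitted evenly
    over the latency window (an illustrative simplification)."""
    return alerts_per_visit / window_s


if __name__ == "__main__":
    print(f"implied visits per night: {implied_visits_per_night():.0f}")
    print(f"peak/average per visit:   "
          f"{ALERTS_PER_VISIT_PEAK / ALERTS_PER_VISIT_AVG:.0f}x")
    print(f"average rate: {alert_rate_per_second(ALERTS_PER_VISIT_AVG):.0f} alerts/s")
    print(f"peak rate:    {alert_rate_per_second(ALERTS_PER_VISIT_PEAK):.0f} alerts/s")
```

Under these assumptions the averages imply about a thousand visits per night, and the system has to absorb bursts four times the average per-visit load.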

Level 2: Annual Data Releases
− Well calibrated, consistently processed, catalogs and images
  • Catalogs of objects, detections in difference images, etc.
− Made available in Data Releases
  • Annually, except for Year 1
    - Two DRs for the first year of data
− Complete reprocessing of all data, for each release
  • Every DR will reprocess all data taken up to the beginning of that DR
− Projected catalog sizes:
  • 18 billion objects (DR 1), 37 billion (DR 11)
  • 750 billion observations (DR 1), 30 trillion (DR 11)

Level 2: Key Data Products
Images
− Processed visits (“calibrated exposures”)
  • Visit images with instrumental signature removed, background, PSF, zero-point and WCS determined
− Coadds
  • Deep coadds across the entire survey footprint
Catalogs
− Catalogs of Sources
  • Measurements of sources detected on calibrated exposures
− Catalogs of Objects
  • Characterization of objects detected on multi-epoch data (posterior samples)
− Catalogs of ForcedSources
  • Forced photometry performed on all exposures, at locations of all Objects
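
The relationship between the three catalog types can be sketched as a small data model. This is only an illustration of the description above: the class and field names are hypothetical and do not reflect the actual LSST schema, and the helper assumes, for simplicity, that every exposure covers every object.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class Source:
    """A detection measured on a single calibrated exposure."""
    source_id: int
    exposure_id: int
    flux: float


@dataclass(frozen=True)
class Object:
    """An astrophysical object characterized from multi-epoch data."""
    object_id: int
    ra_deg: float
    dec_deg: float


@dataclass(frozen=True)
class ForcedSource:
    """A forced measurement of one Object on one exposure."""
    object_id: int
    exposure_id: int


def forced_source_keys(objects: Iterable[Object],
                       exposure_ids: Iterable[int]) -> List[ForcedSource]:
    """One ForcedSource per (Object, exposure) pair.

    ASSUMPTION: every exposure covers every object; a real survey would
    first select the exposures that overlap each object's position.
    """
    return [ForcedSource(obj.object_id, exp_id)
            for obj, exp_id in product(objects, exposure_ids)]


if __name__ == "__main__":
    objects = [Object(1, 10.0, -5.0), Object(2, 10.1, -5.1)]
    keys = forced_source_keys(objects, exposure_ids=[100, 101, 102])
    print(len(keys))  # 2 objects x 3 exposures = 6
```

The cross product is why the ForcedSource table dominates the catalog volume. With the figures quoted elsewhere in the talk (about 30 trillion forced sources for about 37 billion objects), it works out to roughly 800 forced measurements per object, the same order as the 825 revisits quoted on the closing slide.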

Level 3: Enabling User-created Data Products
− Products created by the community using LSST’s software, services, or computing resources.
− For use-cases not fully enabled by Level 1 and 2 processing:
  • Reprocessing images to search for SNe light echoes
  • Characterization of diffuse structures (e.g., ISM)
  • Extremely crowded field photometry (e.g., globular clusters)
  • Custom measurement algorithms
− Enabling Level 3:
  • User databases and workspaces (“mydb”)
  • Enabling user computing at the LSST data center
    - processing that will greatly benefit from co-location with the LSST data
  • Making the LSST software stack available to end-users

LSST Data Management System (from readout to delivery to the user)

LSST Data Management: Roles
− Archive Raw Data: Receive the incoming stream of images that the Camera system generates and archive the raw images.
− Process to Data Products: Detect and alert on transient events within one minute of visit acquisition. Approximately once per year, create and archive a Data Release, a static self-consistent collection of data products generated from all survey data taken from the date of survey initiation to the cutoff date for the Data Release.
− Publish: Make all LSST data available through an interface that uses community-accepted standards, and facilitate user data analysis and production of user-defined data products at Data Access Centers (DACs) and external sites.

LSST Operations: Sites and Data Flows
Archive Site
  • Archive Center: Alert Production, Data Release Production, Calibration Products Production, EPO Infrastructure, Long-term Storage (copy 2)
  • Data Access Center: Data Access and User Services
Dedicated Long Haul Networks
  • Two redundant 40 Gbit links from La Serena to Champaign, IL (existing fiber)
Summit and Base Sites
  • Telescope and Camera, Data Acquisition, Crosstalk Correction, Long-term storage (copy 1), Chilean Data Access Center
HQ Site
  • Science Operations, Observatory Management, Education and Public Outreach

Infrastructure: Petascale Computing, Gbit Networks
The computing cluster at the LSST Archive (at NCSA) will run the processing pipelines.
  • Single-user, single-application, dedicated data center
  • Process images in real-time to detect changes in the sky
  • Produce annual data releases
Image caption: Archive Site and U.S. Data Access Center, NCSA, Champaign, IL
Long Haul Networks to transport data from Chile to the U.S.
  • 200 Gbps from Summit to La Serena (new fiber)
  • 2 x 40 Gbit (minimum) for La Serena to Champaign, IL (protected, existing fiber)
Image caption: Base Site and Chilean Data Access Center, La Serena, Chile
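
To get a feel for the long-haul link capacity, the sketch below estimates how long one night of data would take to cross a single 40 Gbit/s link. The 15 TB/night figure comes from the closing slide of this talk; decimal units, full link utilization, and zero protocol overhead are simplifying assumptions, so treat the result as a lower bound on the transfer time.

```python
def transfer_seconds(volume_bytes: float,
                     link_bits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Time to move `volume_bytes` over a link of the given raw bit rate.

    `efficiency` (0 < efficiency <= 1) is a hypothetical knob for protocol
    overhead and sharing; 1.0 means the full raw rate is usable.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return volume_bytes * 8 / (link_bits_per_s * efficiency)


NIGHTLY_VOLUME_BYTES = 15e12   # 15 TB/night (decimal terabytes assumed)
SINGLE_LINK_BPS = 40e9         # one of the two 40 Gbit/s links

if __name__ == "__main__":
    seconds = transfer_seconds(NIGHTLY_VOLUME_BYTES, SINGLE_LINK_BPS)
    print(f"one night over one link: {seconds / 60:.0f} minutes")
    half_rate = transfer_seconds(NIGHTLY_VOLUME_BYTES, SINGLE_LINK_BPS, efficiency=0.5)
    print(f"at 50% efficiency:       {half_rate / 60:.0f} minutes")
```

Bulk nightly volume is therefore not the tight constraint. The 60-second alert latency is, which is one way to read the slide's emphasis on dedicated, protected links.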

“Applications”: Scientific Core of LSST DM
− Applications carry core scientific algorithms that process or analyze raw LSST data to generate output Data Products
− Variety of processing
  • Image processing
  • Measurement of source properties
  • Associating sources across space and time, e.g. for tracking solar system objects
− Applications framework layer (afw; not shown) allows them to be written in a high-level language

Middleware Layer: Isolating Hardware, Orchestrating Software
Enabling execution of science pipelines on hundreds of thousands of cores.
  • Frameworks to construct pipelines out of basic algorithmic components
  • Orchestration of execution on thousands of cores
  • Control and monitoring of the whole DM System
Isolating the science pipelines from details of underlying hardware
  • Services used by applications to access/produce data and communicate
  • "Common denominator" interfaces handle changing underlying technologies

Database and Science UI: Delivering to Users
Massively parallel, distributed, fault-tolerant relational database.
  • To be built on existing, robust, well-understood technologies (MySQL and xrootd)
  • Commodity hardware, open source
  • Advanced prototype in existence (qserv)
Science User Interface to enable the access to and analysis of LSST data
  • Web and machine interfaces to LSST databases
  • Visualization and analysis capabilities

Going Where the Talent is: Distributed Team
Figure labels: UI; Database; Core Algorithms (“Apps”); Middleware; Mgmt, I&T, and Science QA; Infrastructure

The LSST Software Stack (science pipelines, middleware, database, user interfaces)
“Enabling LSST science by creating a well documented, state-of-the-art, high-performance, scalable, multi-camera, open source, O/IR survey data processing and analysis system.”

LSST Science Pipelines
Side labels: Level 1, Level 2, L3
− 02C.01.02.01/02. Data Quality Assessment Pipelines (slides by Juric)
− 02C.01.[02.01.04, 04.01, 04.02] Calibration Pipelines (slides by Axelrod, Yoachim)
− 02C.03.01. Single-Frame Processing Pipeline (slides by Krughoff, Lupton)
− 02C.03.02. Association pipeline (slides by Lupton)
− 02C.03. Alert Generation Pipeline (slides by Becker)
− 02C.03.04. Image Differencing Pipeline (slides by Becker)
− 02C.03.06. Moving Object Pipeline (slides by Jones)
− 02C.04.03. PSF Estimation Pipeline (slides by Lupton)
− 02C.04. Image Coaddition Pipeline (slides by AlSayyad)
− 02C.04.05. Deep Detection Pipeline (slides by Lupton)
− 02C.04.06. Object Characterization Pipeline (slides by Lupton, Bosch)
− 02C.01.02.03. Science Pipeline Toolkit (slides by Dubois-Felsmann)
− 02C.03.05/04.07 Application Framework (slides by Lupton)
Data Management Applications Design (LDM-151)
Pipelines reviewed in Sep. ’13, Magnier et al. committee
Calibration reviewed in July ’13, Wood-Vasey et al. committee

Implementation Strategy: Transfer Know-how, not Code
− Difficulty adapting existing public codes to LSST requirements (AstroMatic suite, PHOTO, Elixir, IRAF-based pipelines, etc.)
  • Need to run efficiently at scale
  • Need to be flexible (plugging/unplugging of algorithms at runtime)
  • Need to have it developed by a large team (20+ scientists and programmers)
  • Need to be maintainable over ~25 years of R&D, Construction, and Survey Operations
  • Need to run on a variety of hardware and software platforms
  • Need to have logging and provenance built into the design
− Early on (~2006), a decision was made to transfer scientific know-how, but not code.
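
The requirement of "plugging/unplugging of algorithms at runtime" is commonly met with a registry that maps configuration names to implementations. The sketch below shows that general pattern in plain Python. It is not the LSST stack's API: the registry, decorator, and algorithm names are all hypothetical.

```python
from typing import Callable, Dict, List

# An "algorithm" here is any function from a list of pixel values to a number.
Algorithm = Callable[[List[float]], float]

_REGISTRY: Dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Decorator that makes an algorithm selectable by name."""
    def decorator(func: Algorithm) -> Algorithm:
        _REGISTRY[name] = func
        return func
    return decorator


@register("mean")
def mean_flux(pixels: List[float]) -> float:
    return sum(pixels) / len(pixels)


@register("peak")
def peak_flux(pixels: List[float]) -> float:
    return max(pixels)


def run_measurements(pixels: List[float], enabled: List[str]) -> Dict[str, float]:
    """Run only the algorithms named in `enabled` (e.g. read from a config file)."""
    unknown = [name for name in enabled if name not in _REGISTRY]
    if unknown:
        raise KeyError(f"unknown algorithms: {unknown}")
    return {name: _REGISTRY[name](pixels) for name in enabled}


if __name__ == "__main__":
    # Changing this list changes what runs, without touching the driver code.
    print(run_measurements([1.0, 2.0, 6.0], enabled=["mean", "peak"]))
    print(run_measurements([1.0, 2.0, 6.0], enabled=["peak"]))
```

Because the driver only sees names and callables, new measurement code can be added by a separate team or package without editing the pipeline itself, which also speaks to the large-team and maintainability requirements in the list.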

Maintainable Design / Language Choices
− LSST software stack is largely written from scratch, in Python, unless computational demands require the use of C++
  • C++:
    - Computationally intensive code
    - Made available to Python via SWIG
  • Python:
    - All high-level code
    - Prefer Python to C++ unless performance demands otherwise
− Modularity
  • Virtually everything is a Python module.
  • ~60 packages (git repositories, ~corresponding to Python packages)
− Build system: scons
− Version control: git
− Package management: EUPS

Modular Architecture
  • Command-line driver scripts
  • Cluster execution middleware
  • Tasks (ISR, Detection, Co-adding, …)
  • Measurement Algorithms (meas_*)
  • Camera Abstraction Layer (obs_* packages)
  • …
  • Application Framework (comp. intensive C++, SWIG-wrapped into Python)
  • Middleware (I/O, configuration, …)
  • External C/C++ Libraries (Boost, FFTW, Eigen, CUDA, ...)
  • External Python Modules (numpy, pyfits, matplotlib, …)
Legend: Red: Mostly C++ (but Python wrapped); Blue: Mostly Python; Black: External Libraries
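
The layering above can be illustrated with a toy example: a driver composed of smaller tasks, with camera-specific details hidden behind an abstraction layer. All class names, methods, and numbers below are hypothetical stand-ins chosen for illustration; they are not the actual Task or obs_* interfaces of the LSST stack.

```python
from abc import ABC, abstractmethod
from typing import List


class CameraMapper(ABC):
    """Hypothetical stand-in for the camera abstraction layer (the obs_* role)."""

    @abstractmethod
    def load_raw(self, visit: int) -> List[float]:
        """Return raw pixel values for a visit."""


class ToyCamera(CameraMapper):
    """A fake camera: three pixels sitting on a bias level of 100 counts."""

    def load_raw(self, visit: int) -> List[float]:
        return [100.0 + visit, 101.0 + visit, 150.0 + visit]


class IsrTask:
    """Toy instrumental signature removal: subtract a constant bias."""

    def __init__(self, bias: float) -> None:
        self.bias = bias

    def run(self, pixels: List[float]) -> List[float]:
        return [p - self.bias for p in pixels]


class DetectionTask:
    """Toy detection: indices of pixels above a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def run(self, pixels: List[float]) -> List[int]:
        return [i for i, p in enumerate(pixels) if p > self.threshold]


class ProcessVisitTask:
    """Driver task composed of subtasks; camera specifics stay behind the mapper."""

    def __init__(self, camera: CameraMapper, isr: IsrTask,
                 detection: DetectionTask) -> None:
        self.camera = camera
        self.isr = isr
        self.detection = detection

    def run(self, visit: int) -> List[int]:
        raw = self.camera.load_raw(visit)
        return self.detection.run(self.isr.run(raw))


if __name__ == "__main__":
    driver = ProcessVisitTask(ToyCamera(), IsrTask(bias=100.0),
                              DetectionTask(threshold=10.0))
    print(driver.run(visit=0))  # [2]
```

Supporting a different camera then means supplying another `CameraMapper` subclass, while the task code stays unchanged. That is the idea behind a multi-camera stack.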

Current Status: Advanced Prototypes (~SDSS level quality)
− 7-year prototyping effort
  • 8 software releases (Data Challenges)
  • Status: A rapidly maturing state-of-the-art astronomical data reduction system
    - ~SDSS level quality of reductions
    - Most recently tested by building co-adds using SDSS Stripe 82 data
    - Used in commissioning of the Hyper Suprime-Cam Survey on Subaru
− Prototyped Features:
  • Instrumental signature removal
  • Single-frame processing
  • Point source photometry
  • Extended source photometry (model fitting)
  • Deblender
  • Co-addition of images
  • Image differencing
  • Object characterization on multi-epoch data (StackFit/MultiFit)
  • …
#1 Outstanding Issue: Insufficient documentation. Planning to begin addressing it over the next few months.

New Algorithms: Background-matched coadd of SDSS Stripe 82 in the vicinity of M2. Background matching preserves diffuse structures. Generated with LSST pipeline prototypes.
Figure: 5 sq. deg. background-matched coadd composite (g, r, i), ~55 epochs. Region: Aqr, Galactic lat = -35.0
Slide: Yusra AlSayyad
http://moe.astro.washington.edu/sdss/

Streams in LSST-reprocessed SDSS Stripe 82
http://moe.astro.washington.edu/sdss/
Stripe 82 background-matched coadds built with LSST Data Management stack (http://moe.astro.washington.edu)

Example: Forced Photometry on SDSS Stripe 82
Forced Photometry: For every detection in the deep co-add, perform PSF photometry on individual frames (ugriz). Note that the majority of these will be below the single-frame SNR detection threshold. Averaging those fluxes allows one to go deeper.
Left: comparison of Ivezic et al. (2004) w and y color loci; single frame vs. deep catalog.
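
Why averaging sub-threshold fluxes "allows one to go deeper" follows from basic statistics: the uncertainty of the mean of N independent, equal-noise measurements shrinks as 1/sqrt(N). The sketch below demonstrates this with simulated numbers. It assumes independent Gaussian noise of equal size on every frame and an unweighted mean; a real pipeline would more likely weight by per-frame variance, and none of the values here come from Stripe 82.

```python
import math
import random
from typing import List


def stacked_snr(single_frame_snr: float, n_frames: int) -> float:
    """SNR of the mean of n independent, equal-noise measurements."""
    return single_frame_snr * math.sqrt(n_frames)


def simulate_forced_fluxes(true_flux: float, sigma: float,
                           n_frames: int, seed: int = 0) -> List[float]:
    """Simulated per-frame forced fluxes: true flux plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(true_flux, sigma) for _ in range(n_frames)]


def mean_and_error(fluxes: List[float], sigma: float) -> tuple:
    """Unweighted mean flux and its standard error, sigma / sqrt(N)."""
    n = len(fluxes)
    return sum(fluxes) / n, sigma / math.sqrt(n)


if __name__ == "__main__":
    # A source with SNR = 1 per frame: invisible to a 5-sigma single-frame cut.
    true_flux, sigma, n_frames = 1.0, 1.0, 100
    fluxes = simulate_forced_fluxes(true_flux, sigma, n_frames)
    mean, err = mean_and_error(fluxes, sigma)
    print(f"mean flux = {mean:.2f} +/- {err:.2f}")
    print(f"expected stacked SNR = {stacked_snr(true_flux / sigma, n_frames):.0f}")
```

With 100 frames, a source at SNR = 1 per frame reaches an expected SNR of 10 in the average. The same scaling explains why forced measurements at positions taken from the deep co-add recover colors for objects that no single frame detects.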

Winter 2014 Software Release
Installing
    curl -O http://sw.lsstcorp.org/eupspkg/newinstall.sh
    bash newinstall.sh
  • Supported platforms (platforms we regularly build on; generally builds on any Linux/BSD)
    - RHEL 6
    - OS X 10.8 Mountain Lion
    - OS X 10.9 Mavericks
Early Draft of a Users’ Guide: http://ls.st/ug
A new release every 6 months (Feb/Mar, Aug/Sep)

WARNING! ADVERTENCIA! AVERTISSEMENT!
THIS IS STILL NOT A FINISHED, POLISHED, READY-TO-USE END-USER PRODUCT!
BEFORE DOWNLOADING, PLEASE MAKE SURE TO READ THE DM STACK FAQ: http://dev.lsstcorp.org/trac/wiki/DM/Policy/UsingDMCode/FAQ
KEY POINTS:
- POOR DOCUMENTATION
- YOU’RE DOWNLOADING UNSUPPORTED, PROTOTYPE CODE
- THIS CODE WILL NOT WORK OUT OF THE BOX FOR CAMERAS OTHER THAN LSST (AND SDSS).
- EXPECT TO WRITE SOME PYTHON CODE

Exciting Opportunities Beyond LSST!
− Disclaimer: LSST DM’s primary mission is to build the data processing system for LSST. We (the LSST Project) are 100% focused on that.
− However, the optimal design for LSST (that balances technical and programmatic risks) makes this system highly reusable and general purpose.
  • Necessary to deal with real-world hardware
  • Necessary to be able to process precursor data
  • Necessary to enable science (“Level 3”) software to be written on top of it
− There are tremendous opportunities for using the LSST stack components on existing and future data sets.
  • More work ahead, but becoming a state of the art, well supported, codebase
  • Possibilities: SDSS, CFHT-LS, PanSTARRS, HSC, DES, WFIRST, Euclid, …
  • Good basis for analysis frameworks (LSST DESC)
  • Leveraging a 100 M+ NSF investment in large survey data management
− The benefits feed back to LSST: more users, fewer bugs, better understanding, shorter path to science.

Fostering a Community: LSST @ GitHub
IMPORTANT: If you’re using LSST code, let us know and report bugs!

LSST Talk Roadmap Today
− LSST DM Overview (this talk)
− LSST Software Stack Architecture (K-T Lim, DM Architect)
− PSF Estimation (J. Bosch, LSST/Princeton)
− Building on top of LSST: The HSC Pipeline (P. Price, HSC/Princeton)
“To improve LSST data management by learning from the DES experience, and to illustrate opportunities for using the LSST software stack beyond LSST.”

Finding Out More About LSST Software
− LSST Data Management Home:
  • https://confluence.lsstcorp.org/x/QYEf
− Using the LSST Stack to process ImSim Images:
  • http://dev.lsstcorp.org/trac/wiki/installing/Sims
− Using LSST Stack to process SDSS images:
  • http://dev.lsstcorp.org/trac/wiki/Installing/Winter2013
− Tutorials on developing with LSST Stack components:
  • https://confluence.slac.stanford.edu/display/LSSTCAM/DM+Stack+Working+Meeting
− Mailing list:
  • http://listserv.lsstcorp.org/mailman/listinfo/lsst-dm-stack-users
@LSST @mjuric

LSST: Thank You for Your Attention!
http://lsst.org
  • 8.4 m telescope
  • 18000+ deg^2
  • ugrizy
  • 3.2 Gpix camera
  • 10 mas astrom.
  • r<24.5 (<27.5@10 yr)
  • 0.5-1% photometry
  • 30 sec exp/4 sec rd
  • 15 TB/night
  • 37 B objects
Imaging the visible sky, once every 3 days, for 10 years (825 revisits)
