Future of Astronomy: enormous datasets, massive computing, innovative instrumentation
Rachel Webster & David Barnes (Project Leader & Project Scientist, Australian Virtual Observatory)
School of Physics, The University of Melbourne

Topics
1. Astronomy is a theoretical and observational science with massive heterogeneous datasets.
2. The Virtual Observatory (VO)
3. Aus-VO: the Australian project
4. A new local opportunity: MWA low frequency array
Read here for a summary of each slide.

2. What is a Virtual Observatory?
• A Virtual Observatory (VO) is a distributed, uniform interface to the data archives of the world’s major astronomical facilities.
• A VO is realised with advanced data mining and visualisation tools which exploit the unified interface to enable cross-correlation and combined processing of distributed and diverse datasets.
• VOs will rely on, and provide motivation for, the development of national and international computational and data grids.
Virtual observatories will effect a “sea change” in the way astronomy is done.
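The slide does not say how a VO cross-correlates datasets. The most basic form of that operation is a positional cross-match: pairing sources in two catalogues that lie within a small angular radius of each other. The sketch below shows this under stated assumptions.

- It is a brute-force O(N×M) version.
- The catalogue layout (id, RA, Dec in degrees), the function names and the 2 arcsec default radius are hypothetical choices.
- A production VO service would use spatial indexing rather than a double loop.

```python
import math
from typing import List, Tuple

# Hypothetical minimal catalogue row: (source id, RA in degrees, Dec in degrees)
Source = Tuple[str, float, float]


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing sqrt(h) slightly above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a: List[Source], cat_b: List[Source],
                radius_arcsec: float = 2.0) -> List[Tuple[str, str, float]]:
    """Pair each cat_a source with its nearest cat_b source inside the radius.

    Returns (id_a, id_b, separation in arcsec) for every matched source.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None  # (id_b, separation in degrees) of the closest candidate so far
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches


# Hypothetical optical and radio catalogues
optical = [("opt1", 150.0, 2.0), ("opt2", 151.0, 2.5)]
radio = [("rad1", 150.0002, 2.0001), ("rad2", 160.0, -5.0)]
print(cross_match(optical, radio))
```

Here only "opt1" and "rad1" lie within the radius, at a separation below one arcsecond. The other two sources have no counterpart.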

International Data deluge!
• Dozens of new surveys, 2003 to 2008
• Many (10–100) terabytes per survey
• 10–100 researchers per survey
• International collaborations (almost always)
• Data is non-proprietary (usually)
Surveys are no longer within the scope of the solo researcher, and cannot be accommodated by isolated computing and storage facilities. Enter Grid Computing and the Virtual Observatory.
New surveys of the whole sky need a new paradigm: enter the Virtual Observatory.

3. Aus-VO and APAC Grid Project
• 10 institutions; 4 large grants over 2–4 years (LIEF & APAC)
• VO Data Warehousing (10 major datasets)
• Gravity Wave Research Grid
• VO Theory Portal
• Registry, storage service, HPC, query languages, visualisation, data mining
• Melbourne-led (at present)
The Australian Astronomy Grid will be developed to handle data storage and access needs.

IVOA and the International Context
• More than 15 active national VO programs
• Multi-million $$ investments in UK, USA and Europe
• Loose but collegial collaboration
• Responsible for international standards
• Active meeting program (we have no funds to participate)

Australian data storage and access
• Australian astronomy data holdings presently exceed ~40 TB in size, and are growing rapidly.
• Typical high-end workstations can store only ~100 GB or so.
• Providing access to the data (raw and processed) requires a distributed, high-bandwidth network of data servers.
• The Australian Virtual Observatory project is developing the Australian Astronomy Grid to handle future demand.
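The gap between the ~40 TB of holdings and a ~100 GB workstation is what motivates a grid rather than local copies. A quick calculation with the slide's own figures makes the gap concrete. It assumes decimal units (1 TB = 1000 GB) and treats both inputs as approximate.

```python
# Back-of-envelope scale of the storage problem, using the slide's figures.
# ASSUMPTION: decimal units (1 TB = 1000 GB); both inputs are approximate.
holdings_tb = 40          # "presently exceed ~40 TB"
workstation_gb = 100      # "~100 GB or so" per high-end workstation

# Number of workstation disks needed just to hold one copy of the holdings
workstations_needed = holdings_tb * 1000 / workstation_gb
print(f"~{workstations_needed:.0f} workstations to hold the current holdings")
```

Because the holdings are a lower bound and growing, the resulting ~400 is a floor. It also ignores redundant copies and derived data products.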

The Australian Astronomy Grid 2004

The HI Parkes All Sky Survey
• Parkes 64 m radio telescope in NSW.
• Hyperfine transition of atomic hydrogen, λ = 21 cm.
• 280 days over 4 years; 40 observers; 1000 GB raw data.
• 400 image “cubes” searched by computer for significant signals.
The Parkes telescope has surveyed the entire southern sky for emission from hydrogen.
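The slide only says the cubes were "searched by computer for significant signals". One generic way to do that is to estimate the noise robustly and flag voxels above a signal-to-noise threshold. This is a hypothetical sketch, not the HIPASS pipeline's actual source finder. The following are all assumptions:

- the 5σ threshold;
- the median-absolute-deviation (MAD) noise estimator;
- the synthetic cube;
- the function name.

```python
import numpy as np


def find_significant_voxels(cube: np.ndarray, threshold_sigma: float = 5.0) -> np.ndarray:
    """Return (i, j, k) indices of voxels brighter than threshold_sigma x noise.

    Noise is estimated with the median absolute deviation (MAD), scaled by
    1.4826 so it equals the standard deviation for Gaussian noise. A few
    bright sources barely affect a median-based estimate.
    """
    median = np.median(cube)
    sigma = 1.4826 * np.median(np.abs(cube - median))
    return np.argwhere(cube > median + threshold_sigma * sigma)


# Hypothetical synthetic cube: unit Gaussian noise plus one injected bright voxel.
rng = np.random.default_rng(42)
cube = rng.normal(0.0, 1.0, size=(64, 64, 128))  # (x, y, spectral channel)
cube[30, 40, 60] += 20.0                         # injected "source"
detections = find_significant_voxels(cube)
print(detections)
```

A single-voxel threshold like this is only a first step. A fuller search would likely also smooth along the spectral axis and group neighbouring voxels into candidate sources before judging significance.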

The Two Micron All Sky Survey
All-sky map of 1.6 million 2MASS extended sources (Jarrett et al., 2000).
• 4 M images
• 470 M point sources
• 1.6 M extended sources
• ~500 parameters per source!
• 25 TB of data!
• Less than 10% of the catalogue fits in memory on a typical workstation
Another example is 2MASS, which has catalogued nearly half a billion objects in the sky.

Other major surveys...
• Sloan Digital Sky Survey (SDSS)
  – position and brightness of 100 M objects
  – distance to more than 100 K quasars
  – 15 terabytes of data!
• Radial Velocity Experiment (RAVE)
  – 50 M stars: velocities, metallicities, and abundance ratios
  – 10 TB of data!
• Faint Images of the Radio Sky (FIRST)
  – 811,000 sources with radio continuum flux densities at 20 cm wavelength
Dozens of major, terabyte-scale survey projects are underway or planned.

Theoretical Astronomy
• Theory provides models of the phenomena discovered by observations.
• Theory makes predictions of what will be seen by future facilities.
• Many theories are non-analytic, and sophisticated numerical simulations are run on supercomputers to produce realisations of synthetic universes.
Simulations can produce realisations of synthetic universes from fundamental physics.

Linking theory to observations
• Simulations are not expected to produce our particular Universe.
• Instead, they generate systems which can be compared statistically to our Universe.
• Realisations of a good model should be statistically indistinguishable from the observed Universe.
• Useful statistical comparisons demand high-quality data, and enough objects that the conclusions do not depend on how the data are binned.
• Deeper, faster and more sophisticated surveys are called for...
Bigger and better simulations demand super surveys for statistical comprehension.
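The slide does not name a test for "statistically indistinguishable". One standard choice, used here purely as an illustration, is the two-sample Kolmogorov–Smirnov (KS) statistic D. It compares the distribution of a single property in a simulated sample and an observed one. Because it works from the empirical cumulative distributions, it also needs no binning. The function name and the sample values below are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest vertical gap between the two empirical cumulative
    distribution functions (ECDFs). The supremum is attained at one of the
    data points, so checking every pooled value is sufficient.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical values of one property from a simulation and from a survey
simulated = [0.8, 1.1, 1.3, 1.9, 2.4, 2.6, 3.0]
observed = [0.9, 1.2, 1.5, 2.0, 2.2, 2.9, 3.1]
print(ks_statistic(simulated, observed))
```

For two samples of size n, the critical value of D shrinks roughly as 1/√n. Large catalogues can therefore expose model discrepancies that small samples cannot, which is the slide's point about needing large numbers of objects. SciPy's `scipy.stats.ks_2samp` computes D together with a p-value.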

4. MWA: Mileura Widefield Array
• Low frequency radio domain (110–240 MHz) largely unexplored
• Not easy: ionosphere, FM band, etc.
• BUT: aim to detect the first sources and map the Epoch of Reionisation
• One of 3 international experiments (strongest project)
New US/Australian low frequency array
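The slide does not spell out why this band reaches the Epoch of Reionisation. The sketch below assumes the target is the 21 cm hyperfine line of neutral hydrogen, the same transition HIPASS observed nearby, redshifted by cosmic expansion. Under that assumption, each observed frequency fixes a redshift through 1 + z = ν_rest / ν_obs.

```python
# Rest frequency of the neutral-hydrogen 21 cm hyperfine line, in MHz.
HI_REST_FREQ_MHZ = 1420.405751768


def hi_redshift(observed_freq_mhz: float) -> float:
    """Redshift at which the 21 cm line would be observed at this frequency."""
    return HI_REST_FREQ_MHZ / observed_freq_mhz - 1.0


# Band edges quoted on the slide
for f in (110.0, 240.0):
    print(f"{f:.0f} MHz -> z = {hi_redshift(f):.1f}")
```

Under this assumption the 110–240 MHz band covers redshifts of roughly z ≈ 4.9 to 11.9.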

• Remote WA, for first light in 2007
• 6 TB fibre link to Geraldton
• Storage: 100s of TB
• CPU: 50 Tflops
• Melbourne, in collaboration with MIT, ATNF, Harvard and others
• Industry partners
New low frequency array will use innovative data-handling algorithms

MWA: Basic Approach
‘Desert Australia’ is probably the best site in the world for low frequency astronomy

MWA: Signal Processing
• 500 tiles (× 16 dipoles)
• 125,000 baselines, 4 polarization products
• FPGA-based hardware
• Receiver: analog and mixed-signal front end; digital back end
• Data stream: ~2 billion visibilities / 0.5 sec
Technical requirements and directions
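The baseline count on this slide follows from pairing tiles: N tiles give N(N − 1)/2 distinct pairs. The sketch below checks that figure and converts the quoted data stream into a per-second rate. The implied frequency-channel count is an inference from the slide's numbers, not a stated specification. It assumes every visibility is one polarization product on one baseline in one channel, with auto-correlations excluded.

```python
# Slide figures for the MWA correlator.
n_tiles = 500                  # "500 tiles (x 16 dipoles)"
pol_products = 4               # "4 polarization products"
visibilities_per_dump = 2e9    # "~2 billion visibilities / 0.5 sec"
dump_seconds = 0.5

# Every distinct pair of tiles forms one baseline (auto-correlations excluded).
baselines = n_tiles * (n_tiles - 1) // 2

# Visibilities produced per frequency channel in each 0.5 s dump.
per_channel = baselines * pol_products

# INFERRED, not stated on the slide: the number of frequency channels that
# would reconcile ~2 billion visibilities per dump with the counts above.
implied_channels = visibilities_per_dump / per_channel

# Sustained visibility rate the back end must absorb.
rate_per_second = visibilities_per_dump / dump_seconds

print(f"baselines: {baselines:,}")
print(f"visibilities per channel per dump: {per_channel:,}")
print(f"implied frequency channels: ~{implied_channels:,.0f}")
print(f"visibility rate: {rate_per_second:.1e} per second")
```

This gives 124,750 baselines, consistent with the slide's ~125,000, and a sustained rate of 4 × 10⁹ visibilities per second. The channel figure (about 4,000) only holds under the assumption stated above.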

Melbourne Astrophysics Requirements
• High Bandwidth Communications (Access Grid): scientific collaboration & conferences
• Functional Grid: storage & processing
• Institutional Commitment: planning, resourcing, R&D
• Institutional Leadership: NEW