Scientific Data Infrastructure in CAS
Dr. Jianhui Li (lijh@cnic.cn)
Scientific Data Center, Computer Network Information Center, Chinese Academy of Sciences

Scientific Data Infrastructure
• Application enabled environments and typical applications
• Middleware (scientific data grid middleware, internet-based storage service middleware…)
• Software and toolkits (scientific data collection, curation and publishing; data analysis and visualization…)
• Scientific databases
• Massive storage system
• Data-intensive computing facilities
• High speed network

DRC: Data Resource Center
• A new organization responsible for data preservation, curation and access services in CAS
[Diagram: Data Resource Center functions, including mass data backup and long-term preservation of important data]

Infrastructure for DRC
• High Speed Network
  – 2 Gbps linked with CSTNET
  – 2 Gbps linked with CSTNET-CNGI
  – GLORIAD
• Data Intensive Computing facilities
  – ~1000 CPU core clusters + Scientific Computing Grid (~200 Tflops)
• Massive Storage System
  – 1 PB online disk + 5 PB tape
  – Construction of a storage network will start this year
    • 1 center + 1 archive center + 10 storage nodes around China
    • Over 20 PB

Scientific Databases (SDB)
• A long-term mission, started in 1986 and funded by CAS
  – many institutes involved
  – long-term, large-scale collaboration
  – data from research, for research
• Collecting multi-discipline research data and promoting data sharing
  – More than 350 research databases and 400 datasets from 61 institutes
  – Over 60 TB of data available for open access and download
http://www.csdb.cn

Scientific Databases (cont.)
• SDB Contents
  – Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science

Scientific Databases (cont.)
• Database integration
  – Resource database
  – Reference database
  – Application oriented database
[Diagram labels: Reference database, Resource database, Research database, Application oriented database]

Scientific Databases (cont.)
• 8 Resource databases
  – Geo-Science
  – Biodiversity
  – Chemistry
  – Astronomy
  – Space Science
  – Microbiology and virus
  – Material science
  – Environment
• 2 Reference databases
  – China Species
  – Compound
• 4 Application-oriented databases
  – High Energy (ITER)
  – Western Environment Research
  – Ecology research
  – Qinghai Lake Research

CAS Scientific Data Grid
• Based on Scientific Data Grid Middleware (SDG)
  – SDG is built upon the Scientific Databases and supports finding and accessing large-scale, distributed and heterogeneous scientific data uniformly and conveniently, in a SECURE and proper way
• Building scientific data application grids according to domain requirements
  – Integrate distributed data, analysis tools, and storage and computing facilities, providing a uniform data service interface
  – 4 pilot grids
    • Bioscience grid
    • Geoscience grid
    • Chemistry grid
    • Astronomy and space science grid
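
As a rough illustration of what a uniform data service interface over heterogeneous sources can look like, the Python sketch below wraps two differently shaped sources behind one search call. It is not the SDG middleware or its API: `DataNode`, `TableNode`, `FileNode`, `Gateway` and the sample records are all hypothetical, and security is left out entirely.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DataNode(ABC):
    """Hypothetical uniform interface that every data source exposes."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def search(self, keyword: str) -> List[Dict[str, str]]:
        """Return matching records as {"node": ..., "title": ...} dicts."""


class TableNode(DataNode):
    """Wraps a table-like source: a list of row dictionaries."""

    def __init__(self, name: str, rows: List[Dict[str, str]]) -> None:
        super().__init__(name)
        self.rows = rows

    def search(self, keyword: str) -> List[Dict[str, str]]:
        kw = keyword.lower()
        return [{"node": self.name, "title": row["title"]}
                for row in self.rows if kw in row["title"].lower()]


class FileNode(DataNode):
    """Wraps a file-like source: a mapping of path to description."""

    def __init__(self, name: str, files: Dict[str, str]) -> None:
        super().__init__(name)
        self.files = files

    def search(self, keyword: str) -> List[Dict[str, str]]:
        kw = keyword.lower()
        return [{"node": self.name, "title": desc}
                for desc in self.files.values() if kw in desc.lower()]


class Gateway:
    """Fans one query out to all nodes and merges the answers."""

    def __init__(self, nodes: Iterable[DataNode]) -> None:
        self.nodes = list(nodes)

    def search(self, keyword: str) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for node in self.nodes:
            results.extend(node.search(keyword))
        return results


# Made-up sample data, for illustration only.
gateway = Gateway([
    TableNode("chemistry", [{"title": "Compound spectra"}, {"title": "Reaction rates"}]),
    FileNode("geoscience", {"/maps/a.tif": "Lake spectra survey", "/maps/b.tif": "Soil map"}),
])
hits = gateway.search("spectra")
```

The caller only sees `Gateway.search`; adding another kind of source means writing one more `DataNode` subclass, with no change on the caller's side.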

Function Framework of SDG
• A scalable and integrated data sharing environment
  – Providing services for grid users, grid managers and resource providers
  – Operated by the operation center, science gateways and data nodes
[Diagram labels: User, Grid Manager, Resource Provider, Operation Center, Science Gateway, Data Node]

Access Scientific Data Grid
[Diagram labels: Science Gateway and access portal; Reference Databases; External Data Source; App. Oriented Databases; Resource Databases; Research Database; Grid Middleware; Software Tool]

VisualDB - Power your database
• A toolkit to manage, publish and share scientific databases through a visual configuration interface, without writing code
• A database integration access broker
• A data quality assessment tool
• A database access and usage statistics tool

Function Framework of VisualDB
[Diagram components: Security Center, Data Forge, Catalog Builder, VDBSDK, VDB, myDB, WebAPI, vReport]

Catalog Builder

Security Center

Data Forge

vReport

Application enabled environments and typical applications
• Domain-specific data-intensive application environment
  – Supports one specific research area
  – Integrates scientific data, storage, computing, analysis models and tools
  – An easy-to-use, friendly interactive interface
  – Scalable user-defined data processing workflows
• Typical pilot systems
  – Remote sensing data on-demand access and processing service environment
  – CFCI - China FLUX Cyber-Infrastructure
  – DarwinTree - Molecular data analysis and application environment
  – Atmospheric science data integration analysis platform

Atmospheric science data integration analysis platform
• Status quo

Atmospheric science data integration analysis platform
• Problems
  – Atmospheric data has reached the TB level and is distributed.
  – Research work is limited by the hard disk and memory of personal computers.
  – Many algorithms written by researchers cannot be shared easily.

Architecture
[Diagram labels: Researcher; Web browser (1) custom, 2) visualize); Iterative use; Define workflow; Result; Scientific Data Analysis Online Platform; Algorithm Chosen; Data Finding; Computing for Workflow; Algorithm Model; Distributed data; Combined with data and model; Result]

Workflow: five steps
• Select data
• Choose algorithm
• Config param
• Iterative plot
• Analyse result
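
The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's code: the `Workflow` class, the `DATASETS` and `ALGORITHMS` registries, the `moving_average` algorithm, the text-based plot and the sample numbers are all hypothetical, and the step order follows the order of the slides that come after this one.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Made-up sample series standing in for a distributed data set.
DATASETS: Dict[str, List[float]] = {
    "sample_series": [2.0, 4.0, 6.0, 8.0, 10.0],
}


def moving_average(values: List[float], window: int = 2) -> List[float]:
    """Average each run of `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Hypothetical registry of shareable algorithms.
ALGORITHMS: Dict[str, Callable[..., List[float]]] = {
    "moving_average": moving_average,
}


@dataclass
class Workflow:
    dataset: str = ""
    algorithm: str = ""
    params: Dict[str, int] = field(default_factory=dict)

    def select_data(self, name: str) -> "Workflow":        # step 1
        if name not in DATASETS:
            raise KeyError(f"unknown dataset: {name}")
        self.dataset = name
        return self

    def choose_algorithm(self, name: str) -> "Workflow":   # step 2
        if name not in ALGORITHMS:
            raise KeyError(f"unknown algorithm: {name}")
        self.algorithm = name
        return self

    def config_param(self, **params: int) -> "Workflow":   # step 3
        self.params = dict(params)
        return self

    def run(self) -> List[float]:
        """Apply the chosen algorithm to the selected data."""
        return ALGORITHMS[self.algorithm](DATASETS[self.dataset], **self.params)


def text_plot(values: List[float], width: int = 20) -> List[str]:
    """Step 4: one bar of '#' per value, scaled to the largest value."""
    top = max(values)
    return ["#" * round(v / top * width) for v in values]


def analyse(values: List[float]) -> Dict[str, float]:
    """Step 5: summarize the result."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


wf = Workflow().select_data("sample_series").choose_algorithm("moving_average")
first = wf.config_param(window=2).run()    # first pass
second = wf.config_param(window=4).run()   # iterate with a new parameter
bars = text_plot(second)
summary = analyse(second)
```

Keeping data selection, algorithm choice and parameters as separate fields is what makes the iterative step cheap: only `config_param` changes between passes, while the data and algorithm stay fixed.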

Select data

Choose algorithm

Config param

plot and result

Thank you!