Data Science essentials: Introduction
Slides: 33

Data Science essentials, Introduction Part 1: Why Data Science essentials?
24 March 2019 DTU Wind Energy Introduction to the course

Learning objectives
After this lecture, students will be able to:
• Explain the work of a Domain Scientist in providing solutions to societal challenges
• Explain how data science helps smooth the transition of the energy system to a high share of renewables
• Explain what research data and the research data cycle are
• Explain how Domain Scientists, Data Scientists and Data Stewards collaborate
08 April 2019 DTU Wind Energy Introduction to the Data Science essentials course

Outline
1. The Domain Scientist
2. Societal challenges
3. The research project lifetime cycle
4. The research data cycle
5. The pathway to accomplish a research project
6. Meet the team, the Data Steward and the Data Scientist
7. This week

Domain Scientist
Domain scientists are domain experts and the main drivers of the research project cycle, from idea generation to result interpretation and solutions.
We are Domain Scientists. We never stop working.

Addressing Societal Challenges
This course is for future energy domain scientists exploring the opportunities offered by the digital transformation.
• A safe and stable energy flow is one of the societal challenges for sustainable economic growth.
• Ongoing transformations of the energy system need innovative solutions to allow smooth transitions.
• This requires Data Scientists equipped with new skills to drive research projects.
[Diagram labels: Secure & Safe Clean Energy Flow; Sustainable economic growth within Clean Environment; Climate change; Challenges; Data Science; Big data; Energy system digitalization; Ongoing Energy System Transformation; Transitions; Increasing shares of Renewables]

Research and data cycle
[Diagram labels: Research project; Idea (Domain scientist); Proposal writing (Domain scientist); 80 20; Research project; Research products; Partners; Innovation]

Pathway of a Research Project
• Grant starts
• Baseline analysis
• Find / generate data
• Quality control
• Analysis of patterns / extract insights (data analytics)
• Store data
• Presentation of results

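The middle of this pathway (generate data, quality control, extract insights, store data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wind-speed readings, the plausible range of 0 to 50 and the function names are hypothetical and do not come from the slides.

```python
import json
import statistics


def quality_control(readings, low=0.0, high=50.0):
    """Split readings into accepted values and rejected ones.

    A reading is rejected when it is missing (None) or outside the
    assumed plausible range [low, high]. The range is an assumption.
    """
    accepted, rejected = [], []
    for value in readings:
        if value is None or not (low <= value <= high):
            rejected.append(value)
        else:
            accepted.append(value)
    return accepted, rejected


def extract_insights(values):
    """Compute a few summary statistics from quality-controlled values."""
    return {"n": len(values), "mean": statistics.mean(values), "max": max(values)}


def store(summary):
    """Serialize the summary; a real project would write to managed storage."""
    return json.dumps(summary, sort_keys=True)


# Hypothetical raw readings with one gap and one obvious outlier.
raw = [5.0, 7.0, None, 6.0, 999.0, 8.0]

accepted, rejected = quality_control(raw)
summary = extract_insights(accepted)
stored = store(summary)
print(stored)
```

Running quality control before the analysis keeps the missing value and the outlier out of the summary statistics, which is the point of placing that step early in the pathway.
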
Pathway of a Research Project without good planning
Stages: Grant starts (Domain scientist, 80 -20); Baseline analysis; Generate data; Quality control; Data analytics; Analyze results; Store data; Presentation of results
What goes wrong along the way: “Oh!”; no good data; experiment delayed / models fail; outliers; stress; “What is this?”; change job; 100 TB of data

Meet the team: Domain Scientist, Data Scientist and Data Steward
• A Data Steward to manage and curate data
• A Data Scientist with a data science toolbox

Summary
Data science essentials for Domain Scientists to thrive in the digitalization era:
• Data management
• Data science toolbox
Day 1, Data: value of data; data management plan; FAIR data
Day 2, Data creation & stewardship: experimental design; acquisition; formatting
Day 3, 4, 5:
• Data preparation: feature extraction; similarity measure; summary statistics; data visualization
• Data modelling: classification; regression; clustering; density estimation
• Evaluation (insights): anomaly detection; decision making
• Results: result visualization; dissemination

Data Science essentials, Introduction Part 2: From Open Data to Open Innovation

From Open Data to Open Innovation
A journey into the opportunities offered by the digital transformation
Anna Maria Sempreviva

OUTLINE
• Answering the societal challenges: innovation and disruptive innovation
• Disruptive innovation and transition periods
• A journey through “history”: is big data a new concept?
• Connectivity, digitization, digitalization, Science 2.0, Industry 4.0
• Data science, the missing link
• Open data, big data: find the data
• From Open Science to Open Innovation

RESEARCH, INDUSTRY AND SOCIETAL CHALLENGES
[Diagram labels: Innovations; Secure & Safe Clean Energy Flow; Data Science; Sustainable economic growth; In the frame of; Challenges; Clean Environment; Climate change; Ongoing Energy System Transformation; Transitions; Big data; Increasing shares of Renewables; Energy system digitalization]

What is innovation, and what is disruptive innovation?
Innovation: a new idea leading to new products, processes, services or business models that meet new requirements, unarticulated needs, or existing market needs.
Disruptive innovation: an innovation that displaces an established technology and shakes up the industry, or a ground-breaking product that creates a completely new industry.

What is innovation, and what is disruptive innovation? Transition periods
• Fossil fuels grid to renewables
• Telephone to smart phone
• Horse cart to car and train

A MODEL FOR TECHNOLOGICAL GROWTH AND TRANSITION
Foresee trends to understand the actions needed to master innovation in transition periods.
The “Three levels” model for technological growth and transition management (Gbikpi, Grote 2002):
• Socio-technical landscape: developments put pressure on the regime; the landscape modulates and allows the transition path when conditions are ready or some “tipping point” occurs.
• Socio-technical regime: stable until an innovative concept emerges bottom-up (from the niche level) and becomes disruptive, often with the occurrence of “tipping point” events or when converging, concurrent advances in technology and science materialize.
• Niche innovation level: technological niches develop and eventually converge, becoming disruptive.
[Diagram label: Transition period]

My journey into “history”: transitions
Born 10 years after the end of World War 2, I have experienced 8 socio-economic-techno transitions:
• To Europe
• To analog communication for all (phone and TV)
• To renewable energies (1970s)
• To PC
• To connectivity
• To digital communication
• To digitization and digitalization
• To Open Science
• More?

My journey into “history”: transitions
Born 10 years after the end of World War 2, I have experienced 8 socio-economic-techno transitions: to Europe; to analog communication for all (phone and TV); to renewable energies; to PC; to connectivity; to digital communication; to digitalization; to Open Science. More?
My professional transitions:
• To Consultant in 1999: large Danish offshore wind farms
• To Management in 2009: Head of Section at the National Council of Research of Italy, CNR
• To the future, 2012-2014: Coordinator of the Energy Theme Workgroup of CNR Science & Technological Foresight
• To Librarianship in transition to Data Stewardship: Coursera certifications in Research Data Management and Sharing; DTU FAIR DATA ambassador

A journey into history: what did I learn?
Awareness of data growth and storage needs; technologies and jobs disappear.
Year: statement
• 1941: first attempt at quantifying the volume of data; “information explosion” (Oxford dictionary)
• 1956: IBM invented the hard disk
• 1978: Fourth IEEE Symposium on Mass Storage Systems, where the speaker says ‘Data expands to fill the space available’… users have no way of identifying obsolete data
• 1978-1980: Master thesis: replicated X 100 punch cards. FROM PAPER TO PAPER
Big data is data that we do not know exists (Anonymous, 2017)

A journey into history: what have I learnt?
The concept of Big Data is not new; it is relative.
1980-1983:
• First task at CNR: developing the national meteo-climate database
• Multi-dimensional, distributed data!
• Different media, now obsolete

What enabled the rise of data science? Connectivity, digitization, digitalization
• Digitization: the process of transforming resources from a physical into a digital format, i.e. DATA.
• Digitalization: the use of digital technologies to transform data into information and insights, and then into value, to find innovative products, services and business models.
DATA SCIENCE: the missing link
Source: “Digitization connectivity and marketing”, Shruti Dubey, SlideShare 2016

Connectivity: Open Science and FAIR DATA
Connectivity enables collaboration and sharing.
• Open Science: openness
• Open Science policy: 2015 European Commission public consultation “Science 2.0: Science in transition”
• FAIR DATA: Wilkinson et al. 2016

Open Science objectives from the data perspective
Share digital objects to multiply their value. There are 3 main types of digital objects:
• data sets,
• software, and
• workflows
Taxonomies and metadata to manage the wind energy digital transformation

Connectivity: Big data & Data management
FIND THE DATA: we need to organize data so that it is ready for data analytics.

Let’s organize energy big data: a metadata catalogue for controlled data sharing
Distributed data:
• Big data is spread across the “networks”
  – at different organizations (public and private),
  – with different standards,
  – on different storage media
• Not documented (no metadata)
• Multi-disciplinary, multi-sectoral
• IP protected

Let’s organize energy big data: a metadata catalogue for controlled data sharing
• TAG ASSETS (digital objects) with metadata and taxonomies
• Populate catalogues of assets
• Expose catalogues to web crawlers
• Be visible
• Share data
• Open to collaborations

Taxonomy and metadata, connecting stakeholders: meet the data owner and the data user
Data owner / creator:
• Makes data visible via metadata and related terms from standard vocabularies
• Does not need to upload data
• Can maintain control over data access
Data user:
• Can find data by searching the same terms as used by the data owner
• Can retrieve information on available data
• Increases work efficiency
What connects them: Data marketplace (€ £ $)? Services? Co-creation? Collaboration?

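The owner and user roles above can be illustrated with a minimal Python sketch of a metadata catalogue. Everything named here is hypothetical and chosen only for illustration: the `MetadataRecord` and `Catalogue` classes, the vocabulary terms and the example data sets are assumptions, not part of any real catalogue service. The sketch shows the two key ideas: the owner registers only metadata tagged with terms from a shared vocabulary (no data is uploaded), and the user finds records by searching the same terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Describes a data set without containing the data itself."""
    title: str
    owner: str
    keywords: frozenset          # terms taken from the shared vocabulary
    access: str = "on request"   # the owner keeps control over data access


class Catalogue:
    """Hypothetical in-memory catalogue keyed on a controlled vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = {term.lower() for term in vocabulary}
        self.records = []

    def register(self, record):
        """Data owner: make data visible by registering its metadata."""
        unknown = {k.lower() for k in record.keywords} - self.vocabulary
        if unknown:
            raise ValueError(f"terms not in vocabulary: {sorted(unknown)}")
        self.records.append(record)

    def search(self, term):
        """Data user: find records tagged with the given vocabulary term."""
        term = term.lower()
        return [r for r in self.records
                if term in {k.lower() for k in r.keywords}]


# Hypothetical vocabulary and records.
catalogue = Catalogue(["wind speed", "lidar", "turbine loads"])
catalogue.register(MetadataRecord(
    title="Offshore mast measurements",
    owner="Institute A",
    keywords=frozenset({"wind speed"}),
))
catalogue.register(MetadataRecord(
    title="Lidar campaign",
    owner="Company B",
    keywords=frozenset({"lidar", "wind speed"}),
    access="restricted",
))

hits = catalogue.search("Wind Speed")
print([(r.title, r.owner, r.access) for r in hits])
```

Rejecting keywords that are not in the vocabulary is a design choice in this sketch: it guarantees that owner and user really do share the same terms, which is what makes the search work.
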
From Open Science to Open Innovation via open data
Visible data, work efficiency and connected stakeholders: a fast track from research to innovation.
Example: United Genomes Project, “Openness Accelerating Science”
Niche ideas from different sectors are available to all.
[Diagram labels: Innovation funnels; Ideas; Niches; Niches convergence level; Data; Data science; Insights; New products, processes, services]

The research data lifecycle from a Data Science perspective
• Let’s start with being FAIR (Day 1)
Data: value of data; data management plan; FAIR data; data structures
Data creation & stewardship: experimental design; acquisition; formatting; storage and management
Data preparation: feature extraction; similarity measure; summary statistics; data visualization
Data modelling: classification; regression; clustering; density estimation
Evaluation (insights): anomaly detection; decision making
Results: result visualization; dissemination

Summary
• A journey through crucial “buzz words” and concepts: connectivity, digitization, big data, data science and digitalization, chained together, enabled the digital transformation through innovation.
• Researchers must be able to foresee trends to understand the actions needed to master innovation in transition periods.
• The transformation to Science 2.0 is a new way of doing research: Open Science and the FAIR principles support collaboration.
• Open Science supports the free exchange of ideas across sectors and disciplines, enabling the merging of technological niches that can open a fast path from research to innovation.