Bringing Data Science, Xinformatics and Semantic eScience
Bringing Data Science, Xinformatics and Semantic eScience into the Graduate Curriculum (solicited)
• EGU2012-11224 (EOS6/ESSI2.3)
• April 25, 2012, Vienna
• Peter Fox (RPI), pfox@cs.rpi.edu
• Tetherless World Constellation
tw.rpi.edu Themes
• Future Web (Hendler): Web Science; Policy; Social
• Xinformatics (Fox): Data Science; Semantic eScience; Data Frameworks
• Semantic Foundations (McGuinness): Knowledge Provenance; Ontology Engineering Environments; Inference, Trust
• Multiple depts/schools/programs
• ~35 (Post-doc, Staff, Grad, Ugrad)
Application Themes
• Govt. Data (Hendler/Erickson): Open; Linked; Apps
• Env. Informatics (Fox): Ecosystems; Sea Ice; Ocean imagery; Carbon
• Health Care/Life Sciences (McGuinness/Luciano): Population Science; Translational Med; Health Records
• Platforms: Bio-nano tech center; Exp. Media and Perf. Arts Ctr.; Comp. Ctr. Nano. Innov.; Data Intensive
Context
• http://tw.rpi.edu/web/Courses
• [Figure labels: Experience; Data; Creation; Gathering; Information; Presentation; Organization; Knowledge; Integration; Conversation]
• [Courses shown: Data Science; Xinformatics; Semantic eScience; Web Science]
Also at RPI
• Data Science Research Center and Data Science Education Center
• http://www.rpi.edu/about/inside/issue/v4n17/datacenter.html
– Over 35 research faculty, 5 post-docs, ? grad students
• Data is one of the Rensselaer Plan’s five thrusts
• Other key faculty
– Fran Berman (VPR)
– Jim Myers (Director, CCNI)
Curriculum
• Web Science and IT: undergrad, and M.Sc. and Ph.D. (with science concentrations)
• Environmental Science with Geoinformatics concentration
• Bio-, geo-, chem-, astro-, materials-informatics
• GIS for Science
• Master of Science, Data Science (pending)
• Multi-disciplinary science program (2012): Ph.D. in Data and Web Science
E.g. IT with Env. Sci.
• ERTH-1200 Geology II (4 credits) - spring
• CHEM-2250 Organic Chemistry I (4 credits) - spring
• ERTH-2210 Field Methods (2 credits) - fall
• IENV-1920 Environmental Seminar (2 credits) - spring
• BIOL-2120 Intro. to Cell and Molecular Biology (4 credits) - spring
• IENV-4500 Global Environmental Change (4 credits) - fall
• ERTH-4180 Environmental Geology (4 credits) - spring
• ERTH-4963 Xinformatics (4 credits) - spring
• IENV-4700 One Mile of the Hudson River (4 credits) - fall
Geoinformatics concentration
• CSCI 1000 - Computer Science I
• CSCI 1200 - Data Structures
• CSCI 2300 - Introduction to Algorithms, or ERTH 4750 - Geographic Information Systems in the Sciences
• CSCI 4380 - Databases
• CSCI 4961 - Data Science
• CSCI 4960 - Xinformatics
• ERTH 4980 - Senior Thesis
Web Science Learning Objectives
• Students will demonstrate knowledge of, and be able to explain, the three "named" generations of the Web (a/k/a Web 1.0, Web 2.0, and Web 3.0) from mathematical, engineering, and social perspectives.
• Students will demonstrate the ability to use the dynamic programming language Python to develop programs relating to Web applications and the analysis of Web data.
• Students will be able to understand and analyze key Web applications, including search engines and social networking sites.
• Students will be able to understand and explain the key aspects of Web architecture and why these are important to the continued functioning of the World Wide Web.
• Students will be able to analyze and explain how technical changes affect the social aspects of Web-based computing.
• Students will be able to develop "linked data" applications using Semantic Web technologies.
Data Science Objectives
• To instruct future scientists how to sustainably generate/collect and use data for their own research as well as for others: data science.
• To instruct future technologists how to understand and support the essential data and information needs of a wide variety of producers and consumers.
• For both to know the tools and requirements needed to properly handle data and information.
• Students will learn and be evaluated on the full life cycle of data and on relevant methods, technologies and best practices.
Learning Objectives
• Develop and demonstrate skill in data collection and management
• Know how to develop and apply data models and metadata models
• Demonstrate knowledge of data standards
• Develop and demonstrate the application of skill in data science tool use and evaluation
• Demonstrate the application of data life-cycle principles and data stewardship
• Demonstrate proficiency in data and information product generation
Xinformatics Objectives
• To instruct future information architects how to sustainably generate information models, designs and architectures.
• To instruct future technologists how to understand and support the essential data and information needs of a wide variety of producers and consumers.
• For both to know the tools and requirements needed to properly handle data and information.
• Students will learn and be evaluated on the underpinnings of informatics, including theoretical methods, technologies and best practices.
Learning Objectives
• Through class lectures, practical sessions, written and oral presentation assignments and projects, students should:
– Develop and demonstrate skill in development and management of multi-skilled teams in the application of informatics
– Demonstrate ability to develop conceptual and logical information models and explain them to non-experts
– Demonstrate knowledge and application of informatics standards
– Demonstrate skill in informatics tool use and evaluation
Modern informatics enables a new scale-free framework approach
Semantic eScience Objectives
• Ontology Development, Merging and Validation
• Semantic Language and Tool Use and Evaluation
• Use Case Development and Elaboration
• Semantic eScience Implementation and Evaluation via Use Cases
• Semantic Application Development and Demonstration
• Group Project and Team Development, Use Case Implementation and Evaluation
Discussion…
• Science and interdisciplinary from the start!
– Not a question of: do we train scientists to be technical/data people, or do we train technical people to learn the science?
– It’s a skill/course-level approach that is needed
• Education and research semi-coupled
• We must teach methodology and principles over technology *
• Data science must be a skill, and as natural as using instruments or writing/using codes
• Team/collaboration aspects are key **
• Foundations and theory must be taught ***
Progression after progression
• [Figure labels: Informatics; IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics; Requirements; Science, Societal Benefit Areas]
• Example:
– CI = OPeNDAP server running over HTTP/HTTPS
– Cyberinformatics = Data (product) and service ontologies, triple store
– Core informatics = Reasoning engine (Pellet), OWL
– Science (X) informatics = Use cases, science domain terms, concepts in an ontology
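The layering in the example above can be illustrated with a toy sketch in Python. It is not the stack named on the slide: the in-memory set stands in for a triple store, and a two-rule forward-chaining loop stands in for a reasoner such as Pellet (a real OWL reasoner does far more). All names here (`TinyTripleStore`, `infer_types`, `build_example_store`, and the `ex:` terms) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Property names written as plain strings; no RDF library is used here.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


class TinyTripleStore:
    """Cyberinformatics layer (toy): an in-memory set of (s, p, o) triples."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self.triples: Set[Triple] = set(triples)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return {t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}


def infer_types(store: TinyTripleStore) -> TinyTripleStore:
    """Core informatics layer (toy): apply two RDFS-style rules to a fixpoint.

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: x type A and A subClassOf B        =>  x type B
    """
    while True:
        new: Set[Triple] = set()
        for (a, _, b) in store.match(p=SUBCLASS_OF):
            for (_, _, c) in store.match(s=b, p=SUBCLASS_OF):
                new.add((a, SUBCLASS_OF, c))
            for (x, _, _) in store.match(p=TYPE, o=a):
                new.add((x, TYPE, b))
        added = new - store.triples
        if not added:
            return store
        store.triples |= added


def build_example_store() -> TinyTripleStore:
    """Science (X) informatics layer (toy): hypothetical domain terms."""
    store = TinyTripleStore()
    store.add("ex:SeaIceExtent", SUBCLASS_OF, "ex:CryosphereProduct")
    store.add("ex:CryosphereProduct", SUBCLASS_OF, "ex:DataProduct")
    store.add("ex:granule42", TYPE, "ex:SeaIceExtent")
    return store


if __name__ == "__main__":
    store = infer_types(build_example_store())
    # List every class the hypothetical granule belongs to after inference.
    for (_, _, cls) in sorted(store.match(s="ex:granule42", p=TYPE)):
        print(cls)
```

The point of the sketch is the separation of concerns: the domain terms can change without touching the store or the inference rules, which is roughly what the progression from cyberinfrastructure up to science informatics describes.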