Introduction to eScience and Semantic Web

Introduction to eScience and Semantic Web
Professor Deborah McGuinness
TA: Katie Chastain
Other lectures from Tetherless World grad students Jim McCusker and Amar Viswanathan, and possibly others from http://tw.rpi.edu/web/People
CSCI 6962-01, 26868; CSCI 4969-01, 27716; ITWS 6960-01, 27640; ITWS 4969-01, 27717
Week 1, August 27, 2012

Admin info (keep/print this slide)
• Class:
  – CSCI 6962-01, 26868; CSCI 4969-01, 27716
  – ITWS 6960-01, 27640; ITWS 4969-01, 27717
• Hours: 1 pm-3:50 pm Mondays (except after Columbus Day, when we meet on Tuesday)
• Class location: Winslow 1140
• Instructors: Deborah McGuinness; TA Katie Chastain; guests: Jim McCusker, Amar Viswanathan, Patrice Seyed
• Contacts: dlm@cs.rpi.edu, chastk@rpi.edu, mccusj@rpi.edu, kannaa@rpi.edu, seyeda2@rpi.edu
• Contact locations: Winslow 2104 (DLM), 2nd floor Winslow kitchen
• Wiki: http://tw.rpi.edu/web/Courses/SemanticeScience/2012
• TWed: http://tw.rpi.edu/web/TWed, 7-9, starting Sept 12

Introductions
• Who are we?
• Who are you?
• Why are you here?
• What do you want to get out of the class?
• Will you make the class (on time) each week, and do you have any other conflicts or issues we should know about?

“Knowledge is the common wealth of humanity”*
In the Earth and space sciences and elsewhere, ready and open access to the vast and growing collections of cross-disciplinary digital information is the key to understanding and responding to complex Earth system phenomena that influence human survival. We have a shared responsibility to create and implement strategies to realise the full potential of digital information and services for present and future generations.
*Adama Samassekou, Convener of the UN World Summit on the Information Society

Background
People should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed.
And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, and inflexible and unsustainable implementation technology…

What do we need to achieve Semantic eScience? (in-class brainstorming exercise)
Whiteboard exercise… What do we need to achieve this vision?

What do we need to achieve Semantic eScience? (in-class brainstorming exercise)
• organization, leadership, management strategies, roles and assignment of roles
• dissemination strategy
• communication of ideas
  – machine level
  – human level
• conflict resolution
• cross-disciplinary collaboration
• flexible, adaptable
• feedback
• extensible
• ability to filter information
• usage/application of resources, optimization
• facts, knowledge (domain knowledge)
• context, domain, scope
• goals, use cases
• metadata: data to describe data
• ability to link information
• ability to understand information
• ability to capture and represent conflicting ideas
• provenance: where data come from
• trust: reliable
• ability to capture intent (humanitarian aspect / responsibility)
• credibility of information
• interesting and appealing
• standardization
• education and outreach
• methods and metrics
• criteria for evaluation

Outline of the course
• Topics for Semantic e-Science/Foundations:
  – Semantic Methodologies
  – Knowledge Representation for e-Science
  – Ontology Engineering and Re-Use for e-Science
  – Knowledge Integration for e-Science
  – Semantic Data Integration
  – Semantic Web Languages, Tools and Services
  – Knowledge Provenance for e-Science
  – Semantic Infrastructure and Architecture for e-Science
  – Semantic Grid Middleware
  – Ontology Evolution for e-Science
  – Knowledge Management for e-Science
  – Workflow Management
  – Data life-cycle for e-Science

Contents
• Outline of the course
• Background
• e-Science
• Examples
• Informatics
• Semantics
• Elements of Semantic e-Science (SeS)
• What we expect
• Logistics summary

The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries

Information products have lots of audiences; so does data (scientists too)
Figure: audiences arranged from more strategic to less strategic, from “Why EPO (Education and Public Outreach)?”, a NASA internal report on science education, 2005

Shifting the Burden from the User to the Provider
Fox CI and X-informatics - CSIG 2008, Aug 11

e-Science
• Emphasis is on Science
• Original narrative:
One of the key drivers behind the search for such new scientific tools is the imminent deluge of data from new generations of scientific experiments and surveys (*). In order to exploit and explore the petabytes of scientific data that will arise from these high-throughput experiments, supercomputer simulations, sensor networks, and satellite surveys, scientists will need assistance from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers. To create such tools, the data will need to be annotated with relevant "metadata" giving information as to provenance, content, conditions, and so on; and, in many instances, the sheer volume of data will dictate that this process be automated. Scientists will create vast distributed digital repositories of scientific data requiring management services similar to those of more conventional digital libraries, as well as other data-specific services. The ability to search, access, move, manipulate, and mine such data will be a central requirement for this new generation of collaborative science software applications.
Hey and Trefethen, 2005

Evolving Science
• A thousand years ago: science was empirical, describing natural phenomena
• Last few hundred years: a theoretical branch using models, generalizations
• Last few decades: a computational branch simulating complex phenomena
• Today: data exploration (eScience), synthesizing theory, experiment and computation with advanced data management and statistics; new algorithms!
• eScience that “understands” the meaning of terms: Semantic eScience

Living in an Exponential World
• Scientific data doubles every year
  – caused by successive generations of inexpensive sensors + exponentially faster computing
• Changes the nature of scientific computing
• Cuts across disciplines (eScience)
• It becomes increasingly hard to extract knowledge
• 20% of the world’s servers go into huge data centers run by the “Big 5”: Google, Microsoft, Yahoo, Amazon, eBay
• So it is not only the scientific data!

Collecting Data
• Very extended distribution of data sets: data on all scales!
• Most datasets are small, and manually maintained (Excel spreadsheets)
• Total amount of data dominated by the other end (large multi-TB archive facilities)
• Most bytes today are collected via electronic sensors

Making Discoveries
• Where are discoveries made?
  – At the edges and boundaries, by inspecting deeper or more data
• Metcalfe’s law
  – Utility of computer networks grows as the number of possible connections: O(N^2)
• Federating data (the connections!!)
  – Federation of N archives has utility O(N^2)
  – Possibilities for new discoveries grow as O(N^2)
• Many examples
  – Sky surveys, Galaxy Zoo… Very early discoveries from the Sloan Digital Sky Survey (http://www.sdss.org/), Two Micron Sky Survey (http://www.ipac.caltech.edu/2mass/), Palomar Digital Sky Survey (http://www.astro.caltech.edu/~george/dposs/)
  – Genomics + proteomics
  – Alzheimer’s article in reading
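The O(N^2) claim comes from counting possible pairwise connections. A minimal Python sketch of that arithmetic, assuming utility is proportional to the number of distinct pairs N(N-1)/2 (the function name `pairwise_links` is illustrative, not from the slides):

```python
def pairwise_links(n: int) -> int:
    """Distinct pairwise connections among n nodes: n(n-1)/2, which is O(n^2)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Federating more archives: the number of links grows roughly with the square of n.
for archives in (2, 10, 100, 200):
    print(archives, pairwise_links(archives))

# Doubling n from 100 to 200 multiplies the links by about 4 (19900 / 4950).
growth = pairwise_links(200) / pairwise_links(100)
```

The ratio is slightly above 4 because of the (n - 1) factor; for large n it approaches exactly 4 per doubling.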

Data Delivery: Hitting a Wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh, and 1 PB is ~4,000 disks
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~1 $/GB)
  – … 1 TB: 2 days and 1 K$
  – … 1 PB: 3 years and 1 M$
• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
• Take the analysis to the data!!
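The wall is plain arithmetic: a full scan takes size / throughput, so it grows linearly with the data, and splitting the data across disks scanned in parallel divides the time. A small sketch, assuming a single constant rate of 1 GB per minute per disk (taken from the GREP bullet) and perfectly even parallelism; the slide's own figures for TB and PB imply lower sustained rates at scale, so these numbers are optimistic:

```python
GB = 10**9
TB = 10**12
PB = 10**15
SECONDS_PER_HOUR = 3_600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR


def scan_seconds(size_bytes: float, bytes_per_second: float, workers: int = 1) -> float:
    """Time for a full linear scan when the data is split evenly across parallel workers."""
    if bytes_per_second <= 0 or workers < 1:
        raise ValueError("throughput must be positive and workers >= 1")
    return size_bytes / (bytes_per_second * workers)


# Assumption: one disk sustains 1 GB per minute, the rate in the GREP bullet.
per_disk_rate = GB / 60

tb_hours = scan_seconds(TB, per_disk_rate) / SECONDS_PER_HOUR  # about 16.7 hours
pb_years = scan_seconds(PB, per_disk_rate) / SECONDS_PER_YEAR  # about 1.9 years
# Assumption: ~4,000 disks (the slide's figure for 1 PB), each scanned independently.
pb_parallel_hours = scan_seconds(PB, per_disk_rate, workers=4_000) / SECONDS_PER_HOUR  # about 4.2 hours

print(f"1 TB serial: {tb_hours:.1f} h")
print(f"1 PB serial: {pb_years:.1f} years")
print(f"1 PB on 4,000 disks in parallel: {pb_parallel_hours:.1f} h")
```

Even under these generous assumptions a serial petabyte scan takes years, which is why the slide moves to indices, parallel search and "take the analysis to the data".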

Mind the Gap!
• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
• Informatics: information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)

Progression after progression
Figure labels: Informatics; IT; Cyberinfrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka X-informatics; Science, Societal Benefit Areas

World-Wide Emerging Technology Trends
• Innovation will come from parts of the world other than the U.S.
• The Chinese have skipped the first generation of the Internet.
• Growth is occurring in Asia, and decreasing in previously hot areas such as Western Europe.
• U.S. industry is compulsively outsourcing abroad.
• Software is moving from forms-based applications to business processes.
• Networks are migrating to Internet protocol and optical networking technologies.

Cyberinfrastructure
• Data curation and storage
• Federated access
• Collaboration
• New uses in
  – High Performance Computing
  – Databases
  – Web servers, services (software as service)
  – Wiki
  – Visualization
• All discipline neutral

Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
Figure (iterative development cycle), labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Science/Expert Review & Iteration; Develop model/ontology; Use Tools; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation

Ex. 1: Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. appear to be local and integrated.
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage
-> thus part Information Technology (IT), part Cyberinfrastructure (CI), part Informatics, and all about doing new science

SemantEco
• Water Quality Portal example from previous classes
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• We will come back to this later, but will go over it now at a high level.
• Next: motivated by the Virtual Solar Terrestrial Observatory

Figure: Virtual Observatory architecture. From top to bottom: added value (education, clearinghouses, other services, disciplines, etc.); Virtual Observatory portal, VO API and web services, each with added value; semantic mediation layer (mid/upper-level) for semantic interoperability and semantic query, hypothesis and inference; semantic mediation layer (VSTO) for query, access and use of data; metadata, schema, data; low-level databases DB1, DB2, DB3, …, DBn.
Mediation Layer
• Ontology: capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.

Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
– Extract information from the use case: encode knowledge
– Translate this into a complete query for data: inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a Simple Object Access Protocol (SOAP) web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
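The phrase "constraints in any order and in any combination" is the key property of the query service. A minimal Python sketch of that idea, assuming each constraint is a predicate over a hypothetical `Observation` record and a query keeps records that satisfy all supplied predicates; it does not model VSTO's ontology, SOAP interface, or real holdings, and treating Kp >= 5 as "high geomagnetic activity" is an assumption:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Observation:
    instrument: str
    parameter: str
    time: datetime
    altitude_km: float
    latitude_deg: float
    kp_index: float  # geomagnetic activity index at observation time


Constraint = Callable[[Observation], bool]


def query(data: Iterable[Observation], *constraints: Constraint) -> List[Observation]:
    """Keep observations that satisfy every constraint; constraint order does not matter."""
    return [obs for obs in data if all(check(obs) for check in constraints)]


# Use-case constraints. Treating Kp >= 5 as "high geomagnetic activity" is an assumption.
above_100_km: Constraint = lambda obs: obs.altitude_km > 100
north_of_45: Constraint = lambda obs: obs.latitude_deg > 45
high_activity: Constraint = lambda obs: obs.kp_index >= 5
neutral_temperature: Constraint = lambda obs: obs.parameter == "neutral temperature"

# Hypothetical holdings, not real VSTO data.
holdings = [
    Observation("FPI", "neutral temperature", datetime(2003, 10, 31, 6), 250, 67.9, 8.0),
    Observation("FPI", "neutral temperature", datetime(2003, 10, 20, 6), 250, 67.9, 2.0),
    Observation("ISR", "electron density", datetime(2003, 10, 31, 6), 300, 42.6, 8.0),
]

hits = query(holdings, neutral_temperature, high_activity, north_of_45, above_100_km)
same_hits = query(holdings, above_100_km, north_of_45, high_activity, neutral_temperature)
print(hits == same_hits, [obs.time.date() for obs in hits])
```

Because the predicates are combined with a conjunction, any subset can be passed in any order; an empty set of constraints returns everything.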

Inferred plot type and return required axes data

Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries, which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past, users had to remember (and maintain code) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
  – understanding of coordinate systems, relationships, data synthesis, transformations, etc.
  – returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

Remembering… data has lots of audiences… also lay people
Figure: audiences arranged from more strategic to less strategic

What is a Non-Specialist Use Case?
A teacher accesses the Internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge.

What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs, but the search for brightness or green/red line emission is mediated for them
3) Did you know?: An aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as the Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.

Semantic Information Integration: Concept map for educational use of science data in a lesson plan


Semantic Web Basics
• The triple: {subject-predicate-object}
  – Interferometer is-a optical instrument
  – Optical instrument has focal length
  – An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
  – RDF: Resource Description Framework
  – OWL 1.0: Web Ontology Language (succeeded by OWL 2, a W3C Recommendation since 2009)
• Encode the knowledge in triples in a triple store; software is built to traverse the semantic network, which can be queried or reasoned upon
• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’, to mediate the exchange
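To make "traverse the semantic network" concrete, here is a plain-Python sketch of the slide's two triples, assuming statements are stored as string tuples and `is-a` is treated as transitive, in the spirit of rdfs:subClassOf. A real system would use RDF with URIs and a triple store; the extra `Instrument` link and the function names are hypothetical:

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# The slide's example, one (subject, predicate, object) tuple per statement.
kb: Set[Triple] = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),  # hypothetical extra link
}


def superclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """Every class reachable from cls via is-a links (transitive closure)."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inherited_properties(cls: str, triples: Set[Triple]) -> Set[str]:
    """'has' values stated on cls itself or on any of its superclasses."""
    classes = {cls} | superclasses(cls, triples)
    return {o for s, p, o in triples if p == "has" and s in classes}


print(inherited_properties("Interferometer", kb))  # {'FocalLength'}
```

Nothing says directly that an interferometer has a focal length; the answer comes from walking the is-a link, which is the kind of implicit knowledge the triple model makes available to software.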

Terminology
• Semantic Web
  – An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation
  – Primer: http://www.ics.forth.gr/isl/swprimer/
• Semantic Grid
  – Semantic services to use the resources of many computers connected by a network to solve large-scale computational/data problems
• Provenance
  – origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Service-oriented architecture
  – Provision of a capability over the internet via a ‘remote-procedure-call’ using prescribed input, output and pre-conditions
• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
  – An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

Terminology
• Closed World: where complete knowledge is known (encoded); AI relied on this
• Open World: where knowledge is incomplete/evolving; the SW promotes this
• Languages
  – OWL: Web Ontology Language (W3C)
  – RDF: Resource Description Framework (W3C)
  – OWL-S/SWSL: Web Services (W3C)
  – WSMO/WSML: Web Services (EC/W3C)
  – SWRL: Semantic Web Rule Language; RIF: Rule Interchange Format
  – PML: Proof Markup Language
• Editors: Protégé, SWOOP, Medius, SWeDE, …
• Reasoners: Pellet, Racer, Medius KBS, FaCT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages: SPARQL, XQuery, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
  – Search: Swoogle, swoogle.umbc.edu
  – Collaboration: www.planetont.org
  – Other: Jena, Sesame/SAIL, Mulgara, Eclipse, Kowari
  – Semantic wiki: OntoWiki, Semantic MediaWiki
• Emerging Semantic Standards for Earth Science: SWEET, VSTO, MMI, GeoSciML
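The closed/open world distinction shows up directly in how an unstated fact is answered. A minimal Python sketch, assuming a tiny hypothetical knowledge base (`StationA`, `RadarX`, `SatelliteY` are invented) with one positive fact and one explicitly negated fact; `None` stands for "unknown":

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical knowledge base: one positive fact and one explicit negative fact.
known_true: Set[Triple] = {("StationA", "operates", "RadarX")}
known_false: Set[Triple] = {("StationA", "operates", "SatelliteY")}


def ask_closed_world(fact: Triple) -> bool:
    """Closed world: whatever is not stated is assumed false."""
    return fact in known_true


def ask_open_world(fact: Triple) -> Optional[bool]:
    """Open world: True if stated, False only if explicitly negated, otherwise None (unknown)."""
    if fact in known_true:
        return True
    if fact in known_false:
        return False
    return None


unstated = ("StationA", "operates", "InterferometerZ")
print(ask_closed_world(unstated))  # False
print(ask_open_world(unstated))    # None
```

The open-world answer is the safer one on the Web, where another source may later add the missing statement.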

Semantic Web Layers
Figure sources: http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/

Application Areas for Semantics
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis: OntoWiki, Semantic MediaWiki
• Sensor Web
• Software engineering
• Explanation
… and the list goes on

2007-2008 Hype Cycle for Emerging Semantic Web Technologies v0.6 (produced for the NASA TIWG semantic web subgroup)
Figure: visibility plotted against time across the hype-cycle phases (Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity), with each technology marked by estimated years to mainstream adoption in Earth science (< 2 years, 2-5 years, 5-10 years, > 10 years, obsolete before plateau).
Technologies plotted: Semantic Web Services; Semantic Wiki; Smart search, e.g. NOESIS; Rules/Logic, SWRL; Query Lang, SPARQL; Tagging/annotation; Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial; Ontology editor, SWOOP; Mid-level ES domain ontologies, e.g. GEON; Concept map, Cmap; OWL 1.0; RDF; Protégé; XML; DL Reasoners, e.g. Pellet, Racer; SKOS; Species Query Lang; FOAF; Validators; Upper level ontologies, e.g. ABC, DOLCE, SUMO; Mid-level ES domain ontologies, e.g. SWEET; OWL 1.1; OWL-QL; Natural Language Query Lang; Commercial embedded QL; Managing modular ontologies (ES and general)

Semantic Web Roadmap (April 2008)
Outcome: Increased Collaboration & Interdisciplinary Science; Acceleration of Knowledge Production; Revolutionizing how science is done
Other figure labels: Vocabulary; Interoperable Information Infrastructure; Assisted Discovery & Mediation; Improved Information Sharing; Languages/Reasoning; Technology; Capability; Results
• Current
  – Output: Geospatial semantic services established
  – Capability: Some common vocabulary based product search and access; Local processing + data exchange
  – Vocabulary: SWEET core 1.0 based on GCMD/CF
  – Languages/Reasoning: RDF, OWL-S
• Near Term (0-2 yrs)
  – Output: Geospatial semantic services proliferate
  – Capability: Semantic geospatial search & inference, access; Basic data tailoring services (data as service), verification/validation
  – Vocabulary: SWEET core 2.0 based on best practices decided from community
  – Languages/Reasoning: Geospatial reasoning, OWL-Time
• Mid Term (2-5 yrs)
  – Output: Scientific semantic assisted services
  – Capability: Semantic agent-based searches; Interoperable geospatial services (analysis as service), results explanation service
  – Vocabulary: SWEET 3.0 with semantic callable interfaces via standard programming languages
  – Languages/Reasoning: Numerical reasoning
• Long Term (5+ yrs)
  – Output: Autonomous inference of science results
  – Capability: Semantic agent-based integration; Metadata-driven data fusion (semantic service chaining), trust
  – Vocabulary: Reasoners able to utilize SWEET 4.0
  – Languages/Reasoning: Scientific reasoning

Semantic Web Roadmap (capability), April 2008: Building Seamless Data Access Capability
Capability areas: Assisted Discovery & Mediation; Interactive Data Analysis; Interoperable Information Services; Responsive Knowledge Delivery; Verifiable Quality
• Current
  – Some common vocabulary based product search and access
  – Some metadata and limited provenance available
  – Verification is manual with minimal tool support
  – Services must be hardwired
  – Local processing + data exchange
  – Limited metadata passed to analysis applications
  – Access mediated by agreed standard vocabularies, hard-wired connections
• Near Term (0-2 yrs)
  – Semantic geospatial search & inference, access
  – Ontologies for data mining, visualization and analysis emerging/maturing
  – Ontologies for information quality developed
  – Services annotated with resource descriptions and service agreements established
  – Basic data tailoring services (data as service), verification/validation
  – Tag properties, non-jargon vocabulary for non-specialist use
  – Access mediated by common ontologies
• Mid Term (2-5 yrs)
  – Semantic agent-based searches
  – Common terminology captured in ontologies, crossing domains
  – Domain and range properties in ontologies used in tools
  – Dynamic service discovery and mediation, and data scheduling
  – Interoperable geospatial services (analysis as service), results explanation service
  – Shared terminology for the visual properties of interface objects and graph types…
  – Mediation aided by services with domain/range properties
• Long Term (5+ yrs)
  – Semantic agent-based integration
  – Provenance/annotation with ontologies in user tools
  – Service ontologies carry quality provenance
  – Semantic markup of data latency (time lags) which adapt dynamically
  – Metadata-driven data fusion (semantic service chaining), trust
  – Semantic fields to describe tag key modal functions
  – Key data access services are semantically mediated

Building Seamless Data Access Capability, Roadmap: from near term (0-2 yrs) to mid term (2-5 yrs)
• Semantic geospatial search & inference, access -> requires agent development and vocabulary for agent characterization -> Semantic agent-based searches
• Ontologies for data mining, visualization and analysis emerging/maturing -> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework -> Common terminology captured in ontologies, crossing domains
• Ontologies for information quality developed -> requires mature quality and uncertainty ontologies with domain and range properties added and populated -> Domain and range properties in ontologies used in tools
• Services annotated with resource descriptions -> requires semantic service (ontology) registry -> Dynamic service discovery and mediation, and data scheduling
• Basic data tailoring services (data as service), verification/validation -> requires service to implement v/v, new descriptions of analyses, developing explanation -> Interoperable geospatial services (analysis as service), results explanation service
• Tag properties, non-jargon vocabulary for non-specialist use -> requires development of portal modal function vocabulary and ontology, link to domain context and data structure -> Shared terminology for the visual properties of interface objects and graph types…
• Access mediated by common ontologies -> requires adding properties to classes in ontologies and populating instances with expert agreement -> Mediation aided by services with domain/range properties

Selected Technical Benefits
1. Integrating Multiple Data Sources
2. Semantic Drill Down / Focused Perusal
3. Statements about Statements
4. Inference
5. Translation
6. Smart (Focused) Search
7. Smarter Search … Configuration
8. Proof and Trust
Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html

1: Integrating Multiple Data Sources
• The Semantic Web lets us merge statements from different sources
• The RDF Graph Model allows programs to use data uniformly regardless of the source
• Figuring out where to find such data is a motivator for Semantic Web Services
Figure: a merged graph in which #Ionosphere hasCoordinates #magnetic, has name “Terrestrial Ionosphere”, hasLowerBoundaryValue “100” and hasLowerBoundaryUnit “km”; different line and text colors represent different data sources.
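Because every source speaks in the same triple model, merging is just a set union of statements. A short Python sketch, assuming the figure's statements are split across two hypothetical sources "A" and "B" (the split is invented; the slide only distinguishes sources by color) and that we want to remember who asserted what:

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical split of the slide's statements across two sources.
source_a: Set[Triple] = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b: Set[Triple] = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),  # also stated by source A
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}


def merge(sources: Dict[str, Set[Triple]]) -> Dict[Triple, Set[str]]:
    """Union the graphs while remembering which sources asserted each triple."""
    merged: Dict[Triple, Set[str]] = defaultdict(set)
    for label, triples in sources.items():
        for triple in triples:
            merged[triple].add(label)
    return dict(merged)


graph = merge({"A": source_a, "B": source_b})
for (s, p, o), origin in sorted(graph.items()):
    print(s, p, o, sorted(origin))
```

The duplicate `name` statement collapses into one triple with two sources, and a program reading the merged graph does not need to know where each statement came from unless it asks.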

2: Drill Down / Focused Perusal
• The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
• These can typically be resolved to get more information about the resource
• This essentially creates a web of data analogous to the web of text created by the World Wide Web
• Ontologies are represented using the same structure as content
  – We can resolve class and property URIs to learn about the ontology
Figure labels: …#NeutralTemperature, measuredBy, …#FPI, type, …#ISR, operatedBy, …#MillstoneHill, …#EISCAT, locatedIn, …#Norway

3: Statements about Statements
• The Semantic Web allows us to make statements about statements
  – Timestamps
  – Provenance / Lineage
  – Authoritativeness / Probability / Uncertainty
  – Security classification
  – …
• This is an unsung virtue of the Semantic Web
Figure: the statement #Aurora hasColor Red, annotated with hasSource #Danny’s and hasDateTime 20031031
Ontologies Workshop, APL May 26, 2006
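One standard way to say something about a statement is RDF reification: the statement itself becomes a resource of type rdf:Statement with rdf:subject, rdf:predicate and rdf:object, and further properties (source, date) hang off it. Named graphs are another common approach. A plain-Python sketch of the reification pattern, assuming string identifiers; the statement ids `#obs1`/`#obs2`, the source `#AllSkyCamera` and the simplified source label `#Danny` are hypothetical:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def reify(stmt_id: str, triple: Triple, meta: Dict[str, str]) -> Set[Triple]:
    """Describe a statement as a resource (RDF-reification style) and attach metadata to it."""
    s, p, o = triple
    described = {
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    }
    return described | {(stmt_id, key, value) for key, value in meta.items()}


graph: Set[Triple] = set()
graph |= reify("#obs1", ("#Aurora", "hasColor", "Red"),
               {"hasSource": "#Danny", "hasDateTime": "20031031"})
graph |= reify("#obs2", ("#Aurora", "hasColor", "Green"),
               {"hasSource": "#AllSkyCamera", "hasDateTime": "20031031"})


def sources_of(triple: Triple, g: Set[Triple]) -> Set[str]:
    """Find who asserted a statement by matching its reified subject, predicate and object."""
    s, p, o = triple
    ids = {x for (x, pred, val) in g if pred == "rdf:subject" and val == s}
    ids &= {x for (x, pred, val) in g if pred == "rdf:predicate" and val == p}
    ids &= {x for (x, pred, val) in g if pred == "rdf:object" and val == o}
    return {val for (x, pred, val) in g if x in ids and pred == "hasSource"}


print(sources_of(("#Aurora", "hasColor", "Red"), graph))  # {'#Danny'}
```

The metadata lives in ordinary triples, so the same query machinery that answers "what color was the aurora?" can also answer "who said so, and when?".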

4: Inference
• The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made
• Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, …
• SWRL allows us to make additional inferences beyond those provided by the ontology
Figure labels: #MillstoneHill, #Interferometer, operatesInstrument, isOperatedBy, hasInstrument, measures, hasTypeOfData, hasOperatingMo…, hasMeasuredData, #VerticalMeans
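Both kinds of inference on the slide, ontology-level (an inverse property) and rule-level (SWRL-style), can be sketched as forward chaining to a fixed point. This is a minimal Python illustration, not a DL reasoner: the inverse pair mirrors owl:inverseOf, while the `providesDataOn` rule and the `#NeutralWind` measurement are hypothetical:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Declared inverse pair, in the spirit of owl:inverseOf; stored in both directions.
INVERSES: Dict[str, str] = {
    "operatesInstrument": "isOperatedBy",
    "isOperatedBy": "operatesInstrument",
}


def inverse_step(kb: Set[Triple]) -> Set[Triple]:
    """Add (o, inverse(p), s) for every (s, p, o) whose property has a declared inverse."""
    return {(o, INVERSES[p], s) for (s, p, o) in kb if p in INVERSES}


def rule_step(kb: Set[Triple]) -> Set[Triple]:
    """Hypothetical SWRL-style rule: operatesInstrument(s, i) and measures(i, x) -> providesDataOn(s, x)."""
    return {
        (s, "providesDataOn", x)
        for (s, p, i) in kb
        if p == "operatesInstrument"
        for (i2, p2, x) in kb
        if i2 == i and p2 == "measures"
    }


def saturate(facts: Set[Triple]) -> Set[Triple]:
    """Apply the steps repeatedly until no new triple appears (a fixed point)."""
    kb = set(facts)
    while True:
        new = (inverse_step(kb) | rule_step(kb)) - kb
        if not new:
            return kb
        kb |= new


stated = {
    ("#MillstoneHill", "operatesInstrument", "#Interferometer"),
    ("#Interferometer", "measures", "#NeutralWind"),
}
inferred = saturate(stated) - stated
print(sorted(inferred))
```

Neither inferred triple was stated; both follow from the declared semantics, which is what lets a query phrased in either direction (operates / is operated by) find the same data.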

5: Translation
• While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing
• There are multiple levels of mapping
  – Classes
  – Properties
  – Instances
  – Ontologies
• OWL supports equivalence and specialization; SWRL allows more complex mappings
Figure: #precipitation, named ont1:Precipitation with ont1:EduLevel VO:Scientist in one ontology, and named ont2:Rain with ont2:EduLevel EduVO:K-12 in another
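Equivalence mappings are transitive, so a natural way to apply them is a union-find structure that picks one canonical term per group; data annotated with any member can then be found through any other. A minimal Python sketch in the spirit of owl:equivalentClass; the terms `ont2:Precip` and `ont3:PrecipAmount` and the dataset annotations are hypothetical, and specialization (subclass) mappings would need a separate hierarchy:

```python
from typing import Dict, List


class Equivalences:
    """Union-find over terms declared equivalent, in the spirit of owl:equivalentClass."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, term: str) -> str:
        """Canonical representative of the term's equivalence group."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]  # path halving
            term = self.parent[term]
        return term

    def declare(self, a: str, b: str) -> None:
        """Record that a and b denote the same concept (equivalence is transitive)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Hypothetical mappings between three ontologies.
eq = Equivalences()
eq.declare("ont1:Precipitation", "ont2:Precip")
eq.declare("ont2:Precip", "ont3:PrecipAmount")

# Hypothetical dataset annotations, each using its own ontology's term.
annotations: Dict[str, str] = {
    "dataset1": "ont1:Precipitation",
    "dataset2": "ont2:Precip",
    "dataset3": "ont3:PrecipAmount",
    "dataset4": "ont2:Temperature",
}


def datasets_for(concept: str) -> List[str]:
    """Datasets annotated with any term equivalent to concept, whichever ontology it came from."""
    target = eq.find(concept)
    return sorted(d for d, term in annotations.items() if eq.find(term) == target)


print(datasets_for("ont3:PrecipAmount"))  # ['dataset1', 'dataset2', 'dataset3']
```

Note that ont1 and ont3 were never mapped directly; the link is derived through ont2.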

6: Smart (Focused) Search
• The Semantic Web associates one or more classes with each object
• We can use ontologies to enhance search by:
  – Query expansion
  – Sense disambiguation
  – Type with restrictions
  – …
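Query expansion is the simplest of these to show: a search for a class also matches documents tagged with any of its subclasses. A minimal Python sketch, assuming a hypothetical precipitation class hierarchy and a toy index from class tags to document ids:

```python
from typing import Dict, List, Set

# Hypothetical fragment of a precipitation ontology: class -> direct subclasses.
SUBCLASSES: Dict[str, List[str]] = {
    "Precipitation": ["LiquidPrecipitation", "SolidPrecipitation"],
    "LiquidPrecipitation": ["Rain", "Drizzle"],
    "SolidPrecipitation": ["Snow", "Hail"],
}

# Hypothetical index: class each document is tagged with -> document ids.
INDEX: Dict[str, Set[str]] = {
    "Rain": {"doc1"},
    "Hail": {"doc2"},
    "Aurora": {"doc3"},
}


def expand(term: str) -> Set[str]:
    """Query expansion: the term plus all of its transitive subclasses."""
    expanded = {term}
    stack = [term]
    while stack:
        for child in SUBCLASSES.get(stack.pop(), []):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def search(term: str, use_ontology: bool = True) -> Set[str]:
    """Look up documents for the term, optionally expanded through the class hierarchy."""
    terms = expand(term) if use_ontology else {term}
    hits: Set[str] = set()
    for t in terms:
        hits |= INDEX.get(t, set())
    return hits


print(sorted(search("Precipitation", use_ontology=False)))  # []
print(sorted(search("Precipitation")))                      # ['doc1', 'doc2']
```

The keyword-only search finds nothing because no document is tagged with the literal word; the ontology supplies the missing link without the user having to know the subclass names.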

7: Smarter Search / Configuration

GEONGRID Ontology Search and Data Integration Example
Uses emerging web standards to enable smart web applications
• Given an upper-level domain choice
  – Ecology
• Illustrate or list contained concepts/hierarchy
  – VegetationCover, TreeRings, etc.
• Retrieve some specific options from the web
  – Maps, tree-ring data, …
• Info: https://portal.geongrid.org:8443/gridsphere



8: Proof
• The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust
• Proof and Trust are ongoing research areas for the Semantic Web: see PML and Inference Web
Figure: #CriticalDataset hasCalibration #FlatField; #FlatField hasPeerReview #SolarPhysicsPaper; e.g., “Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”
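The core idea behind proof-based explanation is that every derived fact keeps a record of the rule and premises that produced it, so the derivation can be replayed for a user. A minimal Python sketch of that bookkeeping for the slide's calibration example; it does not use PML, and the rule name `published-calibration-rule` and property `calibratedWithPublishedMethod` are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Justification:
    rule: str
    premises: List[Triple] = field(default_factory=list)


kb: Dict[Triple, Justification] = {
    ("#CriticalDataset", "hasCalibration", "#FlatField"): Justification("asserted"),
    ("#FlatField", "hasPeerReview", "#SolarPhysicsPaper"): Justification("asserted"),
}


def apply_published_calibration_rule(facts: Dict[Triple, Justification]) -> Dict[Triple, Justification]:
    """Hypothetical rule: hasCalibration(d, c) and hasPeerReview(c, paper) -> calibratedWithPublishedMethod(d, paper)."""
    derived: Dict[Triple, Justification] = {}
    for (d, p1, c) in facts:
        if p1 != "hasCalibration":
            continue
        for (c2, p2, paper) in facts:
            conclusion = (d, "calibratedWithPublishedMethod", paper)
            if c2 == c and p2 == "hasPeerReview" and conclusion not in facts:
                derived[conclusion] = Justification(
                    "published-calibration-rule", [(d, p1, c), (c2, p2, paper)]
                )
    return derived


def explain(fact: Triple, facts: Dict[Triple, Justification], depth: int = 0) -> List[str]:
    """Render the proof tree for a fact, one indented line per step."""
    just = facts[fact]
    lines = ["  " * depth + f"{' '.join(fact)}  [{just.rule}]"]
    for premise in just.premises:
        lines.extend(explain(premise, facts, depth + 1))
    return lines


kb.update(apply_published_calibration_rule(kb))
goal = ("#CriticalDataset", "calibratedWithPublishedMethod", "#SolarPhysicsPaper")
print("\n".join(explain(goal, kb)))
```

A user who doubts the conclusion can inspect each premise and the rule used, which is the transparency and trust argument the slide makes.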

Inference Web
Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners.
• OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange
• IWExplainer for generating and presenting interactive explanations from PML proofs, providing multiple dialogues and abstraction options
• IWBrowser for displaying (distributed) PML proofs
• IWBase: distributed repository of proof-related metadata such as inference engines/rules/languages/sources
• Integrated with theorem provers, text analyzers, web services, …
http://iw.rpi.edu

Inference Web Infrastructure (McGuinness et al., 2004, http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html)
Figure labels:
• Components: Files/WWW; Semantic Discovery Service (DAML/SNRC), OWL-S/BPEL; CWM (NSF TAMI), N3; JTP (DAML/NIMD), KIF; SPARK (DARPA CALO), SPARK-L; UIMA (DTO NIMD Text Analytics Exp Aggregation)
• Interlingua: Proof Markup Language (PML), covering trust, justification and provenance
• Toolkit: IWTrust (trust computation); IW Explainer/Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration)
Framework for explaining question answering tasks by abstracting, storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by question answerers.

SW Questions & Answers
Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers.
Figure labels: A question; An answer; A context for explaining the answer; An abstracted explanation
(This graphical interface was done by Battelle, supported by Stanford KSL.)

Summary
• Semantics are a key ingredient for progress in informatics and eScience
• A sustained involvement of key interdisciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production
• This is what we will be teaching you in this class

Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
Figure (iterative development cycle), labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Science/Expert Review & Iteration; Develop model/ontology; Use Tools; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation

Outline of the course
• Topics for Semantic e-Science/Foundations:
  – Semantic Methodologies
  – Knowledge Representation for e-Science
  – Ontology Engineering and Re-Use for e-Science
  – Knowledge Integration for e-Science
  – Semantic Data Integration
  – Semantic Web Languages, Tools and Services
  – Semantic Infrastructure and Architecture for e-Science
  – Semantic Grid Middleware
  – Ontology Evolution for e-Science
  – Knowledge Management for e-Science
  – Workflow Management
  – Data life-cycle for e-Science
  – Data Mining and Knowledge Discovery

SeS Applications and Ontologies
• Semantic Web for Health Care and Life Science
• Semantic Web for Bio-Med-informatics
• Semantic Web for System and Integrated Biology
• Semantic Web for Sun, Earth, Environment and Climate
• Semantic Web for Chemistry, Physics and Astronomy
• Semantic Web for Engineering
• Semantic Web and Digital Libraries and Scientific Publications

SeS Project options
• Configuration and Deployment of Semantic Virtual Observatories
  – Oceanography, astronomy, geology
  – Particularly convenient ones: around water quality, first responder data
• Semantic Advisors, e.g., Semantic Sommelier
• Ontology Merging and Validation Test-bed
• Semantic Language and Tool Use and Evaluation
• Semantic eScience Implementation Evaluation
• Semantic Collaboration Case Studies
• Semantic Application Development and Demonstration

Schedule - wiki
• Reading assignments
• Assignments
  – Individual
  – Group
• Written assessments
• Presentation assessments
• Group assessments

What we expect
• Attend class, complete assignments
• Participate
• Ask questions: be honest with yourself and others about what you do and do not know
• Work both individually and in a group
• Work constructively in group and class sessions

Logistics summary
• Class: Monday 1-3:50 pm
• Office hours: by appointment, along with a regular time to be determined for the TA (probably before Tetherless night, TWed)
• This week’s assignment:
  – Reading: Ontologies 101* (this one is very important), Semantic Web, e-Science, RDFS
  – Turn in a one-page description of one of your favorite papers from the reading list AND WHY
• Next class (week 2, two weeks from today; note Labor Day):
  – Foundations I: Methodologies, Knowledge Representation
  – Use Cases
• If you think your background calls for some extra reading, talk to us.

Extra