Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture

Peter Brezany, University of Vienna

Collecting Data
[Figure labels: Laboratories; Satellites; Business; Experiments (high energy physics, ...); (microscopes, MRI/CT scanners, ...); Computer simulations; Data Repositories; Analysis]

Motivation
• Computational Grid – a new-generation infrastructure
• Challenge: advanced analysis of data managed by the Grid
• Typical data in modern Grid applications:
  – files, file collections, relational and XML DBs, virtual data, data objects
• The data is often large and geographically distributed, and its complexity is increasing; some applications require special security precautions.
• Our research aims:
  – Phase 1: Knowledge discovery Grid system (GridMiner)
  – Phase 2: Intelligent Grid system (WisdomGrid)

Outline
• Motivation
• Background and Related Work
• Basic Concepts and GridMiner Architecture
• Grid Data Integration System
• Data Mining Layer
• Implementation Issues and Experiments
• Future Research
• Conclusions

Background and Related Work
• Basic Grid development (Globus 1) – metacomputing
• Data Grid (Globus 2, DataGrid of CERN, etc.)
• Semantic Grid (myGrid)
• Open Grid Services Architecture (Globus 3, OGSA-DAIS)
• Parallel and Distributed Data Mining and Data Warehousing
• Knowledge Grid (GridMiner and work of others)
• Web Intelligence

GridMiner Requirements
• Open architecture
• Data distribution, complexity, heterogeneity, and large data size
• Applying different kinds of analysis strategies
• Compatibility with existing Grid infrastructure
• Openness to tools and algorithms
• Scalability
• Grid, network, and location transparency
• Security and data privacy
• OLAP support

GridMiner (Layered) Abstract Architecture
[Figure: layers, top to bottom: User Interface; Knowledge Grid; Information Grid; Computational & Data Grid. Other labels: Data to Knowledge; Control.]
Built on K. G. Jeffery's proposal

GridMiner Conceptual Architecture
[Figure; recoverable label: Job Control]

Service Architecture Based on OGSA-DAIS

Data Distribution Scenarios
1. Single data source
2. Federated data sources with different types of partitioning

Example: vertical and horizontal distribution of the virtual data source
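The two kinds of partitioning named in the example can be illustrated with a minimal Python sketch. It is not the GridMiner mediation service: the function names, the sample rows, and the shared key `id` are all assumptions made for illustration. Horizontal fragments hold whole rows of the same schema and are combined by union; vertical fragments hold different columns of the same rows and are combined by joining on a shared key.

```python
from typing import Dict, List

Row = Dict[str, object]


def union_horizontal(fragments: List[List[Row]]) -> List[Row]:
    """Horizontal partitioning: each fragment holds whole rows of one schema."""
    merged: List[Row] = []
    for fragment in fragments:
        merged.extend(fragment)
    return merged


def join_vertical(fragments: List[List[Row]], key: str) -> List[Row]:
    """Vertical partitioning: each fragment holds some columns plus the shared key."""
    by_key: Dict[object, Row] = {}
    for fragment in fragments:
        for row in fragment:
            # Rows with the same key value are merged column by column.
            by_key.setdefault(row[key], {}).update(row)
    # Dicts keep insertion order, so rows come back in first-seen key order.
    return list(by_key.values())


# Hypothetical fragments: sites A and B split the rows, site C holds an extra column.
site_a = [{"id": 1, "age": 34}, {"id": 2, "age": 58}]
site_b = [{"id": 3, "age": 41}]
site_c = [{"id": 1, "outcome": "good"}, {"id": 2, "outcome": "poor"}, {"id": 3, "outcome": "good"}]

virtual_source = join_vertical([union_horizontal([site_a, site_b]), site_c], key="id")
```

A client that queries `virtual_source` sees one table and does not need to know which site contributed which rows or columns.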

Mapping Schema

Grid Data Mediation Services

Architecture of a Data Mining System

Components of the Data Mining Layer
• GridMiner Service Factory
• GridMiner Service Registry
• GridMiner Data Mining Service
• GridMiner Preprocessing Service
• GridMiner Presentation Service
• GridMiner Orchestration Service

Centralized Data Mining

Parallel and Distributed Data Mining

GridMiner Orchestration Service

GridMiner Job Specification Language

Implementation Prototype
• Implementation of the Mediation Service for horizontal data partitioning
• Implementation of Data Mining Services for decision tree construction as OGSA-conformant Grid services, based on the Globus Toolkit 3 release
• We use
  – Weka, a freely available Java-based data mining system (data preprocessing and data mining tasks; main-memory oriented)
  – a home-grown Java implementation of the SPRINT algorithm (disk-oriented)
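SPRINT builds decision trees by scanning pre-sorted attribute lists and scoring candidate splits with the gini index. The Python sketch below shows only that scoring step for one numeric attribute. It is not the home-grown Java implementation mentioned above: the function names and sample data are assumptions, and the disk-resident attribute lists that make SPRINT scalable are left out.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(counts: Counter, total: int) -> float:
    """Gini index of a partition described by its class histogram."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(attribute_list: List[Tuple[float, str]]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted gini) for the best test `value <= threshold`.

    `attribute_list` holds (attribute value, class label) pairs. SPRINT sorts
    such a list once up front; it is sorted here so the sketch is self-contained.
    """
    records = sorted(attribute_list)
    n = len(records)
    below: Counter = Counter()  # class histogram of records at or below the cut
    above: Counter = Counter(label for _, label in records)
    best: Optional[Tuple[float, float]] = None
    for i in range(n - 1):
        value, label = records[i]
        below[label] += 1
        above[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # a cut is only possible between distinct values
        n_below = i + 1
        score = (n_below / n) * gini(below, n_below) \
            + ((n - n_below) / n) * gini(above, n - n_below)
        if best is None or score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


# Hypothetical attribute list: (age, outcome)
ages = [(23, "good"), (31, "good"), (45, "poor"), (52, "poor"), (38, "good")]
split = best_numeric_split(ages)
```

For this sample the classes separate cleanly between ages 38 and 45, so the best cut is the midpoint 41.5 with a weighted gini of 0.0.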

Experimental Environment
• Test data suites
  – synthetic data (generated by an extended version of the IBM Quest Synthetic Data Generation Code)
  – TBI (Traumatic Brain Injury) databases
• Grid testbed
  – Vienna, CERN, Dublin, Zagreb, Cracow
• Goals in the first phases
  – Verifying model accuracy
  – Measuring the overhead of the service layers

Extending the Functionality

OLAM

Example: Mining Patterns for Data Classification and Associations

    use database dat1, dat2
    mine classifications
    analyze patient_outcome
    using g_parsimony
    display as tree

    use database DBs attributes
    mine associations
    using method_attributes
    display as rules

Workflow 1: Interactive Mode

Workflow 2: Batch Mode

Workflow 3: Hybrid Mode

Execution Model Based on Static Workflow

Execution Model Based on Dynamic Workflow

Towards the Wisdom Grid (WG)

WG Architecture
[Figure labels: Domain Knowledge Agents; Knowledge Explorer Agent; Wisdom Grid; Agent Platform; External Knowledge Base; External Services; Agent Grid Service; Knowledge Base Service; Knowledge Discovery Service; Grid; End User (personal) Agent; KB]

Work-Flow
[Figure labels: External Agents; End User Agent; Knowledge Base service; Knowledge Agent Service; Knowledge discovery service; Services...; Knowledge Base; Knowledge Explorer Agent]

Knowledge Discovery Service
• Client for other services
• Knowledge Discovery in Databases
• GridMiner
• data mining
• on-line analytical processing (OLAP)
• Web Mining
• semantic web
• Online libraries
• Web/Grid Services
• Knowledge Explorer Agent

Knowledge Base Service / KB
• KBS: search, query, and expand the knowledge base
• KB: a database that stores particular data about real objects, the relations between these objects, and their properties
  – Consists of ontologies and instances
  – Information about resources (location, query language) on the Web: web/grid services, agents, references to online databases
  – Languages: XML/RDF/DAML+OIL/DAML-S/OWL

Ontology - example

DAML+OIL language: Patient is Human; Human has Age

    <daml:Class rdf:ID="Human">
      <rdfs:subClassOf>
        <daml:Restriction daml:cardinality="1">
          <daml:onProperty rdf:resource="#Age"/>
        </daml:Restriction>
      </rdfs:subClassOf>
    </daml:Class>

    <daml:DatatypeProperty rdf:ID="Age">
      <rdfs:domain rdf:resource="#Human"/>
    </daml:DatatypeProperty>

    <daml:Class rdf:ID="Patient">
      <rdfs:subClassOf rdf:resource="#Human"/>
    </daml:Class>

Knowledge Base - example
[Figure: graph with nodes Human, Temperature, Value, Patient, Attribute (attribute: PAT_ID), Tables (table: PATIENTS), and Database (jdbc://foo/hospital), connected by edges labelled "has" and "is"]

Semantic mediator
• Distributed heterogeneous databases
  – Different database schemas
  – Different query languages
  – Different names of attributes/tables… but the same semantics!
• WG enables semantic mediation at a higher level

Semantic mediator (cont.)
[Figure: ontology fragment "Patient is Human" with properties ID, Age, and Blood Type, linked by samePropertyAs to the columns of two hospital databases]

  Ontology property | Database in Hospital X (table PAT_TAB) | Database in Hospital Z (table PATIENTS)
  ID                | ID                                     | PAT_ID
  Age               | AGE                                    | PAT_AGE
  Blood Type        | BT                                     | PAT_BLOOD_TYPE
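The samePropertyAs links on this slide can be read as a lookup table from ontology properties to local column names. The Python sketch below shows how a mediator might rewrite one ontology-level request into a query per hospital. It is an illustration under assumptions, not the Wisdom Grid implementation: the `SOURCES` dictionary, the identifiers `HospitalX` and `HospitalZ`, the `BloodType` spelling, and the choice of SQL as the target language are hypothetical, while the table and column names come from the slide.

```python
from typing import Dict, List

# Assumed encoding of the slide's mapping: ontology property -> local column.
SOURCES: Dict[str, Dict[str, object]] = {
    "HospitalX": {
        "table": "PAT_TAB",
        "columns": {"ID": "ID", "Age": "AGE", "BloodType": "BT"},
    },
    "HospitalZ": {
        "table": "PATIENTS",
        "columns": {"ID": "PAT_ID", "Age": "PAT_AGE", "BloodType": "PAT_BLOOD_TYPE"},
    },
}


def rewrite_query(properties: List[str]) -> Dict[str, str]:
    """Translate a list of ontology properties into one SELECT per source."""
    queries: Dict[str, str] = {}
    for source, schema in SOURCES.items():
        columns = schema["columns"]
        # Alias every local column back to the shared ontology name.
        select_list = ", ".join(f"{columns[p]} AS {p}" for p in properties)
        queries[source] = f"SELECT {select_list} FROM {schema['table']}"
    return queries


queries = rewrite_query(["Age", "BloodType"])
```

Because each local column is aliased to its ontology name, the result sets from both hospitals share one schema and can be merged directly.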

Distributed Knowledge Base
[Figure: classes held in different knowledge bases and identified by URIs (uri:fooY#Human, uri:fooZ#Temperature, uri:fooX#Patient, uri:fooX#Ill_Person), connected by the relations "is subclass", "has property", and "is same class as"]

Agent Grid Service
• Gives the system the ability to communicate with the outside world in standard languages
• FIPA standards
• ACL: Agent Communication Language
• KQML: Knowledge Query and Manipulation Language
• Agent platform (JADE, FIPA-OS)
• Agents
  – Domain Knowledge Agent
  – Knowledge Explorer Agent
  – End-user Agent (personal)

Querying
• End-user agent
  – with own ontology: subset of ontology, merging of ontologies
  – without own ontology: negotiating about domain of interest
• Queries created from ontology
• Templates

    <Patient rdf:ID="ID001">
      <Temperature/>
    </Patient>
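A query template like the one on this slide names an instance and leaves the requested properties as empty elements. The Python sketch below, using only the standard library, builds such a template and reads back which properties it asks for. The helper names `build_template` and `requested_properties` are hypothetical, and the slide does not say how the Wisdom Grid agents actually generate or parse templates.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)  # serialize the namespace with the "rdf" prefix


def build_template(class_name: str, instance_id: str, wanted: List[str]) -> str:
    """Build a query template: one instance element with empty property elements."""
    root = ET.Element(class_name, {f"{{{RDF_NS}}}ID": instance_id})
    for prop in wanted:
        ET.SubElement(root, prop)  # an empty element marks a value to be filled in
    return ET.tostring(root, encoding="unicode")


def requested_properties(template: str) -> List[str]:
    """List the properties a template asks for (child elements with no content)."""
    root = ET.fromstring(template)
    return [child.tag for child in root if child.text is None and len(child) == 0]


template = build_template("Patient", "ID001", ["Temperature"])
wanted = requested_properties(template)
```

Unlike the fragment on the slide, the serialized template declares the `rdf` namespace explicitly, which an XML parser needs in order to read it back.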

Answers
• Mined knowledge (GridMiner)
  – Decision trees/rules (clinical pathways)
  – Association rules
• Instances of domain ontology
  – Particular data
  – References
  – Links to Web sites
  – Information about other knowledge providers

Case Study - Medical Application
[Figure labels: Semantic Web/Grid; Knowledge Explorer Agent; Knowledge Agent; Knowledge Discovery Service; GridMiner resources; Training set; Test set; Knowledge Base; End User (personal) Agent; Hospital Databases]
Q: Outcome? + data about patient's condition
A: probability of survival + references to the diagnoses

Conclusions and Future Work
• Application and extension of Grid technology to knowledge discovery – an important but nontraditional Grid application domain
• Introduction of a new Grid Data Mediation Service
• Future work
  – Performance evaluation on large synthetic data volumes
  – Coupling of the Data Mining services architecture with the OLAP services architecture
  – Development of a knowledge-discovery-oriented Grid Workflow Language and the appropriate Workflow Engine
  – Application of GridMiner to a real medical application (management of patients with severe traumatic brain injuries)
  – Development of the Wisdom Grid