Issues in Knowledge Representation and Semantic Interoperability

Tom Beckman
Principal, Beckman Associates
tombeckman@starpower.net
202-247-6088

Ronald Reck
Principal, RRecktek
rreck@iama.rrecktek.com
703-378-8723

© 2006 Tom Beckman and Ronald P. Reck

Why is Knowledge Representation Important to Semantic Systems?
• Provide a systematic method to capture, define, and describe relevant semantics
• Provide explicit definition and description of Concepts, Attributes, Values, and Relationships
• Explicitly represent knowledge, experience, and expertise
• Determine the best Knowledge Representations to use depending on the task, environment, and user
• Employ appropriate knowledge structures and inference engines
• Select the best elicitation techniques to capture knowledge
• Compare and integrate similar concepts and data elements
• Determine the best level of granularity to describe data values
• Ensure proper consolidation of data elements across legacy systems
• Develop improved definition and description of concepts
• Provide improved data sharing, semantic interoperability, and search capability

Knowledge Representation Basics
Knowledge Representation Characteristics:
  ➢ Models knowledge and reasoning about knowledge
  ➢ Describes characteristics and dimensions of knowledge
  ➢ Formally defines structures and processes for electronic and human reasoning
  ➢ Exposes Knowledge Structures and hides Inference Engines
Symbolic Knowledge Representation Components:
  ➢ Knowledge Structures: Objects, Declarative, Static
  ➢ Reasoning Mechanisms: Process, Procedural, Dynamic
  ➢ Content and Context: Symbols, Semantics, Meaning
Explicit Representation of:
  ➢ Knowledge, Experience, and Expertise
  ➢ Semantics of Content: Symbols, numbers, and language
  ➢ Context: Concept relationships and features
  ➢ Importance and Uncertainty

Knowledge Representation Dimensions
• Concept:
  ➢ Symbolic Format: <Concept Attribute Value>
  ➢ Concept Types: Object, Entity, and Abstraction
  ➢ Domain content knowledge
• Structure:
  ➢ Declarative representation
  ➢ Composed of Nodes and Links
  ➢ Expert System types parallel human cognitive schema
• Process:
  ➢ Procedural representation
  ➢ Reasoning and Inference
  ➢ Modeling and Simulation

Concept Definition
• Symbolic Format: <Concept Attribute Value>
• Typing: Object, Entity, Abstraction, and Event
• Features:
  ➢ Essential attributes that define the concept
  ➢ Attributes that discriminate boundaries between concepts
  ➢ Attribute Values also help define concept details
• Relations:
  ➢ Typing: causal, taxonomy/hierarchy, associative
  ➢ Between concepts
  ➢ Between attributes

Concept Dimensions
• Meaning: is nothing more than the sum of these concept dimensions (after Bruner)
• Definition
• Attributes:
  ➢ Stereotypical description of characteristics
  ➢ Format: <Concept Attribute Value>
• Relations:
  ➢ Between concepts
  ➢ Between attributes
• Linguistics:
  ➢ Part of speech
  ➢ Grammar rules
• Context:
  ➢ Based on user experience and purpose
  ➢ Common understanding
  ➢ Mental Models and Cognitive Schema

Attribute Value Typology
Numeric:
  ➢ Ordinal: Likert Scale
  ➢ Interval: Range, Continuous Variable
  ➢ Continuous: Ratio, normalized continuous variable
Semantic:
  ➢ Text Value Types:
    ◦ Unstructured: Instant Messaging
    ◦ Semi-Structured: Email, Memo
    ◦ Structured: Document, Hypertext
  ➢ Symbolic Value Types:
    ◦ Binary: Boolean
    ◦ Categorical: Unrelated Nominal
    ◦ Ordinal: Related Nominal
Sensory:
  ➢ Image: Digital spatial array, picture, video
  ➢ Signal: Time series, audio, sensor

Attribute Value Typing
• Numeric:
  ➢ Most precise reasoning
  ➢ Needs explanation of computing and results
  ➢ Often hidden, working in the background behind symbols and text
• Symbolic:
  ➢ Less precise but most concise
  ➢ Ease of reasoning and explanation
• Linguistic:
  ➢ Best for explanation and understanding
  ➢ Not as precise, and not concise
  ➢ Hard to reason with directly: requires language parsers
• Image: Described and classified using taxonomies and metadata
• Signal: Time series are described, interpreted, and classified using taxonomies, metadata, and analysis engines

Conceptual Primitives
Knowledge Templates are key conceptual primitives:
  ➢ Represent assertions: the basic building blocks of structures
  ➢ Define, describe, and detail symbol features, values, & relations
  ➢ Can also represent Uncertainty & Importance
  ➢ Come in several standard templates:
    ◦ Basic: <object attribute value>
    ◦ Faceted: <object attribute facet method>
    ◦ Measured: <object attribute facet value importance uncertainty>
Declarative Conceptual Primitives:
  ➢ Feature Descriptor: <Object Attribute Value>  Ex: <Car Color Red>
  ➢ Relation: <Symbol Relation Symbol>  Ex: <Ford is-a-kind-of Car>
Procedural Conceptual Primitives:
  ➢ Action: <Symbol Action Object>  Ex: <User Places Order>
  ➢ Inference: <Predicate Method Assertion>  Ex: <Formula Computes Price>
These knowledge elements have certain properties:
  ➢ Naming (Object)
  ➢ Describing (Attribute and Value)
  ➢ Organizing (Hierarchy)
  ➢ Relating (Functional, Causal, & Empirical Links)
  ➢ Constraining and Negating (Networks and Rules)

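The knowledge templates on this slide map naturally onto small record types. The following is a minimal sketch in Python, not the authors' implementation: the class names `Assertion` and `MeasuredAssertion`, the helper `describe`, and the 0.0 to 1.0 scale for importance and uncertainty are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """Basic template: <object attribute value>."""
    obj: str
    attribute: str
    value: str


@dataclass(frozen=True)
class MeasuredAssertion(Assertion):
    """Measured template: adds a facet plus importance and uncertainty.

    The 0.0-1.0 scale for both measures is an assumption of this sketch.
    """
    facet: Optional[str] = None
    importance: float = 1.0
    uncertainty: float = 0.0


# Declarative primitives taken from the slide's examples.
car_color = Assertion("Car", "Color", "Red")             # feature descriptor
ford_is_car = Assertion("Ford", "is-a-kind-of", "Car")   # relation

# A hypothetical measured assertion: important, but not very certain.
suspect_height = MeasuredAssertion(
    "Suspect", "Height", "tall",
    facet="witness estimate", importance=0.8, uncertainty=0.4,
)


def describe(assertions, obj):
    """Collect every attribute/value pair asserted about one object."""
    return {a.attribute: a.value for a in assertions if a.obj == obj}


facts = [car_color, ford_is_car, suspect_height]
car_facts = describe(facts, "Car")
ford_facts = describe(facts, "Ford")
```

Keeping the records immutable (`frozen=True`) lets assertions be stored in sets or used as dictionary keys, which is convenient when the same fact arrives from several legacy sources.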
Determining Concept Similarity (1)
• Compare potentially identical concepts across disparate legacy databases, documents, and Web sites
• Real understanding for potential data sharing comes from detailed examination and matching of concept characteristics
• Compare Symbol Names:
  ➢ Identical or synonym
  ➢ Degree of similarity as relative specificity
• Same Concept Type: Object, Entity, Abstraction, or Event
• Compare Concept Descriptions:
  ➢ Natural language descriptions found in data dictionaries
  ➢ Compare keywords and context in concept descriptions
• Compare Concept Relations:
  ➢ Relation Typing: causal, taxonomy/hierarchy, associative
  ➢ Relations between concepts and between attributes
  ➢ Relative position of concept in taxonomy
  ➢ Metadata
• Compare Concept Task Usage and Context

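The symbol-name comparison described on this slide (identical, synonym, or related by relative specificity in a taxonomy) can be sketched as follows. The toy taxonomy, the synonym table, and the function names are hypothetical; a real system would draw them from data dictionaries and a curated ontology.

```python
# Hypothetical toy data: parent links (child -> parent) and synonym mappings.
PARENT = {"Ford": "Car", "Sedan": "Car", "Car": "Vehicle", "Truck": "Vehicle"}
SYNONYMS = {"Automobile": "Car", "Auto": "Car"}


def canonical(name):
    """Map a symbol name to its canonical term (itself if no synonym is known)."""
    return SYNONYMS.get(name, name)


def ancestors(term):
    """Return the chain of more general terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def compare_names(a, b):
    """Classify how symbol name `a` relates to symbol name `b`."""
    if a == b:
        return "identical"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "synonym"
    if cb in ancestors(ca):
        return "more specific"   # a is a kind of b
    if ca in ancestors(cb):
        return "more general"    # b is a kind of a
    return "no match"
```

For example, `compare_names("Ford", "Vehicle")` reports that the first term is more specific, which is the "relative position of concept in taxonomy" check in its simplest form.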
Determining Concept Similarity (2)
• At the data schema or logical level, concepts are described and defined as attributes/features and their allowable values
• Compare Concept Attributes (Data Elements):
  ➢ Compare attribute names and determine similarity
  ➢ Compare attribute descriptions in data schema
  ➢ Determine relative importance of attributes
  ➢ Number of essential common attributes between concept schema
  ➢ Number of dissimilar attributes between schema
  ➢ Look for compound data elements
• Compare Data Element Values:
  ➢ Value descriptions found in data schema and data dictionaries
  ➢ Compare keywords and context in attribute value descriptions
  ➢ Compare Value Typing: numeric, symbolic, natural language, image/signal
  ➢ Compare Value Characteristics: field length, valid numeric range, boundary values, metrics
  ➢ Compare importance and uncertainty measures
• Compare Business Rules:
  ➢ Validity and consistency checks

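One way to make the attribute comparison concrete is to count common and dissimilar attributes and weight them by importance. This is a sketch under stated assumptions: attribute names are assumed to be already normalized so that exact matching works, the importance weights (0.0 to 1.0) are invented, and the weighted-overlap score is an illustrative choice rather than a measure prescribed by the slides.

```python
def attribute_similarity(schema_a, schema_b):
    """Compare two concept schemas given as {attribute_name: importance}.

    Returns the common attributes, the dissimilar attributes, and an
    importance-weighted overlap score between 0.0 and 1.0.
    """
    names_a, names_b = set(schema_a), set(schema_b)
    common = names_a & names_b
    dissimilar = names_a ^ names_b  # present in exactly one schema

    def weight(name):
        # Use the larger importance when the two schemas disagree.
        return max(schema_a.get(name, 0.0), schema_b.get(name, 0.0))

    total = sum(weight(n) for n in names_a | names_b)
    shared = sum(weight(n) for n in common)
    score = shared / total if total else 0.0
    return {
        "common": sorted(common),
        "dissimilar": sorted(dissimilar),
        "score": score,
    }


# Hypothetical Address schemas from two legacy databases.
legacy_a = {"street_address": 1.0, "city": 1.0, "state": 1.0, "zip_code": 1.0}
legacy_b = {"street_address": 1.0, "city": 1.0, "state": 1.0,
            "zip_code": 1.0, "location_name": 0.5}

result = attribute_similarity(legacy_a, legacy_b)
```

Weighting by importance means that a missing optional attribute such as `location_name` lowers the score less than a missing essential one would.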
Examples of Data Element Matching
• Ex 1: Comparing Address Zip Codes
  ➢ Concept = Address
  ➢ Attributes = <Location Name> <Street Address> <City> <State> <Zip Code>
  ➢ Attribute Value Options: <Zip Code> =
    ◦ Field Length: 5 digits, 9 digits (full Zip Code), 6 digits (Canadian)
    ◦ Typing: US: numeric; Canadian/English: alpha and numeric
    ◦ Valid Zip Code Values
    ◦ Business Rule: If IRS Philadelphia SC, then Zip Code = 19255
• Ex 2: Determining Suspect Height
  ➢ Qualitative Approach: Height = <Categorical-Symbol>
    ◦ <tall medium short>
    ◦ Taller or shorter than witness or interviewer
  ➢ Quantitative Approach: Height = <Number>
    ◦ In inches; feet and inches; range estimate
    ◦ Valid range (context: adult) = 30 to 90 inches

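Both examples amount to value typing plus a validity check. The sketch below shows one possible encoding; the function names are hypothetical, the regular expressions cover only the US 5-digit/ZIP+4 and Canadian letter-digit patterns mentioned on the slide, and the 30 to 90 inch range is the slide's adult range used as a default.

```python
import re

# 5 digits, optionally followed by 4 more (with or without a hyphen).
US_ZIP = re.compile(r"\d{5}(-?\d{4})?")
# Canadian postal code: letter-digit-letter, optional space, digit-letter-digit.
CA_POSTAL = re.compile(r"[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d")


def classify_postal_code(code):
    """Return 'US', 'Canadian', or 'invalid' based on the value's format."""
    code = code.strip()
    if US_ZIP.fullmatch(code):
        return "US"
    if CA_POSTAL.fullmatch(code):
        return "Canadian"
    return "invalid"


def height_in_inches(feet=0, inches=0.0, valid_range=(30, 90)):
    """Normalize a feet-and-inches height to inches and range-check it."""
    total = feet * 12 + inches
    low, high = valid_range
    if not low <= total <= high:
        raise ValueError(f"height {total} in. outside valid range {low}-{high}")
    return total
```

Normalizing every height to a single unit before storage makes values from different schemas directly comparable, while the range check plays the role of a business rule.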
Capturing Accurate Data Values
• MIT Expert System story
  ➢ Investment Expert System: Randy Davis suggestion
  ➢ Qualitative values for economic attributes lead to rule explosion
• DOJ Teenage Drug Usage Survey
  ➢ Monthly survey on frequency of teenage drug usage
  ➢ During the past month, how often did you smoke weed?
  ➢ Values = <never 1-2x weekly daily>
  ➢ Should have just captured an estimate of the number of times; later the data could be grouped as needed
  ➢ As captured, it is unclear whether data trends mean anything unless there is a huge shift
• Data Accuracy Principle:
  ➢ Capture numeric values when possible and relatively easy
  ➢ Even qualitative data can be put on a Likert value scale (1-7)
• Harmonizing values from varied data schema
  ➢ When possible, estimate and convert qualitative values to numeric
  ➢ Perhaps place new converted values in enterprise data warehouse

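Converting qualitative values to numeric ones, as the harmonization bullets suggest, could look like the sketch below. The per-month estimates assigned to each survey category are assumptions invented for illustration (the survey itself recorded only the categories), and `to_likert` simply spaces ordered labels evenly on a 1 to 7 scale.

```python
# Assumed estimates of uses per month for each survey category (illustrative only).
MONTHLY_ESTIMATE = {"never": 0.0, "1-2x": 1.5, "weekly": 4.0, "daily": 30.0}


def to_numeric(responses, mapping=MONTHLY_ESTIMATE):
    """Convert categorical survey responses to estimated numeric values."""
    return [mapping[r] for r in responses]


def to_likert(label, ordered_labels, scale_max=7):
    """Place an ordered qualitative label on an evenly spaced 1..scale_max scale."""
    if len(ordered_labels) == 1:
        return 1.0
    position = ordered_labels.index(label)
    return 1 + position * (scale_max - 1) / (len(ordered_labels) - 1)


converted = to_numeric(["never", "1-2x", "weekly", "daily"])
medium_score = to_likert("medium", ["short", "medium", "tall"])
```

The conversion is lossy in one direction only: numeric values can always be regrouped into categories later, which is the slide's argument for capturing numbers in the first place.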
Issues in Semantic Systems
• No consolidation/harmonization of existing legacy databases:
  ➢ Consolidate terms/concepts from disparate DBs: determine synonym, similarity, and context.
  ➢ Consolidate attributes from disparate DBs: is the same feature being described?
  ➢ Consolidate value typing and problematic conditions from disparate DBs
• Terms are semantically under-defined and under-described: must find and validate possible synonyms across disparate legacy DBs, metadata, and text/content.
• Terms are related and organized into a concept hierarchy, but may be more general or more specific.
• Multiple attributes and their values define and describe the concept and context.
• Need improved description of concepts
• Need improved attribute typing and values
• Determine needed level of concept precision/granularity
• Creation of standard vocabularies
• Explicit representation of relationships, importance, and uncertainty
• Weak accuracy of matching and access function
• How to assert new facts
• Handling of Synonyms, Antonyms, Typing, and Metadata

Semantic Web Components
• Domain Content: Knowledge, experience, and expertise
• Domain Taxonomy and Ontology:
  ➢ RDF/OWL
  ➢ Object Methods: Inheritance and Classification
• Organization and Structure: Web sites and document collections
• Classification Methods:
  ➢ Similarity-based, Rule-based
  ➢ Network-based, Object-based
• Indexing: Item typing and meta-tagging
• Linguistics: Natural Language Processing & Text Generation
• Search Query: Keyword Bayesian Search & Semantic Search
• Entity Extraction: People, places, and events
• Analysis Methods: Data Mining, Text Mining, Link Analysis, Machine Learning, & Knowledge Discovery
• Intelligent Agents: Simulation and Modeling

Issues in Knowledge Representation and Semantic Interoperability
Questions?

Tom Beckman
Principal, Beckman Associates
tombeckman@starpower.net
202-247-6088

Ronald P. Reck
Principal, RRecktek
rreck@iama.rrecktek.com
703-378-8723