The DHS Ontology Case Presentation by Ontology Stream

  • Slides: 41
Download presentation
The DHS Ontology Case Presentation by Ontology. Stream Inc Paul Stephen Prueitt, Ph. D

The DHS Ontology Case Presentation by Ontology. Stream Inc Paul Stephen Prueitt, Ph. D 19 March 2005 Ontology Tutorial 1, copyright, Paul S Prueitt 2005

Ontology technology exists in the market However, practical problems block progress in designing and

Ontology technology exists in the market However, practical problems block progress in designing and implementing DHS wide ontology technology. Business process mythology, like Earned Value program management, is not focusing on the right questions and maturity models for software development have precluded most innovations that come from outside the relational database paradigm. These practical problems are also partially a consequent of 1. Some specific institutional behaviors related to traditional program management. 2. Confusion caused by long term, and heavily funded Artificial Intelligence marketing activities

As a general proposition, through out the federal government, quality metrics are not guiding

As a general proposition, through out the federal government, quality metrics are not guiding management decisions supporting: 1) Quick transitions from database centered information technology to XML based Semantic Web technology. 2) Transitions from XML repositories to ontology mediated Total Information Awareness, with Informational Transparency, in Secure Channels. ? DHS Ontology 1) World-wide Trade Data 2) Investigation Targeting 3) Risks, Threats and Vulnerabilities 4) Policy Enforcement Emerging Semantic Web Standards

Diagram from Prueitt, 2003

Diagram from Prueitt, 2003

First two steps are missing Seven step AIPM RDBMS Diagram from Prueitt, 2003 is

First two steps are missing Seven step AIPM RDBMS Diagram from Prueitt, 2003 is not complete

The measurement/instrumentation task First two steps in the AIPM Measurement is part of the

The measurement/instrumentation task First two steps in the AIPM Measurement is part of the “semantic extraction” task, and is accomplished with a known set of techniques” • Latent semantic technologies • Some sort of n-gram measurement with encoding into hash tables or internal ontology representation (CCM and Nd. Core, perhaps Aero. Text and Convera’s process ontology (? ), Orbs, Hilbert encoding, Core. Talk/Cubicon. • Stochastic and neural/ genetic architectures

One model for semantic extraction explicitly focuses on the first two aspects of the

One model for semantic extraction explicitly focuses on the first two aspects of the AIPM; e. g. instrumentation/measurement and data-encoding/interpretation

Actionable Intelligence Process Model has an action-perception event cycle. Stratified ontology supports the use

Actionable Intelligence Process Model has an action-perception event cycle. Stratified ontology supports the use of this cycle to produce knowledge of attack and anticipatory mechanisms based on the measurement of substructural categorical invariance. Work flow and process ontology is available as a basis for encoding knowledge of anticipatory response mechanisms. Categorical Invariance is measured, using Orbs (Ontology referential bases) for example or CCM (Contiguous Connection Model) encoding, and organized as a resource for RDF triples using some lite OWL OIL.

Distributed ontology management is already available in some military activities

Distributed ontology management is already available in some military activities

distributed Ontology Management Architecture d-OMA Ontology Architecture March 10, 2005 Version 3. 0 Part

distributed Ontology Management Architecture d-OMA Ontology Architecture March 10, 2005 Version 3. 0 Part of a series on the nature of machine encoding of sets of concepts

Benchmark Higher level Ontologies as part of the d-OMA Transition Community Localized Ontologies within

Benchmark Higher level Ontologies as part of the d-OMA Transition Community Localized Ontologies within an DAML type agency structure

Higher level Ontologies Links to Web services and Internal R&D Community Localized Ontologies

Higher level Ontologies Links to Web services and Internal R&D Community Localized Ontologies

Ontology users have the following roles: Ontology librarian Knowledge engineers Knowledge Engineers Information mediators

Ontology users have the following roles: Ontology librarian Knowledge engineers Knowledge Engineers Information mediators ( e. g. ontology librarians ) A reconciliation process is required between ontology services. Intelligence Targeting Analyst Community workers (e. g. analysts)

Higher level Ontologies Real time acquisition of new concepts, and modifications to existing concepts

Higher level Ontologies Real time acquisition of new concepts, and modifications to existing concepts are made via an piece of software called Ontology Use Agents have various type of functions expressed in accordance with roles Community Mature scholarship from evolutionary psychology research communities. Localized Ontologies

Human and system interaction with a common Ontology Structure Ontology Presentation Part of a

Human and system interaction with a common Ontology Structure Ontology Presentation Part of a series on the nature of machine encoding of sets of concepts

First principles First, and before all else, an computer based ontology is a {

First principles First, and before all else, an computer based ontology is a { concepts } In the natural, physical, sciences an ontology is the causes of those things that one can interact with in Nature. Physical science informs us that a formative process is involved in the expression of natural ontology in natural settings. The set of “our” personal and private concepts is often thought to be the causes of how we reason as humans. This metaphor is operational in many peoples’ understanding about the nature and usefulness of machine encoded ontology. But this metaphor can also mislead us!!!! Extensive literatures indicate that the Artificial Intelligence (AI) mythology has lead many to believe that the “reasoning” of an ontology might be the same as the reasoning of a human in all cases. • This inference is not proper because the truthful-ness of this inference has not been demonstrated by natural science, and perhaps cannot be demonstrated no matter what the levels of government funding for AI. • AI is discounted in Tim Berner’s Lee’s notion of the Semantic Web.

Tim Berner’s Lee Notion of Semantic Web The point being made here is that

Tim Berner’s Lee Notion of Semantic Web The point being made here is that the notion of “inference” is very different depending if one is talking about the human side or the machine side of the Semantic Web. One consequence of acknowledging this difference is to elevate the work of the authors of the OASIS standards, in particular Topic Maps. In Topic Maps we have an open world assumption and very little emphasis on computational inference. Human knowledge is represented in a “shallow form, and visualization is used to manage this representation. Computation with topic maps AND OWL ontologies work together with XML repositories.

First principles { concepts } Let is use only set theory to consider Tim

First principles { concepts } Let is use only set theory to consider Tim Berners Lee’s notion of Semantic Web. Let C = { concepts } and B be a subset of C Some software Subsetting function The subsetting function might be an “ask-tell” Human and/or machine computation creates a well formed query interaction between two ontologies B is a subset of C

First principles { concepts } At this point we have various possible consequences. 1)

First principles { concepts } At this point we have various possible consequences. 1) The small(er) B ontology might simply be viewed by a human and actions taken outside of the information system 2) The smaller ontology might be used in several different ways Subsetting function a) Imported into a reasoner to be considered in conjunction with various different data sources. b) Send messages to other ontologies via a distributed Ontology Management System B is a subset of C

First principles Situational Ontology Software The knowledge repository acts as a “perceptual ground” in

First principles Situational Ontology Software The knowledge repository acts as a “perceptual ground” in a “figure-ground” relationship. The ontology sub-setting function has pulled part, but not all, of the background into a situational focus. This first principle is consistent with perceptual physics and thus is “informed” by natural science.

(The following slides are from Ontology. Stream’s Ontology Presentation VII – General Background) Extending

(The following slides are from Ontology. Stream’s Ontology Presentation VII – General Background) Extending Ontology Structure over legacy information systems Part of a series on the nature of machine encoding of sets of concepts

Presentation Contents Functional specs Ontology Use Start-up Use Model: Steady State Ontology System Components:

Presentation Contents Functional specs Ontology Use Start-up Use Model: Steady State Ontology System Components: Framework for Query Entity Data Access: Steady State Ontology System Framework for Query Entity Building ontology from data flow Using Ontology Generating Framework The inverse problem: generating synthetic data Finding data regularity in its natural contextualization

Functional specs: 1. Human-centric: must be human (individual) centric in design and function 2.

Functional specs: 1. Human-centric: must be human (individual) centric in design and function 2. Support data retrieval: must act as a data retrieval mechanism 3. Event structure measurement: must assist in the definition of data acquisition requirements on an on-going basis 4. Interactive: must support multiple interacting ontologies 5. Real Time: must aid in real time problem solving and in the long term management of specific sets of concepts Note: Ontology mediated knowledge systems have operational properties that are quite different from traditional relational database information systems. These five functional specs have been reviewed by a small community of professional ontologists, as has been deemed correct for knowledge systems.

Ontology Use Start-up Use Model I Transaction process Entity updates Inferences about Knowledge Base

Ontology Use Start-up Use Model I Transaction process Entity updates Inferences about Knowledge Base Query entities One of the Ontology Reasoners Start-up Use Case – Step 1 startup At start-up, resources are loaded into the reasoner. 1) Some part of the knowledge base is expressed as reasoner complaint RDF 2) Some inference logic is loaded into the reasoner 3) Query entities are loaded to reflect interests of analyst(s) 4) Transaction processes are occurring in real time

Start-up Use Model II Transaction process Entity updates inferences Knowledge Base Query entities Reasoner

Start-up Use Model II Transaction process Entity updates inferences Knowledge Base Query entities Reasoner Start-up Use Case Step 2 startup Instance data Since instance data is much larger (2 or more magnitudes larger) than the knowledge base, the instance data is managed in a separate start-up process.

Model: Steady State Ontology System Transaction process Entity updates inferences Inference Mgr Knowledge Base

Model: Steady State Ontology System Transaction process Entity updates inferences Inference Mgr Knowledge Base Query entities Query Mgr Data Access Mgr Reasoner Ontology Mgr Instance data may be remote or local. Local data is on the same network as the knowledge base.

Components: Framework, User Visualization point of view Transaction processes updates inferences Inference Manager Query

Components: Framework, User Visualization point of view Transaction processes updates inferences Inference Manager Query Manager Data Query entities Reasoner Data Access Mgr Knowledge Base Ontology Mgr Use Case: Steady State Ontology System

Data Access: Steady State Ontology System using the OWL standard ** The RDF knowledge

Data Access: Steady State Ontology System using the OWL standard ** The RDF knowledge base ** is a set of concepts expressed as a set: Instance data Pipes and Threads Data Object { < subject, verb, predicate > } Data and the data is either “XML” or a data structure such as one would have as a C construct. The Data Access Manager must manage the mapping between local data stores (sometimes having millions of elements) and the set of concepts. The remote data may have many persistence forms, and will be accessed via a data object. Data Access Mgr Knowledge Base ** We use RDF and OWL as a standard to create minimal and well knowledge inference capabilities.

Framework from Query Entity point of view Transaction processes updates inferences Inference Manager Query

Framework from Query Entity point of view Transaction processes updates inferences Inference Manager Query Manager Data Query entities Reasoner Data Access Mgr Knowledge Base Ontology Mgr A Query Entity is itself a type of light ontology. It develops “knowledge” about the user(s) and about the query process.

Framework from Knowledge Management point of view Transaction processes Query Manager Query entities Real

Framework from Knowledge Management point of view Transaction processes Query Manager Query entities Real time analysis is supported through the development and use of query entities. These entities have “regular” structure and are managed within a Framework.

Building ontology from data flow Transaction processes A model of the "causes" of transaction

Building ontology from data flow Transaction processes A model of the "causes" of transaction data. The model is based on, grounded in, the concept of "occurrence in the real world", or "event". Associated with each event, we may have a "measurement". So we have a set of events { e(i) } Objective: We convert a stream of event measurements into an “transaction” ontology, and create auxiliary processes that will use a general risk ontology, an ontology about process optimization, and other “utility ontologies”. where i is a counter. Some of the fields MAY not be used. Later the number of fields in any "findings data flow" may increase or decrease without us caring at all.

Using Ontology Transaction processes Knowledge Base Query Manager Data Query entities Consider a set

Using Ontology Transaction processes Knowledge Base Query Manager Data Query entities Consider a set of events { e(i) } where i is a counter. Each event will have a weakly w structured (free form text) and structured s component. So we use the notation e = w/s or e(i) = w(i)/s(i). Instance data

Ontology Generating Framework Semantic extraction Transaction processes { w(i) } { e(i) } {

Ontology Generating Framework Semantic extraction Transaction processes { w(i) } { e(i) } { s(i) } Notation e(i) = w(i)/s(i) An event is measured by filling in slots in a data entry form, and by typing in natural language into comments fields in these entry forms. Observation: Given real data, one can categorize the set of events due to the nature of the information filled in. Discrete analysis Semantic extraction is performed using one of several tools, or tools in combination with each other Discrete analysis is mostly the manual development of ontology through the study of natural categories in how the data is understood by humans.

Finding data regularity in its natural contextualization For each event we may have zero

Finding data regularity in its natural contextualization For each event we may have zero or more free text fields. Suppose we concatenate these, into one text unit, and perhaps also develop some metadata (in some way) that will help contextualize the semantic extraction process. We label this unit as “ w(i) ”. Free form text is weakly structured. The set { w(i) } Semantic extraction { w(i) } Regularity in data flow is “caused” by the events occurring in the external world. Thus the instances of specific data in data records provide to the knowledge system a “measurement” of the events in the external world. is a text corpus that we would like to “associate” with several ontologies. Each association is made as exploratory activities with specific goals. Each s(i) is a record from a single table. Suppose there are 120 columns. Each column has values, sometimes empty. Fix the counter at *. Let s(*, j) , j = 1, . . . , 120 be the columns. We can call these columns also using the term “slot”. { s(i) } Discrete analysis Now for each s(*, j) list the values that are observed to be in that column. These values are the possible “fillers” for the associated slot.

Data regularity in context …. Patterns and invariance • XML and related standards •

Data regularity in context …. Patterns and invariance • XML and related standards • community open and protected standards (Core. Talk, Rosetta Net, eb. XML) • . NET component management • J 2 EE frameworks – Spring • General framework constructions • autonomous extraction (Hilbert encoding of data into keyless hash tables, SLIP Shallow Link analysis Iterated Scatter-Gather an Parcelation) Core, Aero. Text (etc) semantic extraction using process knowledge and text input

The role of community 1) A community of practice provide a reification process that

The role of community 1) A community of practice provide a reification process that is Human centric (geographical-community / functional-community) 2) Each community may have locally instantiated OWL ontology with topic map visualization. a) Consistently and completeness is managed locally as a property of the human individuals, acting within a community, and a locally persisted history of informational transactions with his/her agent b) Individual agents can query for and acquire XML based Web Services, negotiate with other agents and create reconciliation processes involving designated agencies. 3) Knowledge engineers act on the behalf of policy makers to • • • reify new concepts, delete concepts and to instantiate reconciliation containers

Establishing coherence in “natural” knowledge representation 1) Coherence is a physical phenomenon seen in

Establishing coherence in “natural” knowledge representation 1) Coherence is a physical phenomenon seen in lasers a) b) c) Brain function depends critically on electromagnetic coherence Incoherence, e. g. non-rationality, and in-completeness are two separate contrasting issues to the issue of coherence Mathematics, and therefore computer science and logic, have completeness and consistency issues that are well established and well studied 2) Logical coherence is sometimes treated as consistency in logic a) b) One may think that one has logical consistency and yet this property of the full system was lost at the last transaction within the Ontology The act of attempting to find a complete representation of information organization is sometimes called “reification”, and reification efforts works against efforts to retain consistency 3) Human usability often is a function of a proper balance between logic and agility

Understanding multiple viewpoints 1) Logical consistency and single minded-ness are operational linked together in

Understanding multiple viewpoints 1) Logical consistency and single minded-ness are operational linked together in most current generation decision support systems. Database schema legacy issues. Schema servers, FEA (Federal Enterprise Architecture) standards, schema independent data systems 2) Observation: Human cognitive capabilities have far more agility than current generation decision support systems 3) The topic map standard (2001, Newcomb and Briezski ) was specifically developed to address the non-agility of Semantic Web standards based on OWL and RDF. (Ontopia, Steve Pepper) 4) Combining XML repositories, OWL, distributed agent architectures and Topic Maps is expressed as Stratified Ontology Management

Detection of novelty Scenario: an targeting and search analyst at the Port of Seattle

Detection of novelty Scenario: an targeting and search analyst at the Port of Seattle is only partially aware of why she feels uncomfortable about some characteristic of a shipment from Sweden. The feeling is expressed in a hand written finding and fed into a document management repository for findings. A targeting and search analyst at the Port of LA expresses a fact about a similar shipment without knowing of her colleague’s sense of discomfort. 1) 2) 3) 4) Conceptual roll-up techniques are used on a minute by minute basis to create a viewable topic map over occurrences of concepts expressed in findings. Link analysis connects an alert about uncertainty in the Seattle finding to the fact from LA to produce new information related to a known vulnerability and attack pattern. New knowledge forms are propagated into OWL instantiated ontology and rules and viewed using Topic Maps.

Agent architecture Scenario: Human analysts provide situational awareness via tacit knowledge, personally agent mediated

Agent architecture Scenario: Human analysts provide situational awareness via tacit knowledge, personally agent mediated interactions with agent structures, and human to human communications. A model of threats and vulnerabilities has evolved but does not account for a specific new strategy being developed by a smart smuggler. The smuggler games the current practices in order to bring illegal elements into the United States 1) The model of threats and vulnerabilities expresses as a reification process from various techniques and encoded OWL/Protégé ontology with rules 2) Global Ontology: The model is maintained via near real time agency processes under the observation, active review, of knowledge engineers and program managers working with knowledge of policy and event structure 3) Local Ontology: Information is propagated to individual analysts via alerts and ontology management services controlled by the localized agent (of the person)

New (1/30/2005) tutorial on automated extraction of ontology from free form text: http: //www.

New (1/30/2005) tutorial on automated extraction of ontology from free form text: http: //www. bcngroup. org/beadgames/anticipatory. Web/23. htm