The Suggested Upper Merged Ontology (SUMO) at Age 7: Progress and Promise
Presented at Ontolog, 6 September 2007
Adam Pease, Articulate Software
apease@articulatesoftware.com
http://www.ontologyportal.org/
http://home.earthlink.net/~adampease/professional/
© 2007 Adam Pease, Articulate Software - apease [at] articulatesoftware [dot] com

Overview
• SUMO is a large, open source, formal ontology stated in first-order logic
• Mapped to a large multi-lingual lexicon
• With open source tools for ontology development and application

What's New
• More content about social relationships, justice and law, military events, people and processes
• Wikipedia (DBpedia) links
• Updated mappings to WordNet 3.0
• New tests of inference and many new inference engines
• SQL and XML generation tools
• Many new academic and commercial uses

SUMO Prize - 2007
• US$3000.00
• Due December 1, 2007
• Entries must be open source SUO-KIF files that extend SUMO
• Judged on several criteria:
  – Degree of formalization
  – Scope and coverage
  – Coherent new topic or domain
  – Actual utility in an application

Pursuit of Rigor in Data Standards
Old-style (most common) standards specifications (ISO 14258, Requirements for enterprise-reference architectures and methodologies):
  “3.6.1.1 Time representation
  If an individual element of the enterprise system has to be traced then properties of time need to be modeled to describe short-term changes. If the property time is introduced in terms of duration, it provides the base to do further analyses (e.g., process time). There are two kinds of behavior description relative to time: static and dynamic.”
Data-model standards (ISO 10303-41, Product Description and Support):
  ENTITY product_context
    SUBTYPE OF (application_context_element);
    discipline_type : label;
  END_ENTITY;
Semantic-model standards (IEEE P1600.1 - SUMO, ISO 18629-11, PSL Core):
  (forall (?t1 ?t2 ?t3)
    (=> (and (before ?t1 ?t2)
             (before ?t2 ?t3))
        (before ?t1 ?t3)))
Thanks to Steve Ray, NIST
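
Unlike the prose or EXPRESS versions, the semantic-model version of `before` can be executed mechanically. The Python sketch below applies the transitivity axiom with a naive forward-chaining loop. It only illustrates what the axiom licenses and is not how SUMO's inference engines work. The time-point names are made up for the example.

```python
from itertools import product

def transitive_closure(pairs):
    """Return the smallest transitive relation containing `pairs`.

    Each pair (a, b) is read as (before a b). The loop applies
    (=> (and (before ?t1 ?t2) (before ?t2 ?t3)) (before ?t1 ?t3))
    until no new facts appear.
    """
    closure = set(pairs)
    while True:
        derived = {
            (t1, t3)
            for (t1, t2), (t2b, t3) in product(closure, repeat=2)
            if t2 == t2b
        }
        if derived <= closure:
            return closure
        closure |= derived

facts = {("Monday", "Tuesday"), ("Tuesday", "Wednesday")}
print(sorted(transitive_closure(facts)))
# [('Monday', 'Tuesday'), ('Monday', 'Wednesday'), ('Tuesday', 'Wednesday')]
```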

Terms and Concepts
[Figure: the Ogden/Richards triangle. The term “Orange” symbolizes a Concept, the Concept refers to a Referent, and the term stands for the Referent.]
• Ontology work should be here, since logic is needed to substitute for human thought.
• Lots of “ontology” work has really been here.
Slide adapted from (c) Key-Sun Choi for Pan Localization 2005, from the slide of [Bargmeyer, Bruce, Open Metadata Forum, Berlin, 2005]
C. K. Ogden / I. A. Richards, The Meaning of Meaning: A Study in the Influence of Language upon Thought and The Science of Symbolism, London 1923, 10th edition 1969

Imagine... your view of the web
[Figure: a CV web page, with fields labeled name, education, CV, private and work]
  name: Joe Smith
  education: BS Case Western Reserve, 1982; MS UC Davis, 1984
  private: Married, 2 children
  work: 1985-1990 ACME Software, programmer

...and the Computer's View
[Figure: the same page, where only the field labels name, education, CV, private and work are readable and the content is shown in Chinese]
Thanks to Frank van Harmelen for the original idea of this slide and Peter Yim for the Chinese language content

But wait, we've got XML
  <job name="Joe Smith" title="Programmer">

But wait, we've got XML
  <job name="Joe Smith" title="Programmer">
  <x83 m92="|||||" title="...">

But wait, we've got Taxonomies
  Mammal
    Person
      JoeSmith

But wait, we've got Taxonomies
  x931
    o4839
      i3729

Wait, we've got semantics
  Person subclass Mammal
  JoeSmith instance Person
  implies: JoeSmith instance Mammal
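
Here is a minimal sketch of the inference on this slide. It assumes a toy in-memory store of `(child, parent)` subclass pairs and `(individual, class)` instance pairs. These data structures are illustrative and are not Sigma's.

```python
# Toy knowledge base mirroring the slide (illustrative data structures).
subclass = {("Person", "Mammal")}
instance = {("JoeSmith", "Person")}

def superclasses(cls, subclass_pairs):
    """All classes reachable from `cls` by following subclass links upward."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in subclass_pairs:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

def infer_instances(instance_pairs, subclass_pairs):
    """Apply: (instance ?X ?C) and (subclass ?C ?D) implies (instance ?X ?D)."""
    inferred = set(instance_pairs)
    for individual, cls in instance_pairs:
        for parent in superclasses(cls, subclass_pairs):
            inferred.add((individual, parent))
    return inferred

print(sorted(infer_instances(instance, subclass)))
# [('JoeSmith', 'Mammal'), ('JoeSmith', 'Person')]
```

Renaming the individual and the classes to opaque tokens such as `p3489`, `u8475` and `x9834` leaves the result unchanged. The inference never looks at what the names spell, only at which relation links them.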

Wait, we've got semantics
[Figure: the previous diagram (Person subclass Mammal, JoeSmith instance Person, implies JoeSmith instance Mammal) shown next to the computer's view of it, in which the names become opaque tokens: x9834, r22, u8475, r53, p3489; the implied fact reads r53 p3489]

Semantics Helps a Machine Appear Smart
• A “smart” machine should be able to make the same inferences we do
• (Let's not debate the AI philosophy about whether it would actually be smart)

Definitions
• An ontology is a shared conceptualization of a domain
• An ontology is a set of definitions in a formal language for terms describing the world

Frames
• Object- or term-centered
• Frames, slots, values (and attributes)
Example frame:
  Adam: Person
    height: 5'8"
    occupation: consultant
      cardinality: 1

Frame Restrictions
1. b is between a and c
   (between1 a betweenness1)
   (between2 b betweenness1)
   (between3 c betweenness1)
   vs.
   (between a b c)
2. Adam is not an accountant
   (notOccupation Adam Accountant)
   vs.
   (not (occupation Adam Accountant))
3. Existential vs. universal quantification
4. Similar problems for many description logics
5. Very efficient computation, however
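
This small sketch shows the first restriction, assuming the slot-per-argument encoding above. Recovering the ternary `between` fact from frame slots requires the extra reified object `betweenness1` plus decoding code. The logic version states the fact directly. The function name `decode_frames` is hypothetical.

```python
# The ternary relation stated directly, as in (between a b c).
between = {("a", "b", "c")}

# Frame-style encoding from the slide: a reified object "betweenness1"
# carries one binary slot per argument position.
slots = {
    ("between1", "a", "betweenness1"),
    ("between2", "b", "betweenness1"),
    ("between3", "c", "betweenness1"),
}

def decode_frames(slot_facts):
    """Rebuild (x, y, z) tuples from per-position slot facts.

    An incomplete frame (a missing position) yields nothing, which is one
    way meaning can leak out of the frame encoding.
    """
    by_object = {}
    for slot, value, obj in slot_facts:
        position = int(slot.removeprefix("between")) - 1  # needs Python 3.9+
        by_object.setdefault(obj, [None, None, None])[position] = value
    return {tuple(args) for args in by_object.values() if None not in args}

print(decode_frames(slots) == between)  # True
```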

Digression: Implementation is Different from Representation
• Why lose meaning at design time just because of runtime issues?
  – We can't reason with English definitions, but that doesn't mean we shouldn't document our terms
• Many different implementations may be done from the same representation
• This does not mean that runtime issues should be ignored at design time
  – If you represent information you know can't be reasoned with, it had better not be essential in most conceivable applications

Many Ways to Use Ontology
• As an information engineering tool
  – Create a database schema
  – Map the schema to an upper ontology
  – Use the ontology as a set of reminders for additional information that should be included
• As more formal comments
  – Define an ontology that is used to create a DB or OO system
  – Use a theorem prover at design time to check for inconsistencies
• For taxonomic reasoning
  – Do limited run-time inference in Prolog, a description logic, or even Java
• For first-order logical inference
  – Full-blown use of all the axioms at run time

Upper Ontology
• An attempt to capture the most general and reusable terms and definitions
• Provokes thought on clarifying the meaning of more specific terms
• Provides for large-scale reuse

Ontology vs Language and Knowledge
• Ontology
  – Expandable
  – Language independent
  – Machine understandable
• Language
  – Understood by humans
  – Ambiguous
• Knowledge
  – Changes rapidly
  – May be local to an entity

Suggested Upper Merged Ontology
• 1000 terms, 4000 axioms, 750 rules
• Mapped by hand to all of WordNet 1.6
  – then ported to 3.0
• Development began in 2000
  – US Government small business grant
• Associated domain ontologies totalling 20,000 terms and 70,000 axioms
• Free
  1. SUMO is owned by IEEE but basically public domain
  2. Domain ontologies are released under GNU
  3. www.ontologyportal.org

SUMO (continued)
• Formally defined, not dependent on a particular implementation
• Open source toolset for browsing and inference
  – http://sigmakee.sourceforge.net
• Many uses of SUMO (independent of the SUMO authors and funders)
  – http://www.ontologyportal.org/Pubs.html

SUMO Validation
• Mapping to all of the WordNet lexicon
  – A check on coverage and completeness (at a given level of generality)
• Peer review
  – Open source since its inception
• Formal validation with a theorem prover
  – Free of contradictions (within a generous time bound for search)
• Application to dozens of domain ontologies

WordNet
• Lexical database
• 100,000 word senses
  – synsets
• Created by George Miller's group at Princeton
• Free
• De facto standard in the linguistics world

WordNet to SUMO Mapping
1. The WordNet synset “plant, flora, plant_life” is equivalent to the formal SUMO term 'Plant':
   00008864 03 n 03 plant 0 flora 0 plant_life 0 027@ ... | a living organism lacking the power of locomotion &%Plant=
2. SUMO has axioms that explain formally what a plant is:
   (=> (and (instance ?SUBSTANCE PlantSubstance)
            (instance ?PLANT Organism)
            (part ?SUBSTANCE ?PLANT))
       (instance ?PLANT Plant))
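
This sketch reads such a mapping line and rests on two assumptions. First, the text before `|` follows the WordNet data-file layout: offset, lexicographer file number, part of speech, a hexadecimal word count, then word/lex_id pairs. Second, the suffix after the SUMO term marks the kind of mapping. `=` marks equivalence, as on this slide, and `+` (subsumption) and `@` (instance) are assumed to be the other markers. The pointer section stays elided, as on the slide, and the parser never reads it.

```python
import re

# Assumed meaning of the character that follows the SUMO term.
MAPPING_KIND = {"=": "equivalent", "+": "subsumed", "@": "instance"}

def parse_mapped_synset(line):
    """Split a WordNet data line annotated with a SUMO term into its parts."""
    head, _, tail = line.partition(" | ")
    fields = head.split()
    offset, pos = fields[0], fields[2]
    word_count = int(fields[3], 16)          # w_cnt is hexadecimal
    words = fields[4:4 + 2 * word_count:2]   # skip each word's lex_id
    match = re.search(r"&%(\w+)([=+@])\s*$", tail)
    if match is None:
        raise ValueError("no SUMO mapping found")
    return {
        "offset": offset,
        "pos": pos,
        "words": words,
        "gloss": tail[:match.start()].strip(),
        "sumo_term": match.group(1),
        "relation": MAPPING_KIND[match.group(2)],
    }

line = ("00008864 03 n 03 plant 0 flora 0 plant_life 0 027@ ... "
        "| a living organism lacking the power of locomotion &%Plant=")
print(parse_mapped_synset(line)["words"])  # ['plant', 'flora', 'plant_life']
```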

WordNet to SUMO Mapping
• Most nouns map to classes
• Most verbs map to subclasses of &%Process
• Most adjectives map to a &%SubjectiveAssessmentAttribute
• Most adverbs map to relations of &%manner

Internationalization
• Translation of SUMO paraphrases into multiple, diverse languages
  – Some confidence there's no cultural or linguistic bias
  – Chinese, Hindi, Tagalog, Czech, German, Italian, Korean, Romanian, Arabic
  – Estonian and Hungarian in development
• SUMO is linked to multiple very large lexicons (EuroWordNet, Balkanet, HowNet, etc.)
  – English, Chinese, Italian, Arabic

SUMO Structure
[Figure: SUMO's modules: Structural Ontology, Base Ontology, Set/Class Theory, Numeric, Graph, Temporal, Measure, Mereotopology, Processes, Objects, Qualities]

SUMO+Domain Ontology
[Figure: the SUMO modules (Structural Ontology, Base Ontology, Set/Class Theory, Numeric, Temporal, Graph, Measure, Processes, Mereotopology, Objects, Qualities) with the Mid-Level ontology and associated domain ontologies: Geography, WMD, Transnational Issues, Communications, Government, Transportation, Elements, NAICS, Financial Ontology, Distributed Computing, People, Terrorist Economy, UnitedStates, Afghanistan, France, Terrorist Attacks, ECommerce Services, Military, Terrorist Attack Types, Biological Viruses, World Airports]
Total terms: 20,399; total axioms: 67,108; rules: 2,500

Are SUMO Terms Directly Usable?
• Yes.
• Study
  – 1/3 of upper ontology terms directly appear in answers on a large test
  – Cohen, P., Chaudhri, V., Pease, A., and Schrag, R. (1999), Does Prior Knowledge Facilitate the Development of Knowledge Based Systems? In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-1999). Menlo Park, Calif.: AAAI Press. http://home.earthlink.net/~adampease/professional/cohen-aaai99.ps
• before (in time), agent (of a process), etc.

High Level Distinctions
The first fundamental distinction is that between ‘Physical’ (things which have a position in space/time) and ‘Abstract’ (things which don't).
  Entity
    Physical
    Abstract

High Level Distinctions
Partition of ‘Physical’ into ‘Objects’ and ‘Processes’
  Physical
    Object
    Process

Objects
  Object
    SelfConnectedObject
      Substance
      CorpuscularObject
    Region
    Collection

Processes
DualObjectProcess, Substituting, Transaction, Comparing, Attaching, Detaching, Combining, Separating, InternalChange, BiologicalProcess, QuantityChange, Damaging, ChemicalProcess, SurfaceChange, Creation, StateChange, ShapeChange, IntentionalProcess, IntentionalPsychologicalProcess, RecreationOrExercise, OrganizationalProcess, Guiding, Keeping, Maintaining, Repairing, Poking, ContentDevelopment, Making, Searching, SocialInteraction, Maneuver, Motion, BodyMotion, DirectionChange, Transfer, Transportation, Radiating

Abstract
  Abstract
    SetOrClass
    Relation
    Proposition
    Quantity
      Number
      PhysicalQuantity
    Attribute
    GraphElement
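
The partitions on the previous slides (Entity into Physical and Abstract, Physical into Object and Process) can double as a consistency check. The minimal sketch below assumes a hand-written, single-parent map that holds only the links shown on these slides. Real SUMO allows multiple inheritance, and this sketch tests disjointness only, not exhaustiveness.

```python
# Direct subclass links taken from the slides (a tiny fragment of SUMO).
PARENT = {
    "Physical": "Entity",
    "Abstract": "Entity",
    "Object": "Physical",
    "Process": "Physical",
}

def ancestors(cls):
    """Classes above `cls`, following single-parent links to the root."""
    result = []
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result

def violates_partition(asserted_classes, cells):
    """True if an individual would belong to more than one cell of a partition."""
    memberships = set(asserted_classes)
    for cls in asserted_classes:
        memberships.update(ancestors(cls))
    return len(memberships & set(cells)) > 1

print(violates_partition({"Process"}, ["Physical", "Abstract"]))            # False
print(violates_partition({"Object", "Abstract"}, ["Physical", "Abstract"]))  # True
```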

Case Roles
• Roles that entities play in a Process
  – agent, patient, instrument, etc.

Case Roles
• “Brutus stabbed Caesar with a knife on Tuesday.”
  A Stabbing
    agent: Brutus
    patient: Caesar
    instrument: A Knife
    time: A Tuesday

Case Roles
• “Brutus stabbed Caesar with a knife on Tuesday.”
  (exists (?S ?K ?T)
    (and (instance ?S Stabbing)
         (instance ?K Knife)
         (instance ?T Tuesday)
         (agent ?S Brutus)
         (patient ?S Caesar)
         (time ?S ?T)
         (instrument ?S ?K)))
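
The formula above has a regular shape: an event variable, an `instance` type for each typed participant, and one case-role literal per role. A hypothetical helper could build that string from a role table, as sketched below. It is plain string generation, not a SUMO or Sigma API.

```python
def event_to_kif(process_class, roles, typed_vars, event_var="?S"):
    """Build an existentially quantified SUO-KIF formula for one event.

    process_class: SUMO class of the event, e.g. "Stabbing".
    roles: case-role name -> filler (a constant or a variable), in order.
    typed_vars: variable -> SUMO class it is an instance of, in order.
    """
    variables = [event_var, *typed_vars]
    conjuncts = [f"(instance {event_var} {process_class})"]
    conjuncts += [f"(instance {v} {c})" for v, c in typed_vars.items()]
    conjuncts += [f"({role} {event_var} {filler})" for role, filler in roles.items()]
    return f"(exists ({' '.join(variables)}) (and {' '.join(conjuncts)}))"

kif = event_to_kif(
    "Stabbing",
    {"agent": "Brutus", "patient": "Caesar", "time": "?T", "instrument": "?K"},
    {"?K": "Knife", "?T": "Tuesday"},
)
print(kif)
```

Keeping roles in a table, rather than hard-coding each sentence, is what lets the same process class combine with any subset of case roles.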

Attributes and Roles
• (attribute JohnDoe Unemployed)
• (attribute GIJane Soldier)
• (attribute MyCar Blue)

Example Rules
  (=> (instance ?DRIVE Driving)
      (exists (?VEHICLE)
        (and (instance ?VEHICLE Vehicle)
             (patient ?DRIVE ?VEHICLE))))
“If there's an instance of Driving, there's a Vehicle that participates in that action.”
Not just an English definition for humans to read, but a logical definition that can be used in proofs.
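
One operational reading of this rule is forward chaining, with a Skolem constant standing in for the existential `?VEHICLE`. The sketch assumes facts are stored as `(relation, arg1, arg2)` tuples. The `SkolemVehicleFor<drive>` naming is invented for the example.

```python
def apply_driving_rule(facts):
    """One forward-chaining pass of the Driving rule over ground atoms."""
    derived = set(facts)
    for relation, *args in facts:
        if relation != "instance" or args[1] != "Driving":
            continue
        drive = args[0]
        # The rule is already satisfied if some known Vehicle is its patient.
        already_has_vehicle = any(
            rel == "patient" and a == drive and ("instance", b, "Vehicle") in facts
            for rel, a, b in facts
        )
        if not already_has_vehicle:
            vehicle = f"SkolemVehicleFor{drive}"  # hypothetical Skolem constant
            derived.add(("instance", vehicle, "Vehicle"))
            derived.add(("patient", drive, vehicle))
    return derived

print(sorted(apply_driving_rule({("instance", "Drive1", "Driving")})))
```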

Commercial Application
• One-year project for Articulate Software
• Working with a company that creates financial transaction systems for royalty payments
• Re-engineer current ontology management business process, tools and ontology

Commercial Application
• Extensive current ontology
• Captured in spreadsheets
• Local term names and definitions for every customer
  – An essential part of their process
• Ontology management system that exports XML & RDF
• One end-user database is nearly 3 GB
  – Ontology functions can be batch-processed

Project Goals
• To add formality to the existing model
  – To support full logical inference and consistency checks
• Give customers a user-friendly ontology editor
  – So that they can maintain the ontology
• Create a broader set of definitions
  – Enable greater DB integration
  – Enable expansion into new markets
• Leverage work
• Exercise SUMO and Sigma in business

Initial Tasks
• Implement UI improvements to Sigma
  – Simplified tree-based editor
  – Simplified frame-style browser
• XML/SQL ontology export
  – Uses meta-predicates for physical DB structure
• Merge existing ontology with SUMO
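
The slide does not show what the meta-predicates look like, so the sketch below substitutes an invented stand-in. It takes a list of hypothetical `(class, relation, SQL type)` facts, generates a `CREATE TABLE` statement from them, and checks the statement against an in-memory SQLite database. It only illustrates the idea of driving physical DB structure from ontology-level declarations.

```python
import sqlite3

# Hypothetical meta-facts (class, relation, SQL column type). This is an
# invented stand-in, not actual SUMO or Sigma meta-predicate syntax.
COLUMNS = [
    ("Person", "height", "REAL"),
    ("Person", "occupation", "TEXT"),
]

def create_table_sql(cls, columns):
    """Emit a CREATE TABLE statement for one class from its column facts."""
    lines = ["  id TEXT PRIMARY KEY"]  # one row per instance of the class
    lines += [f"  {rel} {sql_type}" for c, rel, sql_type in columns if c == cls]
    return f"CREATE TABLE {cls} (\n" + ",\n".join(lines) + "\n);"

# Check that the generated DDL is accepted by a real database engine.
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("Person", COLUMNS))
print([row[1] for row in conn.execute("PRAGMA table_info(Person)")])
# ['id', 'height', 'occupation']
```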

DBPedia
• “People” content uses FOAF
  – Lightweight, redundant, ad hoc
  – Only a tiny portion is used: birthdate, deathdate, birthplace, deathplace, names, firstname, lastname
  – http://xmlns.com/foaf/spec/
  – 16 MB KIF content: http://www.ontologyportal.org/content/DBPedia.People.zip
• Recent announcement of DBPedia now mapped to WordNet
  – Which gets us links to SUMO

TPTP
• Research effort in automated theorem proving
• 40+ different first-order logic provers
• Annual competition
• Thousands of test problems
• We will issue SUMO-based tests in TPTP format next month
• Sigma connected to TPTP prover suite
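
To give a sense of what issuing SUMO content in TPTP format involves, this sketch converts a small subset of SUO-KIF into a TPTP `fof` line. The subset covers predicates, `and`, `or`, `=>`, `<=>`, `not`, `forall` and `exists`. The `s__` prefix on constants and predicates is an assumption made here so that capitalized SUMO names are not read as TPTP variables. A real translator also has to handle row variables, numbers and higher-order constructs.

```python
import re

CONNECTIVES = {"and": "&", "or": "|", "=>": "=>", "<=>": "<=>"}

def parse_kif(text):
    """Parse one SUO-KIF S-expression into nested lists of string tokens."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def read(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        return items, pos + 1

    expr, end = read(0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the first expression")
    return expr

def term(symbol):
    """Map a KIF symbol to a TPTP symbol."""
    if symbol.startswith("?"):
        # TPTP variables must start with an uppercase letter.
        return re.sub(r"\W", "_", symbol[1:]).upper()
    # Assumed prefix so capitalized SUMO names are not read as variables.
    return "s__" + re.sub(r"\W", "_", symbol)

def to_fof(expr):
    """Render a parsed KIF formula as a TPTP FOF formula (small subset)."""
    if isinstance(expr, str):
        return term(expr)
    head, *args = expr
    if head in ("forall", "exists"):
        quantifier = "!" if head == "forall" else "?"
        variables = ",".join(term(v) for v in args[0])
        return f"({quantifier}[{variables}] : {to_fof(args[1])})"
    if head == "not":
        return f"~ {to_fof(args[0])}"
    if head in CONNECTIVES:
        return "(" + f" {CONNECTIVES[head]} ".join(to_fof(a) for a in args) + ")"
    return f"{term(head)}({','.join(to_fof(a) for a in args)})"

def kif_to_tptp(name, kif_text):
    return f"fof({name}, axiom, {to_fof(parse_kif(kif_text))})."

print(kif_to_tptp("before_transitive",
    "(forall (?t1 ?t2 ?t3) (=> (and (before ?t1 ?t2) (before ?t2 ?t3)) (before ?t1 ?t3)))"))
```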

Controlled English to Logic Translation
• Automated translation from English to logic
• Uses WordNet-SUMO mappings for a 100,000 word sense vocabulary
• Domain-independent
• Development process
  – Start with a highly restricted language and gradually add linguistic features
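
Here is a toy version of the "start restricted" strategy. It uses one sentence pattern and a one-verb, one-noun lexicon standing in for the WordNet-SUMO mappings, and it produces the same style of case-role formula as the Brutus example. Everything here (the pattern, the lexicon and the function name) is hypothetical.

```python
import re

# Hypothetical mini-lexicon standing in for the WordNet-SUMO mappings.
VERBS = {"stabbed": "Stabbing"}
NOUNS = {"knife": "Knife"}

# "<Name> <verb> <Name> [with a/an <noun>]." and nothing else.
PATTERN = re.compile(
    r"^(?P<agent>[A-Z]\w*) (?P<verb>\w+) (?P<patient>[A-Z]\w*)"
    r"(?: with an? (?P<instrument>\w+))?\.$"
)

def translate(sentence):
    """Translate one sentence of a toy controlled English into SUO-KIF."""
    m = PATTERN.match(sentence)
    if m is None or m["verb"] not in VERBS:
        raise ValueError(f"outside the controlled language: {sentence!r}")
    variables = ["?E"]
    conjuncts = [f"(instance ?E {VERBS[m['verb']]})"]
    if m["instrument"]:
        if m["instrument"] not in NOUNS:
            raise ValueError(f"unknown noun: {m['instrument']!r}")
        variables.append("?I")
        conjuncts.append(f"(instance ?I {NOUNS[m['instrument']]})")
    conjuncts += [f"(agent ?E {m['agent']})", f"(patient ?E {m['patient']})"]
    if m["instrument"]:
        conjuncts.append("(instrument ?E ?I)")
    return f"(exists ({' '.join(variables)}) (and {' '.join(conjuncts)}))"

print(translate("Brutus stabbed Caesar with a knife."))
```

Extending coverage then means adding patterns (tense, the time role, passives) and lexicon entries one at a time, each checked against the logic it produces.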