Lecture 17: Lexical Relations & WordNet
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/
IS 202 – FALL 2004, 2004.10.26 - SLIDE 1

Lecture Overview
• Review
• Lexical Relations
• WordNet
• Demo
• Discussion Questions
• Action Items for Next Time
Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Definition of AI
“... artificial intelligence [AI] is the science of making machines do things that would require intelligence if done by [humans]” (Minsky, 1963)

The Goals of AI Are Not New
• Ancient Greece
  – Daedalus’ automata
• Judaism’s myth of the Golem
• 18th century automata
  – Singing, dancing, playing chess?
• Mechanical metaphors for mind
  – Clock
  – Telegraph/telephone network
  – Computer

Some Areas of AI
• Knowledge representation
• Programming languages
• Natural language understanding
• Speech understanding
• Vision
• Robotics
• Planning
• Machine learning
• Expert systems
• Qualitative simulation

AI or IA?
• Artificial Intelligence (AI)
  – Make machines as smart as (or smarter than) people
• Intelligence Amplification (IA)
  – Use machines to make people smarter

Furnas: The Vocabulary Problem
• People use different words to describe the same things
  – “If one person assigns the name of an item, other untutored people will fail to access it on 80 to 90 percent of their attempts.”
  – “Simply stated, the data tell us there is no one good access term for most objects.”

The Vocabulary Problem
• How is it that we come to understand each other?
  – Shared context
  – Dialogue
• How can machines come to understand what we say?
  – Shared context?
  – Dialogue?

Vocabulary Problem Solutions?
• Furnas et al.
  – Make the user memorize precise system meanings
  – Have the user and system interact to identify the precise referent
  – Provide infinite aliases to objects
• Minsky and Lenat
  – Give the system “commonsense” so it can understand what the user’s words can mean

CYC
• Decades-long effort to build a commonsense knowledge base
• Storied past
• 100,000 basic concepts
• 1,000 assertions about the world
• The validity of Cyc’s assertions is context-dependent (default reasoning)

Cyc Examples
• Cyc can find the match between a user's query for "pictures of strong, adventurous people" and an image whose caption reads simply "a man climbing a cliff"
• Cyc can notice if an annual salary and an hourly salary are inadvertently being added together in a spreadsheet
• Cyc can combine information from multiple databases to guess which physicians in practice together had been classmates in medical school
• When someone searches for "Bolivia" on the Web, Cyc knows not to offer a follow-up question like "Where can I get free Bolivia online?"

Cyc Applications
• Applications currently available or in development
  – Integration of Heterogeneous Databases
  – Knowledge-Enhanced Retrieval of Captioned Information
  – Guided Integration of Structured Terminology (GIST)
  – Distributed AI
  – WWW Information Retrieval
• Potential applications
  – Online brokering of goods and services
  – "Smart" interfaces
  – Intelligent character simulation for games
  – Enhanced virtual reality
  – Improved machine translation
  – Improved speech recognition
  – Sophisticated user modeling
  – Semantic data mining

Cyc’s Top-Level Ontology
• Fundamentals, Top Level, Time and Dates, Types of Predicates, Spatial Relations, Quantities, Mathematics, Contexts, Groups, "Doing", Transformations, Changes Of State, Transfer Of Possession, Movement, Parts of Objects
• Composition of Substances, Agents, Organizations, Actors, Roles, Professions, Emotion, Propositional Attitudes, Social, Biology, Chemistry, Physiology, General Medicine
• Materials, Waves, Devices, Construction, Financial, Food, Clothing, Weather, Geography, Transportation, Information, Perception, Agreements, Linguistic Terms, Documentation
http://www.cyc.com/cyc-2-1/toc.html

Syntax
• The syntax of a language is to be understood as a set of rules which accounts for the distribution of word forms throughout the sentences of a language
• These rules codify permissible combinations of classes of word forms

Semantics
• Semantics is the study of linguistic meaning
• Two standard approaches to lexical semantics (cf. sentential semantics and logical semantics):
  – (1) compositional
  – (2) relational

Lexical Semantics: Compositional Approach
• Compositional lexical semantics, introduced by Katz & Fodor (1963), analyzes the meaning of a word in much the same way a sentence is analyzed into semantic components. The semantic components of a word are not themselves considered to be words, but are abstract elements (semantic atoms) postulated in order to describe word meanings (semantic molecules) and to explain the semantic relations between words. For example, the representation of bachelor might be ANIMATE and HUMAN and MALE and ADULT and NEVER MARRIED. The representation of man might be ANIMATE and HUMAN and MALE and ADULT; because all the semantic components of man are included in the semantic components of bachelor, it can be inferred that bachelor → man. In addition, there are implicational rules between semantic components, e.g. HUMAN → ANIMATE, which also look very much like meaning postulates.
  – George Miller, “On Knowing a Word,” 1999
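
The inclusion test in the bachelor/man example can be sketched with plain sets. This is a minimal illustration, not Katz & Fodor's formalism: the `COMPONENTS` table and the `entails` helper are hypothetical names, and the `woman` entry is an added assumption used only to show a failing case.

```python
# Hypothetical feature sets; "bachelor" and "man" follow the example above,
# "woman" is an assumed extra entry for contrast.
COMPONENTS = {
    "man": {"ANIMATE", "HUMAN", "MALE", "ADULT"},
    "bachelor": {"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER_MARRIED"},
    "woman": {"ANIMATE", "HUMAN", "FEMALE", "ADULT"},
}


def entails(specific, general):
    """True if every semantic component of `general` is also in `specific`."""
    return COMPONENTS[general] <= COMPONENTS[specific]


print(entails("bachelor", "man"))   # True: man's components are a subset
print(entails("man", "bachelor"))   # False: NEVER_MARRIED is missing
```

Under this reading, "bachelor → man" is just a subset check on the component sets.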

Lexical Semantics: Relational Approach
• Relational lexical semantics was first introduced by Carnap (1956) in the form of meaning postulates, where each postulate stated a semantic relation between words. A meaning postulate might look something like dog → animal (if x is a dog then x is an animal) or, adding logical constants, bachelor → man and never married [if x is a bachelor then x is a man and not(x has married)] or tall → not short [if x is tall then not(x is short)]. The meaning of a word was given, roughly, by the set of all meaning postulates in which it occurs.
  – George Miller, “On Knowing a Word,” 1999
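
One way to picture meaning postulates is as a small rule table that can be followed transitively. The sketch below is an assumption-laden toy, not Carnap's logic: `POSTULATES` and `consequences` are hypothetical names, and the `puppy → dog` rule is added only to show chaining.

```python
# Each entry reads: if x is <key>, then x is every word in "is"
# and x is not any word in "is_not".
POSTULATES = {
    "dog": {"is": {"animal"}, "is_not": set()},
    "bachelor": {"is": {"man"}, "is_not": {"has_married"}},
    "tall": {"is": set(), "is_not": {"short"}},
    "puppy": {"is": {"dog"}, "is_not": set()},  # assumed extra rule
}


def consequences(word):
    """Follow 'is' postulates transitively, collecting 'is_not' facts on the way."""
    positive, negative = set(), set()
    stack = [word]
    while stack:
        current = stack.pop()
        rule = POSTULATES.get(current, {"is": set(), "is_not": set()})
        negative |= rule["is_not"]
        for implied in rule["is"]:
            if implied not in positive:
                positive.add(implied)
                stack.append(implied)
    return positive, negative


print(consequences("bachelor"))
print(consequences("puppy"))
```

Here the "meaning" of a word is approximated by everything its postulates let you conclude about it.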

Pragmatics
• Deals with the relation between signs or linguistic expressions and their users
• Deixis (literally “pointing out”)
  – E.g., “I’ll be back in an hour” depends upon the time of the utterance
• Conversational implicature
  – A: “Can you tell me the time?”
  – B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.]
• Presupposition
  – “Are you still such a bad driver?”
• Speech acts
  – Constatives vs. performatives
  – E.g., “I second the motion.”
• Conversational structure
  – E.g., turn-taking rules

Language
• Language only hints at meaning
• Most meaning of text lies within our minds and common understanding
  – “How much is that doggy in the window?”
    • “How much”: social system of barter and trade (not the size of the dog)
    • “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own
    • “in the window” implies behind a store window, not really inside a window, requires notion of window shopping

Semantics: The Meaning of Symbols
• Semantics versus Syntax
  – add(3, 4)
  – 3+4
  – (different syntax, same meaning)
• Meaning versus Representation
  – What a person’s name is versus who they are
    • A rose by any other name...
  – What the computer program “looks like” versus what it actually does
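
The add(3, 4) versus 3+4 contrast can be checked directly in Python, as a small sketch: the standard `ast` module shows that the two strings parse to differently shaped trees, while evaluating them gives the same value. Binding the name `add` to `operator.add` is an assumption made so the prefix form can be evaluated.

```python
import ast
from operator import add

prefix_form = "add(3, 4)"
infix_form = "3 + 4"

# Different syntax: the two strings parse to different kinds of tree node.
prefix_tree = ast.parse(prefix_form, mode="eval").body
infix_tree = ast.parse(infix_form, mode="eval").body
print(type(prefix_tree).__name__)  # Call
print(type(infix_tree).__name__)   # BinOp

# Same meaning: both expressions denote the same value.
prefix_value = eval(prefix_form, {"add": add})
infix_value = eval(infix_form)
print(prefix_value == infix_value)  # True
```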

Semantics
• Semantics: assigning meanings to symbols and expressions
  – Usually involves defining:
    • Objects
    • Properties of objects
    • Relations between objects
  – More detailed versions include
    • Events
    • Time
    • Places
    • Measurements (quantities)

The Role of Context
• The concept associated with the symbol “21” means different things in different contexts
  – Examples?
• The question “Is there any salt?”
  – Asked of a waiter at a restaurant
  – Asked of an environmental scientist at work

What’s in a Sentence?
“A sentence is not a verbal snapshot or movie of an event. In framing an utterance, you have to abstract away from everything you know, or can picture, about a situation, and present a schematic version which conveys the essentials. In terms of grammatical marking, there is not enough time in the speech situation for any language to allow for the marking of everything which could possibly be significant to the message.”
Dan Slobin, in Language Acquisition: The state of the art, 1982

Lexical Relations
• Conceptual relations link concepts
  – Goal of Artificial Intelligence
• Lexical relations link words
  – Goal of Linguistics

Major Lexical Relations
• Synonymy
• Polysemy
• Metonymy
• Hyponymy/Hypernymy
• Meronymy/Holonymy
• Antonymy

Synonymy
• Different ways of expressing related concepts
• Examples
  – cat, feline, Siamese cat
• Overlaps with basic and subordinate levels
• Synonyms are almost never truly substitutable
  – Used in different contexts
  – Have different implications
• This is a point of contention

Polysemy
• Most words have more than one sense
  – Homonym: same sound and/or spelling, different meaning (http://www.wikipedia.org/wiki/Homonym)
    • bank (river)
    • bank (financial)
  – Polysemy: different senses of same word (http://www.wikipedia.org/wiki/Polysemy)
    • That dog has floppy ears.
    • She has a good ear for jazz.
    • bank (financial) has several related senses
      – the building, the institution, the notion of where money is stored

Metonymy
• Use one aspect of something to stand for the whole
  – The building stands for the institution of the bank.
  – Newscast: “The White House released new figures today.”
  – Waitperson: “The ham sandwich spilled his drink.”

Hyponymy/Hypernymy
• ISA relation
• Related to Superordinate and Subordinate level categories
  – hyponym(robin, bird)
  – hyponym(emu, bird)
  – hyponym(bird, animal)
  – hypernym(animal, bird)
• A is a hypernym of B if B is a type of A
• A is a hyponym of B if A is a type of B
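
The ISA facts on this slide can be stored as child-to-parent links and queried by walking upward. This is a minimal sketch under two assumptions: each word has a single parent, and the hypothetical helpers `is_hyponym` and `is_hypernym` treat the relation as transitive (so robin counts as a hyponym of animal), which goes one step beyond the direct facts listed above.

```python
# Direct ISA links taken from the slide: child -> parent.
HYPERNYM_OF = {"robin": "bird", "emu": "bird", "bird": "animal"}


def hypernyms(word):
    """Return the chain of ever more general terms above `word`."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def is_hyponym(a, b):
    """A is a hyponym of B if A is a type of B."""
    return b in hypernyms(a)


def is_hypernym(a, b):
    """A is a hypernym of B if B is a type of A."""
    return is_hyponym(b, a)


print(hypernyms("robin"))             # ['bird', 'animal']
print(is_hypernym("animal", "bird"))  # True
```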

Basic-Level Categories (Review)
• Brown 1958, 1965; Berlin et al., 1972, 1973
• Folk biology:
  – Unique beginner: plant, animal
  – Life form: tree, bush, flower
  – Generic name: pine, oak, maple, elm
  – Specific name: Ponderosa pine, white pine
  – Varietal name: Western Ponderosa pine
• No overlap between levels
• Level 3 (generic name) is basic
  – Corresponds to genus
  – Folk biological categories correspond accurately to scientific biological categories only at the basic level

Psychologically Primary Levels
SUPERORDINATE    BASIC LEVEL    SUBORDINATE
animal           dog            terrier
furniture        chair          rocker
• Children take longer to learn superordinate categories
• Superordinate categories are not associated with mental images or motor actions

Meronymy/Holonymy
• Part/Whole relation
  – meronym(beak, bird)
  – meronym(bark, tree)
  – holonym(tree, bark)
• Transitive conceptually but not lexically
  – The knob is a part of the door.
  – The door is a part of the house.
  – ? The knob is a part of the house ?
• Holonyms are (approximately) the inverse of meronyms
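
The difference between the listed (lexical) part/whole pairs and the conceptual, transitive reading can be sketched as follows. The `PART_OF` set and the three helper functions are hypothetical; the pairs are the ones on the slide.

```python
# Directly listed part/whole pairs: (part, whole).
PART_OF = {("beak", "bird"), ("bark", "tree"), ("knob", "door"), ("door", "house")}


def is_meronym(part, whole):
    """Lexical reading: only pairs that are listed directly."""
    return (part, whole) in PART_OF


def is_holonym(whole, part):
    """Holonymy treated as the inverse of meronymy."""
    return is_meronym(part, whole)


def conceptually_part_of(part, whole):
    """Conceptual reading: follow part-of links transitively."""
    frontier, seen = [part], set()
    while frontier:
        current = frontier.pop()
        for p, w in PART_OF:
            if p == current and w not in seen:
                if w == whole:
                    return True
                seen.add(w)
                frontier.append(w)
    return False


print(is_meronym("knob", "house"))            # False: not listed directly
print(conceptually_part_of("knob", "house"))  # True: knob -> door -> house
```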

Antonymy
• Lexical opposites
  – antonym(large, small)
  – antonym(big, little)
  – but not large, little
• Many antonymous relations can be reliably detected by looking for statistical correlations in large text collections. (Justeson & Katz, 1991)
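
As a rough illustration of what "statistical correlations" could mean here, the sketch below compares how often two words share a sentence with how often they would if they occurred independently. This is a simplified stand-in, not the procedure from Justeson & Katz; the function name, the ratio, and the six-sentence toy corpus are all assumptions.

```python
def cooccurrence_ratio(sentences, word_a, word_b):
    """Observed same-sentence co-occurrences divided by the count expected by chance."""
    token_sets = [set(s.lower().split()) for s in sentences]
    n = len(token_sets)
    count_a = sum(word_a in tokens for tokens in token_sets)
    count_b = sum(word_b in tokens for tokens in token_sets)
    both = sum(word_a in tokens and word_b in tokens for tokens in token_sets)
    expected = count_a * count_b / n
    return both / expected if expected else 0.0


# Toy corpus, invented for illustration only.
corpus = [
    "the large dog chased the small cat",
    "a large house and a small garden",
    "the big truck passed a little car",
    "she bought a large coffee",
    "a little bird sang",
    "the small town was quiet",
]

print(cooccurrence_ratio(corpus, "large", "small") > 1)  # True: more often than chance
print(cooccurrence_ratio(corpus, "large", "little"))     # 0.0: never together here
```

A ratio well above 1 on a real, large collection would be one signal that a pair behaves like antonym(large, small) rather than like large/little.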

Thesauri and Lexical Relations
• Polysemy: same word, different senses of meaning
  – Slightly different concepts expressed similarly
• Synonyms: different words, related senses of meanings
  – Different ways to express similar concepts
• Thesauri help draw all these together
• Thesauri also commonly define a set of relations between terms that is similar to lexical relations
  – BT, NT, RT
• More on Thesauri next week…

What is an Ontology?
• From Merriam-Webster’s Collegiate
  – A branch of metaphysics concerned with the nature and relations of being
  – A particular theory about the nature of being or the kinds of existence
• More prosaically
  – A carving up of the world’s meanings
  – Determine what things exist, but not how they interrelate
• Related terms
  – Taxonomy, dictionary, category structure
• Commonly used now in CS literature to describe structures that function as Thesauri

WordNet
• Started in 1985 by George Miller, students, and colleagues at the Cognitive Science Laboratory, Princeton University
  – Miller also known as the author of the paper “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information” (1956)
• Can be downloaded for free:
  – www.cogsci.princeton.edu/~wn/

Miller on WordNet
• “In terms of coverage, WordNet’s goals differ little from those of a good standard college-level dictionary, and the semantics of WordNet is based on the notion of word sense that lexicographers have traditionally used in writing dictionaries. It is in the organization of that information that WordNet aspires to innovation.”
  – (Miller, 1998, Chapter 1)

Presuppositions of WordNet Project
• Separability hypothesis
  – The lexical component of language can be separated and studied in its own right
• Patterning hypothesis
  – People have knowledge of the systematic patterns and relations between word meanings
• Comprehensiveness hypothesis
  – Computational linguistics programs need a store of lexical knowledge that is as extensive as that which people have

WordNet: Size
WordNet uses “Synsets” – sets of synonymous terms

POS          Unique Strings    Synsets
Noun                 114648      79689
Verb                  11306      13508
Adjective             21436      18563
Adverb                 4669       3664
Totals               152059     115424
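
To make the two columns concrete, here is a toy model of synsets. It is a sketch only: the three entries in `TOY_SYNSETS`, their glosses, and the helper names are invented for illustration and are not taken from the WordNet database. A word that appears in several synsets is polysemous; the other words in a synset are its synonyms.

```python
from collections import defaultdict

# Hypothetical entries: (part of speech, member words, gloss).
TOY_SYNSETS = [
    ("noun", ("bank", "riverbank"), "sloping land beside a river"),
    ("noun", ("bank", "financial institution"), "an institution where money is kept"),
    ("noun", ("cat", "feline"), "a small domesticated carnivore"),
]

# Index from each unique string to the synsets that contain it.
index = defaultdict(list)
for synset_id, (pos, words, gloss) in enumerate(TOY_SYNSETS):
    for word in words:
        index[word].append(synset_id)


def senses(word):
    """All synsets containing `word`; more than one means the word is polysemous."""
    return [TOY_SYNSETS[i] for i in index.get(word, [])]


def synonyms(word):
    """Every other word that shares at least one synset with `word`."""
    return {w for _, words, _ in senses(word) for w in words if w != word}


unique_strings = len(index)     # counts each word form once
synset_count = len(TOY_SYNSETS)
print(unique_strings, synset_count)  # 5 3
print(len(senses("bank")))           # 2
```

The "Unique Strings" and "Synsets" columns in the table count these two things separately, which is why the totals differ.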

Structure of WordNet (slides 43-45; diagrams not reproduced in this transcript)

Unique Beginners
• Entity, something
  – (anything having existence (living or nonliving))
• Psychological_feature
  – (a feature of the mental life of a living organism)
• Abstraction
  – (a general concept formed by extracting common features from specific examples)
• State
  – (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state")
• Event
  – (something that happens at a given place and time)

Unique Beginners (continued)
• Act, human_action, human_activity
  – (something that people do or cause to happen)
• Group, grouping
  – (any number of entities (members) considered as a unit)
• Possession
  – (anything owned or possessed)
• Phenomenon
  – (any state or process known through the senses rather than by intuition or reasoning)

WordNet Demo
• Available online (from Unix) if you wish to try it…
  – Log in to irony and type “wn word” for any word you are interested in
  – Demo…

Discussion Questions (slides 51-53)

Homework
• Read
  – Defining Information Architecture (Rosenfeld)
• Discussion Question volunteers?

Next Time
• Introduction to the phone project