Natural Language Processing
Artificial Intelligence, CMSC 25000, February 28, 2002
Agenda
• Why NLP? – Goals & Applications
• Challenges: Knowledge & Ambiguity
  – Key types of knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse
  – Handling Ambiguity
    • Syntactic Ambiguity: Probabilistic Parsing
    • Semantic Ambiguity: Word Sense Disambiguation
• Conclusions
Why Language?
• Natural Language in Artificial Intelligence
  – Language use as a distinctive feature of human intelligence
  – Infinite utterances:
    • Diverse languages with fundamental similarities
    • "Computational linguistics"
  – Communicative acts: inform, request, ...
Why Language? Applications
• Machine Translation
• Question-Answering – from database queries to web search
• Spoken language systems
• Intelligent tutoring
Knowledge of Language
• What does it mean to know a language?
  – Know the words (lexicon): pronunciation, formation, conjugation
  – Know how the words form sentences: sentence structure, compositional meaning
  – Know how to interpret the sentence: statement, question, ...
  – Know how to group sentences: narrative coherence, dialogue
Word-level Knowledge
• Lexicon:
  – List of legal words in a language
  – Part of speech: noun, verb, adjective, determiner
• Example:
  – Noun -> cat | dog | mouse | ball | rock
  – Verb -> chase | bite | fetch | bat
  – Adjective -> black | brown | furry | striped | heavy
  – Determiner -> the | that | an
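The lexicon fragment above can be sketched as a small lookup table from part of speech to legal words (a toy illustration; the dictionary layout and function name are my own, not from the slides):

```python
# Toy lexicon from the slide: part of speech -> legal words.
LEXICON = {
    "Noun": ["cat", "dog", "mouse", "ball", "rock"],
    "Verb": ["chase", "bite", "fetch", "bat"],
    "Adjective": ["black", "brown", "furry", "striped", "heavy"],
    "Determiner": ["the", "that", "an"],
}

def parts_of_speech(word):
    """Every part of speech listed for a word (more than one = lexical ambiguity)."""
    return [pos for pos, words in LEXICON.items() if word in words]
```

A word appearing under several parts of speech is exactly the lexical ambiguity discussed next.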
Word-level Knowledge: Issues
• Issue 1: Lexicon Size
  – Potentially HUGE!
  – Controlling factor: morphology
    • Store base forms (roots/stems); use morphological processes to generate/analyze
    • E.g. dog: dog(s); sing: sings, sang, sung, singing, singer, ...
• Issue 2: Lexical ambiguity
  – rock: N/V; dog: N/V
  – "Time flies like a banana"
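The store-the-stem idea above can be sketched as follows: regular suffixation is generated on demand, while irregular forms (sang, sung) still need an exception table. The names and the exception table here are mine, not from the slides:

```python
# Store only base forms; generate regular inflections by rule, and fall
# back to an exception table for irregulars that rules won't produce.
IRREGULAR = {"sing": ["sang", "sung"]}

def surface_forms(stem):
    """Regular plural/3sg suffixation plus any listed irregular forms."""
    return [stem, stem + "s"] + IRREGULAR.get(stem, [])
```

This is why the lexicon stays small: one stored stem covers many surface forms.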
Sentence-level Knowledge: Syntax
• Language models
  – More than just words: "banana a flies time like"
  – Formal vs natural: grammar defines the language
• Chomsky Hierarchy:
  – Recursively Enumerable: any rules
  – Context Sensitive: AB -> BA
  – Context Free: A -> aBc
  – Regular Expression: S -> aS (e.g. a*b*)
Syntactic Analysis: Grammars
• Natural vs formal languages
  – Natural languages have degrees of acceptability
    • 'It ain't hard'; 'You gave what to whom?'
• Grammar combines words into phrases
  – S -> NP VP
  – NP -> {Det} {Adj} N
  – VP -> V | V NP PP
Syntactic Analysis: Parsing
• Recover phrase structure from a sentence, based on the grammar
• Example: "The black cat chased the furry mouse"
  (S (NP (Det The) (Adj black) (N cat))
     (VP (V chased) (NP (Det the) (Adj furry) (N mouse))))
Syntactic Analysis: Parsing
• Issue 1: Complexity
  – Solution 1: Chart parser (dynamic programming) – O(n^3)
• Issue 2: Structural ambiguity
  – 'I saw the man on the hill with the telescope'
    • Is the telescope on the hill?
  – Solution 2 (partial): Probabilistic parsing
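A chart parser of the kind mentioned above can be sketched as a CKY-style recognizer: dynamic programming over spans gives the cubic bound. The tiny Chomsky-Normal-Form grammar below is illustrative, not taken from the slides:

```python
from itertools import product

# Minimal CKY recognizer: fill a chart of labels per span, bottom-up.
# Three nested loops over span length, start, and split point -> O(n^3).
UNARY = {"the": {"Det"}, "cat": {"N"}, "mouse": {"N"}, "chased": {"V"}}
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}

def cky_recognize(words):
    n = len(words)
    # chart[i][j] = set of nonterminals that derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(UNARY.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # try every split point
                for left, right in product(chart[i][k], chart[k][j]):
                    if (left, right) in BINARY:
                        chart[i][j].add(BINARY[(left, right)])
    return "S" in chart[0][n]
```

Storing every label per span is also what lets the chart represent structurally ambiguous sentences compactly.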
Semantic Analysis
• Grammatical ≠ Meaningful
  – "Colorless green ideas sleep furiously"
• Compositional Semantics
  – Meaning of a sentence built from the meanings of its subparts
  – Associate a semantic interpretation with each syntactic constituent
  – E.g. nouns are variables (themselves): cat, mouse
    • Adjectives: unary predicates: Black(cat), Furry(mouse)
    • Verbs: multi-place predicates: VP: λx. chased(x, Furry(mouse))
    • Sentence: (λx. chased(x, Furry(mouse)))(Black(cat))
      – chased(Black(cat), Furry(mouse))
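The derivation above can be mirrored directly with functions as meanings. This is only a sketch; the tuple encoding of predicates is my own choice, not the slides':

```python
# Compositional semantics sketch: each constituent's meaning is a term
# or a function, composed exactly as in the slide's derivation.
def black(x): return ("Black", x)          # adjective: unary predicate
def furry(x): return ("Furry", x)
def chased(x, y): return ("chased", x, y)  # verb: two-place predicate

cat, mouse = "cat", "mouse"                # nouns denote themselves here

vp = lambda x: chased(x, furry(mouse))     # VP: λx. chased(x, Furry(mouse))
sentence = vp(black(cat))                  # apply the VP meaning to the subject
# sentence == ("chased", ("Black", "cat"), ("Furry", "mouse"))
```

The final value is the logical form the slide derives by hand.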
Semantic Ambiguity
• Examples:
  – I went to the bank
    • of the river
    • to deposit some money
  – He banked
    • at First Union
    • the plane
• Interpretation depends on
  – Sentence (or larger) topic context
  – Syntactic structure
Pragmatics & Discourse
• Interpretation in context
  – Act accomplished by utterance
    • "Do you have the time?", "Can you pass the salt?"
    • Requests with non-literal meaning
  – Also includes politeness, performatives, etc.
• Interpretation of multiple utterances
  – "The cat chased the mouse. It got away."
  – Resolve referring expressions
Natural Language Understanding
• Pipeline: Input -> Tokenization/Morphology -> Parsing -> Semantic Analysis -> Pragmatics/Discourse -> Meaning
• Key issues:
  – Knowledge: how to acquire this knowledge of language?
    • Hand-coded? Automatically acquired?
  – Ambiguity: how to determine the appropriate interpretation?
    • Pervasive, preference-based
Handling Syntactic Ambiguity
• Natural language syntax
  – Varied, has DEGREES of acceptability
  – Ambiguous
• Probability: framework for preferences
  – Augment the original context-free rules with probabilities: PCFG
  – Add probabilities to transitions:

    NP -> N            0.20      VP -> V           0.45
    NP -> Det N        0.65      VP -> V NP        0.45
    NP -> Det Adj N    0.10      VP -> V NP PP     0.10
    NP -> NP PP        0.05      PP -> P NP        1.0
    S  -> NP VP        0.85      S  -> S conj S    0.15
PCFGs
• Learning probabilities
  – Strategy 1: Write a CFG manually; use a treebank (collection of parse trees) to estimate rule probabilities
  – Strategy 2: Use a larger treebank (+ linguistic constraints); learn rules & probabilities (inside-outside algorithm)
• Parsing with PCFGs
  – Rank parse trees by probability
  – Provides graceful degradation: even unusual constructions get some (low-probability) parse
Parse Ambiguity
• Two parse trees for "I saw the man with the telescope":
  – T1 (PP attached to the VP):
    (S (NP (N I))
       (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det the) (N telescope)))))
  – T2 (PP attached to the object NP):
    (S (NP (N I))
       (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det the) (N telescope))))))
Parse Probabilities
• P(T) = product of the probabilities of the rules at each node; T(ree), S(entence), n(ode), R(ule)
• T1 = 0.85 * 0.2 * 0.1 * 0.65 * 1 * 0.65 ≈ 0.007
• T2 = 0.85 * 0.2 * 0.45 * 0.05 * 0.65 * 1 * 0.65 ≈ 0.002
• Select T1
• Best systems achieve 92-93% accuracy
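The products above are just the rule probabilities multiplied over every internal node. A sketch, with the tree encoded as nested (label, children) tuples (the encoding and names are mine, not the slides'):

```python
# Score a parse under the PCFG: P(T) = product over nodes of p(rule at node).
RULE_PROB = {
    ("S", ("NP", "VP")): 0.85,
    ("NP", ("N",)): 0.20, ("NP", ("Det", "N")): 0.65, ("NP", ("NP", "PP")): 0.05,
    ("VP", ("V", "NP")): 0.45, ("VP", ("V", "NP", "PP")): 0.10,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree):
    """tree = (label, children); a preterminal's children is a one-word list."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return 1.0                      # lexical probabilities omitted here
    rhs = tuple(child[0] for child in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

# T1: 'with the telescope' attached to the VP (rule V NP PP).
T1 = ("S", [("NP", [("N", ["I"])]),
            ("VP", [("V", ["saw"]),
                    ("NP", [("Det", ["the"]), ("N", ["man"])]),
                    ("PP", [("P", ["with"]),
                            ("NP", [("Det", ["the"]), ("N", ["telescope"])])])])])
```

Evaluating `tree_prob(T1)` reproduces the 0.85 * 0.2 * 0.1 * 0.65 * 1 * 0.65 product from the slide.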
Semantic Ambiguity
• "Plant" ambiguity
  – Botanical vs manufacturing senses
• Two types of context
  – Local: 1-2 words away
  – Global: several-sentence window
• Two observations (Yarowsky 1995)
  – One sense per collocation (local)
  – One sense per discourse (global)
Learning Disambiguators
• Initialize with a small set of "seed" cases
• Collect local context information ("collocations")
  – E.g. 2 words away from "production", 1 word from "seed"
• Contexts = rules; make a decision list = rules ranked by mutual information
• Iterate:
  – Label via the decision list, collecting new contexts
  – Label all entries in a discourse with the majority sense
  – Repeat
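The train-and-label steps above can be sketched like this. It follows the shape of Yarowsky's decision list, but the smoothing and the simple log-ratio score (standing in for mutual information) are my simplifications, and the data are toy examples:

```python
import math
from collections import defaultdict

def build_decision_list(labeled):
    """labeled: list of (context_words, sense) pairs from the seed set.
    Returns rules (score, word, sense) sorted strongest-first."""
    counts = defaultdict(lambda: defaultdict(int))   # word -> sense -> count
    for context, sense in labeled:
        for w in context:
            counts[w][sense] += 1
    rules = []
    for w, by_sense in counts.items():
        best = max(by_sense, key=by_sense.get)
        other = sum(c for s, c in by_sense.items() if s != best)
        score = math.log((by_sense[best] + 0.1) / (other + 0.1))  # smoothed ratio
        rules.append((score, w, best))
    return sorted(rules, reverse=True)

def disambiguate(context, decision_list, default="?"):
    """Label a new context with the sense of the first (strongest) rule that fires."""
    for score, w, sense in decision_list:
        if w in context:
            return sense
    return default
```

In the full algorithm, newly labeled contexts are fed back in and the list is rebuilt until labeling stabilizes.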
Disambiguation
• For each new unlabeled case, use the decision list to assign a label
  – > 95% accurate on a set of highly ambiguous words
  – Also used for accent restoration in e-mail
Natural Language Processing
• Goals: understand & imitate a distinctive human capacity
• Myriad applications: MT, Q&A, SLS
• Key issues:
  – Capturing knowledge of language
    • Automatic acquisition is the current focus: linguistics + ML
  – Resolving ambiguity, managing preferences
    • Apply (probabilistic) knowledge
    • Effective in constrained environments