Natural Language Processing Lecture Notes 1 3/4/2021 1
Today
• Administration and syllabus
  – Course web page
• Introduction
Natural Language Processing
• What is it?
  – We’re going to study what goes into getting computers to perform useful and interesting tasks involving human languages.
  – We will be secondarily concerned with the insights that such computational work gives us into human languages and human processing of language.
Natural Language Processing
• Foundations are in computer science (AI, theory, algorithms, …); linguistics; mathematics, logic, and statistics; and psychology
Why Should You Care?
• Two trends
  1. An enormous amount of knowledge is now available in machine-readable form as natural language text
  2. Conversational agents are becoming an important form of human-computer communication
Knowledge of Language
• Words (words and their composition)
• Syntax (structure of sentences)
• Semantics (explicit meaning of sentences)
• Discourse and pragmatics (implicit and contextual meaning)
Applications
• First, what makes an application a language processing application (as opposed to any other piece of software)?
  – An application that requires the use of knowledge about human languages
• Example: Is Unix wc (word count) a language processing application?
Applications
• Word count?
  – When it counts words: Yes
    • To count words you need to know what a word is. That’s knowledge of language.
  – When it counts lines and bytes: No
    • Lines and bytes are computer artifacts, not linguistic entities
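The point about wc can be made concrete with a minimal sketch. The function names and the regular expression are illustrative assumptions, not part of wc or of the course material: the first function mimics a whitespace-based notion of a word, the second applies a slightly more linguistic one. The two disagree as soon as punctuation appears.

```python
import re

def wc_words(text):
    """Count words roughly as Unix wc does: maximal runs of non-whitespace."""
    return len(text.split())

def linguistic_words(text):
    """A toy tokenizer (an assumption for illustration): a word is a run of
    letters that may contain internal apostrophes or hyphens; free-standing
    punctuation is not a word."""
    return re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)

sample = "Open the pod-bay doors -- now, Hal!"
print(wc_words(sample))          # 7: the stray "--" counts as a word
print(linguistic_words(sample))  # 6 tokens: "--" is dropped, "now," becomes "now"
```

Deciding whether "--" or "pod-bay" is a word is exactly the kind of decision that requires knowledge of language.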
Small Applications
• Line breakers
• Hyphenators
• Spelling correctors
• OCR software
• Grammar and style checkers
Big Applications
• Question answering
• Conversational agents
• Text summarization
• Machine translation
Big Applications
• These kinds of applications require a tremendous amount of knowledge of language.
• Consider the following interaction with HAL, the computer from 2001: A Space Odyssey
HAL
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.
• Morphology: producing contractions and plurals
• Syntax: command vs. statement, word groupings and structure
• Semantics: words in isolation and compositionally
HAL
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.
• Pragmatics: politeness and indirectness
  – It is polite to respond, even if you’re planning to kill someone.
  – It is polite to pretend to want to be cooperative (I’m afraid I can’t…)
• Discourse: references between utterances (“that” refers back to Dave’s request)
Caveat
• In NLP, as in many areas of AI:
  – We’re often dealing with ill-defined problems
  – We don’t often come up with perfect solutions or algorithms
  – We can’t let either of those facts get in our way
• If this bothers you, NLP may not be your area
Course Material
• We’ll be intermingling discussions of:
  – Linguistic topics
    • Syntax and meaning representations
  – Computational techniques
    • Context-free grammars
  – Applications
    • Language aids and QA systems
Topics: Linguistics
• Word-level processing
• Syntactic processing
• Lexical and compositional semantics
• Discourse and dialog processing
Topics: Techniques
• Finite-state methods
• Context-free methods
• Augmented grammars
  – Unification
  – Logic
• Probabilistic versions
• Supervised machine learning
Topics: Applications
• Small
  – Spelling correction
• Medium
  – Word-sense disambiguation
  – Named entity recognition
  – Information retrieval
• Large
  – Question answering
  – Conversational agents
  – Machine translation
• Often stand-alone
• Enabling applications
• Funding/Business plans
Just English?
• The examples in this class will for the most part be English.
• We’ll touch on machine translation (MT), but it needs a course of its own
Chapter 1
• Knowledge of language
• Ambiguity
• Models and algorithms
• History
Returning to our HAL example…
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.
The knowledge that HAL needs to take part in this exchange can be broken down into somewhat discrete categories. Each category can then be handled in isolation. This, of course, leaves open the problem of how the categories interact.
Knowledge of Language
• Phonetics and phonology: speech sounds, their production, and the rule systems that govern their use
• Morphology: words and their composition from more basic units
  – Cat, cats (inflectional morphology)
  – Child, children
  – Friend, friendly (derivational morphology)
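Inflectional morphology can be illustrated with a toy pluralizer. This is a minimal sketch under simplifying assumptions: the rule set and the `IRREGULAR` table are hypothetical and cover only a few English patterns, far fewer than a real morphological analyzer would need.

```python
# Irregular forms must be listed explicitly; no rule derives "children".
IRREGULAR = {"child": "children"}

def pluralize(noun):
    """Return a plural form using a few regular spelling rules (toy coverage)."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"             # box -> boxes
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"       # city -> cities
    return noun + "s"                  # cat -> cats

for word in ["cat", "child", "box", "city", "day"]:
    print(word, "->", pluralize(word))
```

The split between a rule component and an exception list mirrors the cat/cats versus child/children contrast on the slide.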
Knowledge of Language
• Syntax: the structuring of words into legal larger phrases and sentences
  – The textbook for the NLP class is great.
  – Jane met Mary.
  – Mary was met by Jane.
    • Met(Jane, Mary)
Semantics
• The meaning of words and phrases
  – Lexical semantics: the study of the meanings of words
  – Compositional semantics: how to combine word meanings
  – Word-sense disambiguation
    • River bank vs. financial bank
Pragmatics
• Indirect speech acts:
  – Do you have a stapler?
• Presupposition:
  – Have you stopped beating your wife?
• Deixis and point of view:
  – Zoe was angry at Joe. Where was he?
• Implicature:
  – Yes, there are 3 flights to Boston. In fact, there are 4.
  – Yes, there are 3 flights to Boston. *In fact, there are fewer than 2.
Discourse
• Utterance interpretation in the context of the text or dialog
  – Sue took the trip to New York. She had a great time there.
    • Sue/she
    • New York/there
    • took/had (time)
Deconstructing HAL
• Recognizes speech and understands language
• Decides how to respond and speaks reply
• With personality
• Recognizes user’s goals, adopts them, and helps achieve them (well, that’s what we want…)
• Remembers conversational history
• Customizes interaction to different individuals
• Learns from experience
• Possesses vast knowledge and is autonomous
Ambiguity
• Almost all of the non-trivial tasks performed by NLP systems are ambiguity resolution tasks
• There is ambiguity at all levels of language
Ambiguity
• I saw the woman with the telescope
• Syntactically ambiguous:
  – I saw (NP the woman with the telescope)
  – I saw (NP the woman) (PP with the telescope)
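The two bracketings above can be reproduced mechanically. The sketch below counts the parse trees of the sentence with a CKY-style chart. The grammar and lexicon are hypothetical and written only for this one sentence; they are not taken from the course material.

```python
# Toy grammar in Chomsky normal form: (B, C) -> parents A such that A -> B C.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("P", "NP"): ["PP"],
}
LEXICON = {"I": ["NP"], "saw": ["V"], "the": ["Det"],
           "woman": ["N"], "telescope": ["N"], "with": ["P"]}

def count_parses(words, root="S"):
    """Return the number of distinct parse trees rooted in `root`."""
    n = len(words)
    chart = {}  # (i, j, A) -> number of A-trees spanning words[i:j]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1, tag)] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (b, c), parents in BINARY.items():
                    ways = chart.get((i, k, b), 0) * chart.get((k, j, c), 0)
                    for a in parents:
                        if ways:
                            chart[(i, j, a)] = chart.get((i, j, a), 0) + ways
    return chart.get((0, n, root), 0)

print(count_parses("I saw the woman with the telescope".split()))  # 2
```

The count of 2 comes from the two places the PP can attach: inside the object NP, or to the VP.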
“I made her duck”
• I cooked waterfowl for her
• I cooked waterfowl belonging to her
• I created the duck she owns
• I caused her to lower her head quickly…
• Part-of-speech tagging: is “duck” a noun or a verb?
• Parsing syntactic structure: is “her” part of the “duck” NP?
• Word-sense disambiguation (lexical semantics): does “make” mean create, cook, or cause?
Dealing with Ambiguity
• Three approaches:
  – Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
  – Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
    • Syntax proposes/semantics disposes approach
  – Probabilistic approaches based on making the most likely choices
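The third approach can be sketched in its simplest form: pick the most frequent analysis for each ambiguous word. The counts below are invented purely for illustration (they do not come from any corpus), and the function name is hypothetical.

```python
# Hypothetical (word -> tag -> count) table; the numbers are made up.
TAG_COUNTS = {
    "duck": {"NOUN": 8, "VERB": 2},
    "her": {"PRON": 6, "DET": 4},
}

def most_likely_tag(word):
    """Return the most frequent tag for `word` and its relative frequency."""
    counts = TAG_COUNTS[word]
    total = sum(counts.values())
    tag = max(counts, key=counts.get)
    return tag, counts[tag] / total

print(most_likely_tag("duck"))  # ('NOUN', 0.8)
```

A choice made this way ignores context entirely, which is why practical probabilistic systems condition on neighboring words or tags rather than on the word alone.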
Models and Algorithms
• Models (as we are using the term here):
  – Formalisms to represent linguistic knowledge
• Algorithms:
  – Used to manipulate the representations and produce the desired behavior (choosing among possibilities and combining pieces)
Models
• State machines: finite-state automata, finite-state transducers
• Formal rule systems: context-free grammars
• Logical formalisms: first-order predicate calculus; higher-order logics
• Models of uncertainty: Bayesian probability theory
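As a small instance of the first model type, here is a minimal sketch of a deterministic finite-state automaton. The language it recognizes (a "b", at least two "a"s, then "!") is a toy chosen for illustration and is an assumption of this example, not something defined on the slides.

```python
# Transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of additional a's
    (3, "!"): 4,
}
START, ACCEPTING = 0, {4}

def accepts(string):
    """Run the automaton; reject as soon as no transition applies."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, accepts(s))
```

The model is the transition table; the algorithm is the short loop that walks it. Keeping the two separate is the distinction the previous slide draws between models and algorithms.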
Algorithms
• Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
• Unfortunately, since language is ambiguous at all levels, this is almost never simple. This leads us to employ algorithms that fall into two related categories…
Algorithms
• In particular:
  – State-space search
    • To manage the problem of making choices during processing when we lack the information needed to make the right choice
  – Dynamic programming
    • To avoid having to redo work during the course of a state-space search
State Space Search
• States represent pairings of partially processed inputs with partially constructed answers
• Goal is to arrive at the right/best structure after having processed all the input.
• As with most interesting AI problems, the spaces are too large and the criteria for “bestness” are difficult to encode
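A minimal sketch of state-space search on a toy problem: splitting an unspaced string into dictionary words. Each state pairs the partially processed input (a position in the string) with a partially constructed answer (the words found so far). The lexicon and the task are assumptions chosen for illustration.

```python
LEXICON = {"the", "theta", "table", "bled", "down", "own", "there"}

def segmentations(text):
    """Depth-first search over states (position, words so far)."""
    results = []
    stack = [(0, [])]                      # start: nothing consumed, no words
    while stack:
        pos, words = stack.pop()
        if pos == len(text):               # goal: all input processed
            results.append(words)
            continue
        for end in range(pos + 1, len(text) + 1):
            piece = text[pos:end]
            if piece in LEXICON:           # each match is a choice point
                stack.append((end, words + [piece]))
    return results

for words in sorted(segmentations("thetabledownthere")):
    print(" ".join(words))
```

The search finds both "the table down there" and "theta bled own there", and it also explores dead ends such as reading "the" at the start of "there". Nothing in the search itself says which answer is best, which is the "bestness" problem named above.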
Dynamic Programming
• Don’t do the same work over and over.
• Avoid this by building and making use of solutions to subproblems that must be invariant across all parts of the space.
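A minimal sketch of the idea, on the toy task of counting the ways an unspaced string can be split into dictionary words. The lexicon is hypothetical. The number of ways to segment the rest of the string from a given position does not depend on how that position was reached, so it is an invariant subproblem and can be computed once and cached.

```python
from functools import lru_cache

LEXICON = {"the", "theta", "table", "bled", "down", "own", "there"}

def count_segmentations(text):
    """Count segmentations of `text` into LEXICON words, memoized by position."""
    @lru_cache(maxsize=None)
    def ways(pos):
        if pos == len(text):
            return 1                       # exactly one way to segment nothing
        return sum(ways(end)
                   for end in range(pos + 1, len(text) + 1)
                   if text[pos:end] in LEXICON)
    return ways(0)

# "the table down" and "theta bled own" both end at position 12, so the
# answer for the remaining "there" is computed once and then reused.
print(count_segmentations("thetabledownthere"))  # 2
```

Without the cache, the work for a shared suffix would be repeated once per path that reaches it, which grows quickly on longer inputs.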