Natural Language Processing: Lecture Notes 1 (3/4/2021)

  • Slides: 37

Today
• Administration and syllabus
  – Course web page
• Introduction

Natural Language Processing
• What is it?
  – We’re going to study what goes into getting computers to perform useful and interesting tasks involving human languages.
  – We will be secondarily concerned with the insights that such computational work gives us into human languages and human processing of language.

Natural Language Processing
• Foundations are in computer science (AI, theory, algorithms, …); linguistics; mathematics, logic, and statistics; and psychology.

Why Should You Care?
• Two trends:
  1. An enormous amount of knowledge is now available in machine-readable form as natural language text.
  2. Conversational agents are becoming an important form of human-computer communication.

Knowledge of Language
• Words (words and their composition)
• Syntax (structure of sentences)
• Semantics (explicit meaning of sentences)
• Discourse and pragmatics (implicit and contextual meaning)

Applications
• First, what makes an application a language processing application (as opposed to any other piece of software)?
  – An application that requires the use of knowledge about human languages
• Example: Is Unix wc (word count) a language processing application?

Applications
• Word count?
  – When it counts words: Yes
    • To count words you need to know what a word is. That’s knowledge of language.
  – When it counts lines and bytes: No
    • Lines and bytes are computer artifacts, not linguistic entities.
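
To make the distinction concrete, here is a minimal Python sketch. The function names and the regular expression are assumptions made for illustration; this is not how wc is implemented. Counting lines and bytes needs no linguistic knowledge, while counting words forces a decision about what a word is.

```python
import re

def wc_counts(text: str) -> dict:
    """Count lines, whitespace-delimited tokens, and bytes, roughly in the spirit of Unix wc."""
    return {
        "lines": text.count("\n"),            # newline characters: a computer artifact
        "words": len(text.split()),           # "word" = maximal run of non-whitespace
        "bytes": len(text.encode("utf-8")),   # bytes: a computer artifact
    }

def linguistic_word_count(text: str) -> int:
    """Count words under one assumed definition: runs of letters,
    optionally joined by internal apostrophes or hyphens."""
    return len(re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text))

sample = "HAL - the computer - refused.\n"
print(wc_counts(sample)["words"], linguistic_word_count(sample))
```

The two counts disagree on the sample because the whitespace rule treats each free-standing dash as a word. Whichever definition is chosen, choosing it is a (small) piece of knowledge about language.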

Small Applications
• Line breakers
• Hyphenators
• Spelling correctors
• OCR software
• Grammar and style checkers

Big Applications
• Question answering
• Conversational agents
• Text summarization
• Machine translation

Big Applications
• These kinds of applications require a tremendous amount of knowledge of language.
• Consider the following interaction with HAL, the computer from 2001: A Space Odyssey.

HAL
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.
• Morphology: producing contractions and plurals
• Syntax: command vs. statement, word groupings and structure
• Semantics: words in isolation and compositionally

HAL
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.
• Pragmatics: politeness and indirectness
  – It is polite to respond, even if you’re planning to kill someone.
  – It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
• Discourse: between-utterance references (that)

Caveat
• In NLP, as in many areas of AI:
  – We’re often dealing with ill-defined problems
  – We don’t often come up with perfect solutions/algorithms
  – We can’t let either of those facts get in our way
• If this bothers you, NLP may not be your area

Course Material
• We’ll be intermingling discussions of:
  – Linguistic topics
    • Syntax and meaning representations
  – Computational techniques
    • Context-free grammars
  – Applications
    • Language aids and QA systems

Topics: Linguistics
• Word-level processing
• Syntactic processing
• Lexical and compositional semantics
• Discourse and dialog processing

Topics: Techniques
• Finite-state methods
• Context-free methods
• Augmented grammars
  – Unification
  – Logic
• Probabilistic versions
• Supervised machine learning

Topics: Applications
• Small
  – Spelling correction
• Medium
  – Word-sense disambiguation
  – Named entity recognition
  – Information retrieval
• Large
  – Question answering
  – Conversational agents
  – Machine translation
• Often stand-alone
• Enabling applications
• Funding/Business plans

Just English?
• The examples in this class will for the most part be in English.
• We’ll touch on MT, but it needs a course of its own.

Chapter 1
• Knowledge of language
• Ambiguity
• Models and algorithms
• History

Returning to our HAL example…
• Dave: Open the pod bay doors, Hal.
• HAL: I’m sorry Dave, I’m afraid I can’t do that.
The knowledge that HAL needs to take part in this exchange can be broken down into somewhat discrete categories. Each category can then be handled in isolation. This, of course, leaves open the problem of how the categories interact.

Knowledge of Language
• Phonetics and phonology: speech sounds, their production, and the rule systems that govern their use
• Morphology: words and their composition from more basic units
  – Cat, cats (inflectional morphology)
  – Child, children
  – Friend, friendly (derivational morphology)

Knowledge of Language
• Syntax: the structuring of words into legal larger phrases and sentences
  – The textbook for the NLP class is great.
  – Jane met Mary.
  – Mary was met by Jane.
    • Met(Jane, Mary)
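
The point of the Jane/Mary pair is that two different surface structures map to one underlying relation. A deliberately tiny sketch of that idea follows. The two regular-expression patterns and the function name to_predicate are hypothetical and cover only these two sentence shapes; real systems rely on a parser, not string patterns.

```python
import re

# Hypothetical patterns for exactly two sentence shapes (an assumption of this sketch).
PASSIVE = re.compile(r"^(\w+) was met by (\w+)\.?$")
ACTIVE = re.compile(r"^(\w+) met (\w+)\.?$")

def to_predicate(sentence: str):
    """Map an active or passive 'met' sentence to the same Met(agent, patient) form."""
    m = PASSIVE.match(sentence)
    if m:
        # In the passive, the agent follows "by" and the patient is the subject.
        return f"Met({m.group(2)}, {m.group(1)})"
    m = ACTIVE.match(sentence)
    if m:
        return f"Met({m.group(1)}, {m.group(2)})"
    return None  # sentence shape not covered by this sketch

print(to_predicate("Jane met Mary."))
print(to_predicate("Mary was met by Jane."))
```

Both calls yield the same predicate, which is what knowledge of syntax buys: recognizing who did what to whom regardless of word order.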

Semantics
• The meaning of words and phrases
  – Lexical semantics: the study of the meanings of words
  – Compositional semantics: how to combine word meanings
  – Word-sense disambiguation
    • River bank vs. financial bank
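
One simple way to approach the river bank vs. financial bank choice is a simplified Lesk-style heuristic: pick the sense whose dictionary gloss shares the most words with the surrounding sentence. The sketch below is an illustration only. The sense labels, glosses, and stopword list are invented for this example and do not come from any real lexicon.

```python
# Hypothetical sense inventory for "bank"; the glosses are written for this sketch.
SENSES = {
    "bank/river": "sloping land beside a river or stream of water",
    "bank/financial": "institution that accepts money deposits and makes loans",
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "at", "on", "we", "i"}

def content_words(text: str) -> set:
    """Lowercase, strip simple punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

def disambiguate(sentence: str) -> str:
    """Return the sense whose gloss overlaps most with the sentence (ties go to the first sense)."""
    context = content_words(sentence)
    return max(SENSES, key=lambda s: len(context & content_words(SENSES[s])))

print(disambiguate("We fished from the bank of the river"))
print(disambiguate("I deposited money at the bank"))
```

The heuristic is brittle (it depends entirely on the wording of the glosses), which is one reason word-sense disambiguation is usually treated as a learning problem.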

Pragmatics
• Indirect speech acts:
  – Do you have a stapler?
• Presupposition:
  – Have you stopped beating your wife?
• Deixis and point of view:
  – Zoe was angry at Joe. Where was he?
• Implicature:
  – Yes, there are 3 flights to Boston. In fact, there are 4.
  – Yes, there are 3 flights to Boston. *In fact, there are fewer than 2.

Discourse
• Utterance interpretation in the context of the text or dialog
  – Sue took the trip to New York. She had a great time there.
    • Sue/she
    • New York/there
    • took/had (time)

Deconstructing HAL
• Recognizes speech and understands language
• Decides how to respond and speaks reply
• With personality
• Recognizes user’s goals, adopts them, and helps achieve them (well, that’s what we want…)
• Remembers conversational history
• Customizes interaction to different individuals
• Learns from experience
• Possesses vast knowledge and is autonomous

Ambiguity
• Almost all of the non-trivial tasks performed by NLP systems are ambiguity resolution tasks.
• There is ambiguity at all levels of language.

Ambiguity
• I saw the woman with the telescope
• Syntactically ambiguous:
  – I saw (NP the woman with the telescope)
  – I saw (NP the woman) (PP with the telescope)
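
The two bracketings can be reproduced mechanically. The sketch below counts the parse trees a small grammar assigns to the sentence, using a CKY-style chart. The grammar and lexicon are hypothetical toy rules in Chomsky normal form, written only for this example; they are not taken from the course materials.

```python
from collections import defaultdict

# Toy grammar (an assumption of this sketch): (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "woman": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(sentence: str) -> int:
    """Return the number of distinct S trees spanning the whole sentence."""
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(int)  # (start, end, symbol) -> number of trees
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[i, i + 1, symbol] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for parent, left, right in RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, "S"]

print(count_parses("I saw the woman with the telescope"))
```

Under this grammar the sentence receives two trees, one per attachment site for the prepositional phrase, while "I saw the woman" receives one.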

“I made her duck”
• I cooked waterfowl for her
• I cooked waterfowl belonging to her
• I created the duck she owns
• I caused her to lower her head quickly…
• Part-of-speech tagging: is “duck” a noun or verb?
• Parsing syntactic structure: is “her” part of the “duck” NP?
• Word-sense disambiguation (lexical semantics): does “make” mean create, cause, or cook?

Dealing with Ambiguity
• Three approaches:
  – Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
  – Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
    • Syntax proposes/semantics disposes approach
  – Probabilistic approaches based on making the most likely choices

Models and Algorithms
• Models (as we are using the term here):
  – Formalisms to represent linguistic knowledge
• Algorithms:
  – Used to manipulate the representations and produce the desired behavior (choosing among possibilities and combining pieces)

Models
• State machines: finite-state automata, finite-state transducers
• Formal rule systems: context-free grammars
• Logical formalisms: first-order predicate calculus; higher-order logics
• Models of uncertainty: Bayesian probability theory
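
As an illustration of the first kind of model, here is a minimal deterministic finite-state automaton in Python. The toy language it recognizes (a "b", then two or more "a"s, then "!") and the state numbering are assumptions chosen for this sketch.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of further "a"s
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}

def accepts(string: str) -> bool:
    """Run the automaton over the string; accept only if it ends in an accepting state."""
    state = START_STATE
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:   # no transition defined: reject
            return False
    return state in ACCEPT_STATES

for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
    print(candidate, accepts(candidate))
```

The model (the table) is separate from the algorithm (the loop that walks it), which mirrors the models/algorithms split used in these notes.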

Algorithms
• Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
• Unfortunately, since language is ambiguous at all levels, this is almost never simple. This leads us to employ algorithms that fall into two related categories…

Algorithms
• In particular…
  – State-space search
    • To manage the problem of making choices during processing when we lack the information needed to make the right choice
  – Dynamic programming
    • To avoid having to redo work during the course of a state-space search

State Space Search
• States represent pairings of partially processed inputs with partially constructed answers.
• Goal is to arrive at the right/best structure after having processed all the input.
• As with most interesting AI problems, the spaces are too large and the criteria for “bestness” are difficult to encode.
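
A small sketch of this formulation, using word segmentation as the task. Each state pairs the amount of input consumed so far with the words built so far. The vocabulary is hypothetical, and "fewest words" is an assumed stand-in for a real bestness criterion.

```python
# Hypothetical vocabulary for this sketch.
VOCAB = {"the", "table", "tab", "led", "own", "down"}

def segmentations(text: str) -> list:
    """Depth-first search over states (chars consumed, words so far)."""
    agenda = [(0, [])]          # start state: nothing consumed, empty answer
    complete = []
    while agenda:
        position, words = agenda.pop()   # LIFO agenda gives depth-first order
        if position == len(text):        # goal: all input processed
            complete.append(words)
            continue
        for end in range(position + 1, len(text) + 1):
            candidate = text[position:end]
            if candidate in VOCAB:       # each legal next word is a choice point
                agenda.append((end, words + [candidate]))
    return complete

def best_segmentation(text: str):
    """Pick the segmentation with the fewest words (an assumed scoring rule)."""
    return min(segmentations(text), key=len, default=None)

print(segmentations("thetabledown"))
print(best_segmentation("thetabledown"))
```

Even this toy input yields more than one complete path, and the search has no way of knowing at "tab" vs. "table" which choice will pay off; it has to explore both.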

Dynamic Programming
• Don’t do the same work over and over.
• Avoid this by building and making use of solutions to subproblems that must be invariant across all parts of the space.
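
Minimum edit distance is a standard illustration of this idea; it is used here as an example and is not drawn from these slides. The distance between two prefixes does not depend on how the rest of the strings are aligned, so each subproblem is solved once and stored in a table. The sketch assumes unit costs for insertion, deletion, and substitution.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit costs, filled in bottom-up."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # delete all of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j                      # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[n][m]

print(edit_distance("kitten", "sitting"))
```

A naive recursive search over all alignments would recompute the same prefix pairs exponentially often; the table reduces the work to one cell per pair of prefixes.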