Natural Language Processing Lecture Notes 1 10/7/2020

Today
• Administration and syllabus
  – Course web page
• Introduction

Natural Language Processing
• What is it?
  – What goes into getting computers to perform useful and interesting tasks involving human languages.
  – Secondarily: the insights that such computational work gives us into human languages and human processing of language.

Natural Language Processing
• Foundations are in computer science (AI, theory, algorithms, …); linguistics; mathematics; logic and statistics; and psychology

Why Should You Care?
• Two trends
  1. An enormous amount of knowledge is now available in machine-readable form as natural language text
  2. Conversational agents are becoming an important form of human-computer communication

Knowledge of Language
• Words (words and their composition)
• Syntax (structure of sentences)
• Semantics (explicit meaning of sentences)
• Discourse and pragmatics (implicit and contextual meaning)

Small Applications
• Line breakers
• Hyphenators
• Spelling correctors
• Optical Character Recognition software
• Grammar and style checkers

Big Applications
• Question answering
• Conversational agents
• Text summarization
• Machine translation

Note
• In NLP, as in many areas of AI:
  – We’re often dealing with ill-defined problems
  – We don’t often come up with perfect solutions/algorithms
  – We can’t let either of those facts get in our way

Course Material
• We’ll be intermingling discussions of:
  – Linguistic topics
    • Syntax and meaning representations
  – Computational techniques
    • Context-free grammars
  – Applications
    • Translation and QA systems

Chapter 1
• Knowledge of language
• Ambiguity
• Models and algorithms
• History

Knowledge of Language
• Phonetics and phonology: speech sounds, their production, and the rule systems that govern their use
• Morphology: words and their composition from more basic units
  – Cat, cats (inflectional morphology)
  – Child, children
  – Friend, friendly (derivational morphology)

Knowledge of Language
• Syntax: the structuring of words into legal larger phrases and sentences

Semantics
• The meaning of words and phrases
  – Lexical semantics: the study of the meanings of words
  – Compositional semantics: how to combine word meanings
  – Word-sense disambiguation
    • River bank vs. financial bank

Pragmatics
• Indirect speech acts:
  – Do you have a stapler?
• Presupposition:
  – Have you stopped beating your wife?
• Deixis and point of view:
  – Zoe was angry at Joe. Where was he?
• Implicature:
  – Yes, there are 3 flights to Boston. In fact, there are 4.
  – * The general was assassinated. In fact, he isn’t dead.

Discourse
• Utterance interpretation in the context of the text or dialog
  – Sue took the trip to New York. She had a great time there.
    • Sue/she
    • New York/there
    • took/had (time)

Ambiguity
• Almost all of the non-trivial tasks performed by NLP systems are ambiguity resolution tasks
• There is ambiguity at all levels of language

Ambiguity
• I saw the woman with the telescope
• Syntactically ambiguous:
  – I saw (NP the woman with the telescope)
  – I saw (NP the woman) (PP with the telescope)
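The two bracketings can be reproduced mechanically. The sketch below is only an illustration: the toy grammar, the lexicon, and the function name `all_parses` are assumptions made for this example, not material from the course. It enumerates every parse of the sentence under that grammar.

```python
from functools import lru_cache

# Toy grammar in Chomsky normal form (an assumption made for this sketch).
BINARY = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "PP": [("P", "NP")],
}
# Hypothetical one-tag-per-word lexicon, just large enough for the example.
LEXICON = {
    "I": "NP", "saw": "V", "the": "Det",
    "woman": "N", "telescope": "N", "with": "P",
}

def all_parses(words):
    """Return every bracketing of `words` as a sentence (S) under the toy grammar."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(symbol, i, j):
        # Every bracketing of words[i:j] rooted in `symbol`.
        if j - i == 1:
            if LEXICON.get(words[i]) == symbol:
                return (f"({symbol} {words[i]})",)
            return ()
        trees = []
        for left, right in BINARY.get(symbol, []):
            for k in range(i + 1, j):  # try every split point
                for lt in build(left, i, k):
                    for rt in build(right, k, j):
                        trees.append(f"({symbol} {lt} {rt})")
        return tuple(trees)

    return list(build("S", 0, len(words)))

for tree in all_parses("I saw the woman with the telescope".split()):
    print(tree)
```

Under this grammar the sentence gets exactly two parses: one attaches the PP inside the object NP, the other attaches it to the VP.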

“I made her duck”
• I cooked waterfowl for her
• I cooked waterfowl belonging to her
• I created the duck she owns
• I caused her to lower her head quickly…
• Part-of-speech tagging: is “duck” a noun or a verb?
• Parsing syntactic structure: is “her” part of the “duck” NP?
• Word-sense disambiguation (lexical semantics): does “make” mean create, cook, or cause?
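To see how local ambiguities multiply, here is a minimal sketch that lists every candidate tag sequence for the sentence. The tag names and the `POSSIBLE_TAGS` table are simplifying assumptions for illustration, not a real tagset or lexicon.

```python
from itertools import product

# Hypothetical, hand-written tag options for each word.
POSSIBLE_TAGS = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["DET", "PRON"],    # possessive ("her duck") vs. object pronoun
    "duck": ["NOUN", "VERB"],  # the bird vs. the motion
}

def candidate_taggings(words, possible_tags=POSSIBLE_TAGS):
    """Return every combination of per-word tags as a list of (word, tag) pairs."""
    options = [possible_tags[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

for tagging in candidate_taggings(["I", "made", "her", "duck"]):
    print(tagging)
```

Two binary choices already give four candidate sequences. A tagger has to rule out the implausible ones, such as “her” as a determiner followed by “duck” as a verb.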

Dealing with Ambiguity
• Two approaches:
  – Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
  – Pipeline processing
• Most NLP systems are probabilistic: they make the most likely choices

Models and Algorithms
• Models (as we are using the term here):
  – Formalisms to represent linguistic knowledge
• Algorithms:
  – Used to manipulate the representations and produce the desired behavior
    • Choosing among possibilities and combining pieces

Models
• State machines: finite state automata, finite state transducers
• Formal rule systems: context-free grammars
• Logical formalisms: first-order predicate calculus; higher-order logics
• Models of uncertainty: Bayesian probability theory
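As a concrete instance of the first model class, the sketch below encodes a tiny deterministic finite state automaton that accepts tag sequences of the form Det Adj* N. The state names, the tag names, and the pattern itself are assumptions chosen for illustration.

```python
# Transition table for a toy deterministic finite state automaton.
# Keys are (current_state, input_symbol); missing keys mean "reject".
TRANSITIONS = {
    ("start", "Det"): "need_noun",
    ("need_noun", "Adj"): "need_noun",
    ("need_noun", "N"): "done",
}
ACCEPTING = {"done"}

def accepts(tags):
    """Return True if the tag sequence matches Det Adj* N."""
    state = "start"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition defined for this input
            return False
    return state in ACCEPTING

print(accepts(["Det", "Adj", "Adj", "N"]))
print(accepts(["Det", "Adj"]))
```

The first call accepts and the second rejects, because the machine has not reached an accepting state when the input ends.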

Algorithms
• Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.

Algorithms
• In particular:
  – State-space search
    • To manage the problem of making choices during processing when we lack the information needed to make the right choice
  – Dynamic programming
    • To avoid having to redo work during the course of a state-space search
  – Machine learning (classifiers, EM, etc.)

State Space Search
• States represent pairings of partially processed inputs with partially constructed answers
  – E.g. sentence + partial parse tree
• Goal is to arrive at the right/best structure after having processed all the input.
  – E.g. the best parse tree spanning the sentence
• As with most interesting AI problems, the spaces are too large and the criteria for “bestness” are difficult to encode (so heuristics, probabilities)
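A minimal sketch of the idea, using word segmentation instead of parsing to keep it short. Each state pairs a position in the input with the words found so far, and a goal state is one that has consumed all the input. The lexicon and the function name `segmentations` are assumptions for this example.

```python
from collections import deque

# Hypothetical lexicon, chosen so that the input has more than one analysis.
LEXICON = {"the", "theta", "table", "bled", "down", "own", "there"}

def segmentations(text, lexicon=LEXICON):
    """Breadth-first search over (position, words_so_far) states."""
    goals = []
    frontier = deque([(0, ())])  # start state: nothing consumed, nothing built
    while frontier:
        pos, words = frontier.popleft()
        if pos == len(text):     # goal: all input processed
            goals.append(list(words))
            continue
        for end in range(pos + 1, len(text) + 1):
            piece = text[pos:end]
            if piece in lexicon:  # successor state: consume one more word
                frontier.append((end, words + (piece,)))
    return goals

for answer in segmentations("thetabledownthere"):
    print(" ".join(answer))
```

The search finds two complete analyses, “the table down there” and “theta bled own there”. Nothing in the search itself says which is better; that is where heuristics or probabilities come in.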

Dynamic Programming
• Don’t do the same work over and over.
• Avoid this by building and making use of solutions to sub-problems that must be invariant across all parts of the space.
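A small sketch of this principle, assuming a word-segmentation task and a hypothetical lexicon. The number of ways to segment the rest of a string from a given position does not depend on how that position was reached, so each position is solved once and its answer is reused.

```python
from functools import lru_cache

def count_segmentations(text, lexicon):
    """Count the ways to split `text` into lexicon words, one sub-problem per position."""
    @lru_cache(maxsize=None)
    def ways_from(pos):
        # Number of ways to segment text[pos:]; cached, so computed once per position.
        if pos == len(text):
            return 1
        total = 0
        for end in range(pos + 1, len(text) + 1):
            if text[pos:end] in lexicon:
                total += ways_from(end)
        return total

    return ways_from(0)

# Hypothetical lexicons for illustration.
words = frozenset({"the", "theta", "table", "bled", "down", "own", "there"})
print(count_segmentations("thetabledownthere", words))
print(count_segmentations("a" * 30, frozenset({"a", "aa"})))
```

The second call has over a million distinct segmentations, yet the memoized function solves only 31 sub-problems, one per position. Enumerating each segmentation separately would redo the same suffix work many times.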