Natural Language Processing (NLP)
Filename: eie426-nlp-0809.ppt
2021/3/2 EIE 426-AICV 1


Contents
• What we want
• Machine Translation (MT)
• Components of natural language processing
• Syntactic processing
  - Formal grammars
  - Parsers
  - The CYK Parsing Algorithm

What We Want
Conversation (user interface):
• Front ends to a Database Management System (DBMS)
• Consultant or help systems
Textual tasks:
• Reading stories and
  - summarizing them
  - answering questions about them
  - extracting relevant information from them
• Machine Translation (MT)
• Better information retrieval
• Grammar/style checkers

Machine Translation (MT)
• ALPAC (Automatic Language Processing Advisory Committee) Report in 1966 (after 10 years of research):
  - No high-quality, fully automated MT had been achieved, nor was there any prospect of it in the immediate future.
  - Fully automated MT would not be desirable, at least in the near term, because it was unlikely to be more cost-effective than hiring translators.
• A recent article: “A renewed international effort is gearing up to design computers and software that smash language barriers and create a borderless global marketplace.” by Steve Silberman (2000)

Main Components in an MT System
• Grammatical analysis, or parsing
• Dictionary lookup (for word-by-word translation)
• Grammatical transfer, to change the structure from that of one language into that of the other
• Synthesis of the sentence from the new parse tree
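
The four components above can be sketched as a toy pipeline. This is only an illustration under strong assumptions: the three-word lexicon, the part-of-speech tags, and the single transfer rule (adjective before noun in English, after it in French) are all made up for this example and are not how SYSTRAN or any real system works.

```python
# Toy English -> French lexicon: word -> (category, translation).
# All entries are illustrative assumptions.
LEXICON = {
    "the": ("ART", "la"),
    "red": ("ADJ", "rouge"),
    "car": ("N", "voiture"),
}

def parse(words):
    """Grammatical analysis (here reduced to tagging each word)."""
    return [(LEXICON[w][0], w) for w in words]

def lookup(tagged):
    """Dictionary lookup: word-by-word translation, keeping the tags."""
    return [(tag, LEXICON[word][1]) for tag, word in tagged]

def transfer(tagged):
    """Grammatical transfer: reorder ADJ N into N ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "N":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out

def synthesize(tagged):
    """Synthesis: read the words off the transferred structure."""
    return " ".join(word for _, word in tagged)

def translate(sentence):
    return synthesize(transfer(lookup(parse(sentence.lower().split()))))

print(translate("the red car"))
```

Skipping the transfer step would give plain word-by-word output ("la rouge voiture"), which is why dictionary lookup alone is not enough.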

Examples from SYSTRAN (an online translator)
• Out of sight and out of mind.
  E-F(French)-E: out of the sight, of the spirit
  E-G(German)-E: from sight from understanding out
  (not very good)
• The spirit is willing but the body weak.
  E-F-E: The spirit is laid out but the weak body.
  E-G-E: The spirit is ready however the weak body.
  (not very good)
• SYSTRAN has been used successfully to translate Xerox copier manuals from English to many other languages (a more constrained application).

Features of Language that Make It Both Difficult and Useful
• The problem: English sentences are incomplete descriptions of the information that they are intended to convey. E.g.,
  - Some dogs are outside.
  - Some dogs are on the lawn.
  - Three dogs are on the lawn.
  - Rover, Tripp, and Spot are on the lawn.
• The good side: Language allows speakers to be as vague or as precise as they like. It also allows speakers to leave out things they believe their hearers already know.

Features of Language that Make It Both Difficult and Useful (cont.)
• The problem: No natural language program can be complete because new words, expressions, and meanings can be generated quite freely. E.g., I will fax it to you.
• The good side: Language can evolve as the experiences that we want to communicate about evolve.
• The problem: There are lots of ways to say the same thing. E.g.,
  - Mary was born on October 11.
  - Mary’s birthday is October 11.
• The good side: When you know a lot, facts imply each other. Language is intended to be used by agents who know a lot.

Components of Natural Language Processing
• Morphological Analysis: Individual words are analyzed into their components, and nonword tokens, such as punctuation, are separated from the words.
An example: I want to print Bill’s .init file.
  - Bill’s --> Bill (the proper noun) + ’s (the possessive suffix)
  - Recognize the sequence “.init” as a file extension.
In addition, this process will usually assign syntactic categories to all the words in the sentence.
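
A minimal sketch of this step for the example sentence, assuming a regular-expression tokenizer is good enough. The pattern and the four token kinds are assumptions made for illustration. A real morphological analyzer would need a lexicon and would not treat every ".word" as a file extension.

```python
import re

# Alternatives are tried left to right at each position:
# ".init"-style extension, plain word, possessive suffix, any other symbol.
TOKEN = re.compile(r"\.[A-Za-z]+|[A-Za-z]+|'s|[^\sA-Za-z]")

def analyze(sentence):
    """Split a sentence into (token, kind) pairs."""
    result = []
    for tok in TOKEN.findall(sentence):
        if tok == "'s":
            kind = "possessive suffix"
        elif tok.startswith(".") and len(tok) > 1:
            kind = "file extension"
        elif tok.isalpha():
            kind = "word"
        else:
            kind = "punctuation"
        result.append((tok, kind))
    return result

for token, kind in analyze("I want to print Bill's .init file."):
    print(f"{token!r:10} {kind}")
```

The sentence-final period is separated from "file" as punctuation, while the period in ".init" stays attached because letters follow it.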

Components of Natural Language Processing (cont.)
• Syntactic Analysis (Parsing): Linear sequences of words are transformed into structures that show how the words relate to each other. Some word sequences may be rejected if they violate the language’s rules for how words may be combined. Syntactic analysis must exploit the results of morphological analysis to build a structural description of the sentence.

Components of Natural Language Processing (cont.)
The Results of Syntactic Analysis of “I want to print Bill’s .init file.”

Category          Examples
Determiner        the, this, a
Adjective         big, high
Adverb            slowly
Noun              file, Sarah, book
Auxiliary verb    will, have
Verb              print, give
Preposition       to, in
Quantifier        all, every
Complementizer    that, which
Pronoun           she, him, their

Components of Natural Language Processing (cont.)
• Semantic Analysis: The structures created by the syntactic analyzer are assigned meanings. A mapping is made between the syntactic structures and objects in the task domain. Structures for which no such mapping is possible are rejected.
Sentence-Level Processing
• Semantic Grammars: Syntactic + Semantic + Pragmatic
• Case Grammars: The structure built by a parser contains some semantic information.
• Conceptual Parsing: Syntactic + Semantic
• Approximately Compositional Semantic Interpretation: Semantic processing is applied to the result of performing a syntactic parse.

Components of Natural Language Processing (cont.)
• Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it and may influence the meanings of the sentences that follow it.
• From the previous three steps, we do not know to whom the pronoun “I” or the proper noun “Bill” refers. Pinning down these references requires an appeal to a model of the current discourse context, from which we can learn that the current user is User068 and that the only person named “Bill” is User073.
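
The reference resolution described here can be sketched as a lookup against a discourse model. The dictionary layout and the function name `resolve` are hypothetical. Only the two user identifiers come from the slide.

```python
# Hypothetical discourse model: who is speaking, and which known people
# carry which first name.
context = {
    "current_user": "User068",
    "people_by_name": {"Bill": ["User073"]},
}

def resolve(token, ctx):
    """Map a pronoun or proper noun to a user id, or None if that is not possible."""
    if token == "I":
        return ctx["current_user"]
    candidates = ctx["people_by_name"].get(token, [])
    # Resolve a name only when exactly one person carries it.
    return candidates[0] if len(candidates) == 1 else None

print(resolve("I", context), resolve("Bill", context))
```

If two users were named "Bill", the lookup would return None, and the system would need more context (or a question to the user) to decide.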

Components of Natural Language Processing (cont.)
• Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant. The final step toward effective understanding is to decide what to do as a result. In this case, the knowledge-based representation has to be translated into a command to be executed by the system.

Syntactic Processing (parsing)
• Syntactic processing is the step in which a flat input sentence is converted into a hierarchical structure that corresponds to the units of meaning in the sentence.

A Sentence: ‘the program crashes the computer’
1. <sentence>
2. <noun phrase> <verb phrase>
3. <article> <noun> <verb phrase>
4. the <noun> <verb phrase>
5. the program <verb phrase>
6. the program <verb> <noun phrase>
7. the program crashes <noun phrase>
8. the program crashes <article> <noun>
9. the program crashes the <noun>
10. the program crashes the computer
Generation of a sentence by using a set of grammar rules: from Step 1 to Step 10.
Parsing of a sentence to identify the grammar: from Step 10 back to Step 1.
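
The ten steps form a leftmost derivation: at each step the leftmost nonterminal is rewritten. A small sketch that reproduces them is given below. The rule table is read off the steps themselves, while the function name and the way a lexical alternative is chosen (by peeking at the target word) are assumptions made for this illustration.

```python
# Rewriting rules read off the ten steps; each nonterminal maps to its alternatives.
RULES = {
    "<sentence>": [["<noun phrase>", "<verb phrase>"]],
    "<noun phrase>": [["<article>", "<noun>"]],
    "<verb phrase>": [["<verb>", "<noun phrase>"]],
    "<article>": [["the"]],
    "<noun>": [["program"], ["computer"]],
    "<verb>": [["crashes"]],
}

def leftmost_derivation(target):
    """Return the list of sentential forms from <sentence> down to the words."""
    words = target.split()
    form = ["<sentence>"]
    steps = [list(form)]
    while any(sym in RULES for sym in form):
        # Everything left of index i is already a word, so position i in the
        # form lines up with position i in the target sentence.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        options = RULES[form[i]]
        # For lexical rules pick the alternative matching the target word;
        # otherwise fall back to the first alternative.
        choice = next((o for o in options if o == words[i:i + 1]), options[0])
        form = form[:i] + choice + form[i + 1:]
        steps.append(list(form))
    return steps

for number, form in enumerate(leftmost_derivation("the program crashes the computer"), 1):
    print(f"{number}. {' '.join(form)}")
```

Reading the printed list top to bottom is generation. Reading it bottom to top is what a parser has to reconstruct.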

The Tree Representation
[S [NP [ART the] [N program]] [VP [V crashes] [NP [ART the] [N computer]]]]
Generation (Derivation): from the root S down to the words.
Parsing: from the words up to the root S.

Formal Grammars
A: a finite set of symbols
x: a string (word) over A
|x| = n: the length of string x
The null string: [...]
The positive closure of A: [...]
All strings of length n: [...]
The closure of A: [...]
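
For reference, these notions are usually written as follows. The symbols are an assumption (λ for the null string; ε is equally common), so the notation used on the original slide may differ.

```latex
\begin{align*}
|\lambda| &= 0 && \text{(null string)} \\
A^{n} &= \{\, x : x \text{ is a string over } A,\ |x| = n \,\} && \text{(all strings of length } n\text{)} \\
A^{+} &= \bigcup_{n \ge 1} A^{n} && \text{(positive closure of } A\text{)} \\
A^{*} &= A^{+} \cup \{\lambda\} && \text{(closure of } A\text{)}
\end{align*}
```

For example, with A = {a, b}, the set A^2 is {aa, ab, ba, bb}.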

Definition 1: A formal grammar is a four-tuple G = (N, T, P, S), where
N: a finite set of nonterminal symbols
T: a finite set of terminal symbols
S: a set of initial symbols
P: a set of productions (rewriting rules)
The vocabulary: [...]
Each production is of the form [...]
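
One possible way to hold such a four-tuple in code is sketched below. The class name, field layout, and the checks in `is_well_formed` are assumptions for illustration. In particular, the start is stored as a single symbol, and the vocabulary is taken to be the union of N and T, which is the usual convention.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    productions: tuple        # P: (left side, right side) pairs of symbol tuples
    start: str                # S (assumed here to be one symbol from N)

    @property
    def vocabulary(self):
        """All symbols that may appear in a production (assumed to be N union T)."""
        return self.nonterminals | self.terminals

    def is_well_formed(self):
        """Basic sanity checks; not a full formal validation."""
        if self.nonterminals & self.terminals:
            return False                      # N and T must be disjoint
        if self.start not in self.nonterminals:
            return False
        return all(
            lhs and all(sym in self.vocabulary for sym in lhs + rhs)
            for lhs, rhs in self.productions
        )

english = Grammar(
    nonterminals=frozenset({"S", "NP", "VP", "ART", "N", "V"}),
    terminals=frozenset({"the", "program", "crashes", "computer"}),
    productions=(
        (("S",), ("NP", "VP")),
        (("NP",), ("ART", "N")),
        (("VP",), ("V", "NP")),
        (("ART",), ("the",)),
        (("N",), ("program",)),
        (("N",), ("computer",)),
        (("V",), ("crashes",)),
    ),
    start="S",
)
print(english.is_well_formed())
```

Left sides are stored as tuples rather than single symbols so that the same structure can also hold productions whose left side has more than one symbol.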

In the English sentence case:
The terminals: the, program, crashes, computer.
The non-terminals: <article>, <noun phrase>, etc.
The initial symbol: <sentence>.
Two productions: [...]
[S [NP [ART the] [N program]] [VP [V crashes] [NP [ART the] [N computer]]]]

Definition 2: A production in G, [...] If [...] such that [...] Then [...]

Definition 3: The language generated by G is given by [...]
An Example: N = {S, A}, T = {a, b}, L(G) = {aba, aabba, aaabbba, …}
Definition 4: Four types of grammars
Type 0 (Free or Unrestricted): [...]

Type 1 (Context-Sensitive): [...]
An Example: [...]

Type 2 (Context-Free): [...]
CFGs are the most descriptively versatile grammars for which effective (and efficient) parsers are available.
Examples: [...]
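
As a context-free illustration, the language L(G) = {aba, aabba, aaabbba, …} from the earlier example can be generated with N = {S, A} and T = {a, b}. The productions used below (S → Aa, A → aAb, A → ab) are a hypothetical choice that yields this language. They are not necessarily the ones on the original slide. The sketch enumerates every terminal string up to a length bound by repeatedly rewriting the leftmost nonterminal.

```python
from collections import deque

# Hypothetical productions: uppercase letters are nonterminals,
# lowercase letters are terminals.
PRODUCTIONS = {"S": ["Aa"], "A": ["aAb", "ab"]}

def language(start="S", max_len=7):
    """Collect all terminal strings of length <= max_len derivable from start."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:              # no nonterminal left: a sentence of L(G)
            found.add(form)
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No production here shortens a form, so over-long forms can be dropped.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)

print(language())
```

Each production has a single nonterminal on its left side, which is what makes this grammar context-free.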

Type 3 (Finite-State or Regular): [...]
Graphical Representation of FSGs: [Figure: state graph with initial node S and node T, edges labelled a and b]
T: a terminal node

The trade-offs among grammars
• Increasing production constraints: from Type 0 to Type 3
• Increasing representational (descriptive) capability: from Type 3 to Type 0
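
The nesting of the four types can be made concrete with a small classifier that returns the most restrictive type whose constraints every production satisfies. The tests below follow one common textbook formulation, which is an assumption here, so the slide's own formulas may differ in detail: Type 3 allows only A → a or A → aB, Type 2 requires a single nonterminal on the left, and Type 1 requires that the right side is at least as long as the left side. Symbols are single characters, and the example grammars are illustrative.

```python
def grammar_type(productions, nonterminals):
    """Return 3, 2, 1 or 0: the most restrictive type that fits all productions."""
    def is_regular(lhs, rhs):
        return (len(lhs) == 1 and lhs in nonterminals
                and len(rhs) in (1, 2)
                and rhs[0] not in nonterminals
                and (len(rhs) == 1 or rhs[1] in nonterminals))

    def is_context_free(lhs, rhs):
        return len(lhs) == 1 and lhs in nonterminals and len(rhs) >= 1

    def is_context_sensitive(lhs, rhs):
        return 1 <= len(lhs) <= len(rhs)

    checks = ((3, is_regular), (2, is_context_free), (1, is_context_sensitive))
    for level, check in checks:
        if all(check(lhs, rhs) for lhs, rhs in productions):
            return level
    return 0

REGULAR = [("S", "aS"), ("S", "b")]
CONTEXT_FREE = [("S", "Aa"), ("A", "aAb"), ("A", "ab")]
CONTEXT_SENSITIVE = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
                     ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
UNRESTRICTED = [("AB", "a")]

print(grammar_type(REGULAR, {"S"}),
      grammar_type(CONTEXT_FREE, {"S", "A"}),
      grammar_type(CONTEXT_SENSITIVE, {"S", "B", "C"}),
      grammar_type(UNRESTRICTED, {"A", "B"}))
```

Every grammar that passes the Type 3 test also passes the Type 2 and Type 1 tests, which is the sense in which tighter constraints buy simpler machinery at the cost of descriptive power.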

Parsing
• Top-Down Parsing -- Begin with the start symbol and apply the grammar rules until the symbols at the terminals of the tree correspond to the components of the sentence being parsed.
• Bottom-Up Parsing -- Begin with the sentence to be parsed and apply the grammar rules backward until a single tree whose terminals are the words of the sentence and whose top node is the start symbol has been produced.
[Figure: the start symbol S at the top, the string x at the bottom, and the productions in between; top-down parsing runs from S to x, bottom-up parsing from x to S]
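
Both directions can be sketched on the small English grammar used earlier. The two recognizers below are deliberately simple and rest on assumptions: `top_down` tries every production and would loop forever on a left-recursive grammar, and `bottom_up` is a greedy shift-reduce loop with no backtracking, so it is only reliable for unambiguous toy grammars such as this one.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V", "NP"]],
    "ART": [["the"]],
    "N": [["program"], ["computer"]],
    "V": [["crashes"]],
}

def top_down(symbol, words, pos=0):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:                 # a word: must match the input
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:               # expand the symbol by each production
        ends = [pos]
        for part in rhs:
            ends = [e2 for e in ends for e2 in top_down(part, words, e)]
        yield from ends

def bottom_up(words):
    """Shift words onto a stack and apply productions backward when possible."""
    stack = []
    for word in words:
        stack.append(word)                    # shift
        reduced = True
        while reduced:                        # reduce until nothing applies
            reduced = False
            for lhs, alternatives in GRAMMAR.items():
                for rhs in alternatives:
                    if stack[-len(rhs):] == rhs:
                        stack[-len(rhs):] = [lhs]
                        reduced = True
    return stack == ["S"]

sentence = "the program crashes the computer".split()
print(len(sentence) in set(top_down("S", sentence)), bottom_up(sentence))
```

The top-down recognizer accepts when S can span the whole input. The bottom-up one accepts when the stack has been reduced to the single symbol S.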

Parsing (cont.)
• All Paths -- Follow all possible paths and build all the possible intermediate components. It can be very inefficient.
• Best Path with Backtracking -- Follow only one path at a time, but record, at every choice point, the information that is necessary to make another choice if the chosen path fails to lead to a complete interpretation of the sentence. Two drawbacks: time is wasted in saving state descriptions at each choice point, and the same constituent may be analyzed many times.
Examples:
  “Have the students who missed the exam take it today.”
  “Have the students who missed the exam taken it today?”

Parsing (cont.)
• Best Path with Patchup -- Follow only one path at a time, but when an error is detected, explicitly shuffle around the components that have already been formed. Disadvantage: explicit rules for moving components from one place to another are required.
• Wait and See -- Follow only one path, but rather than making decisions about the function of each component as it is encountered, postpone the decision until enough information is available to make the decision correctly. Drawback: if the amount of lookahead is greater than the size of the buffer, then the interpreter will fail.

Parsing Systems
• Chart Parsers: They provide a way of avoiding backup by storing intermediate constituents so that they can be reused along alternative parsing paths.
• Definite Clause Grammars: Grammar rules are written as PROLOG (a computer language for AI) clauses and the PROLOG interpreter is used to perform top-down, depth-first parsing.
• Augmented Transition Networks (ATNs): The parsing process is described as the transition from a start state to a final state in a transition network that corresponds to a grammar of English.
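
The idea behind chart parsing, storing each intermediate constituent once and reusing it, can be imitated with memoization. The sketch below is not a full chart parser: a cache keyed on (symbol, start position) merely stands in for the chart. The grammar extends the earlier English example with a hypothetical prepositional-phrase rule so that two VP alternatives share the same V and NP.

```python
from functools import lru_cache

# Illustrative grammar; the PP rules and the words "in" and "lab" are assumptions.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("ART", "N")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
    "ART": [("the",)],
    "N": [("program",), ("computer",), ("lab",)],
    "V": [("crashes",)],
    "P": [("in",)],
}

def make_recognizer(words):
    words = tuple(words)

    @lru_cache(maxsize=None)
    def spans(symbol, pos):
        """All end positions of `symbol` starting at `pos` (computed once, then reused)."""
        if symbol not in GRAMMAR:             # a word of the sentence
            matches = pos < len(words) and words[pos] == symbol
            return frozenset({pos + 1}) if matches else frozenset()
        ends = set()
        for rhs in GRAMMAR[symbol]:
            current = {pos}
            for part in rhs:
                current = {e2 for e in current for e2 in spans(part, e)}
            ends |= current
        return frozenset(ends)

    return spans

words = "the program crashes the computer in the lab".split()
spans = make_recognizer(words)
print(len(words) in spans("S", 0), spans.cache_info().hits)
```

When the second VP alternative is tried, the V at position 2 and the NP at position 3 are read from the cache instead of being analyzed again, which is exactly the repeated work that plain backtracking suffers from.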

The Cocke-Younger-Kasami (CYK) Parsing Algorithm
Definition: A CFG is in Chomsky Normal Form (CNF) if each element of P is in one of the following forms: [...]
Lemma: For any CFG, G, there exists an equivalent G’ in CNF.

The CYK Parsing Algorithm (cont.)
The CYK Table
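
A compact sketch of how such a table is filled is given below. It assumes a grammar in CNF, that is, with rules of the shape A → BC or A → a only. The grammar in the example is a small textbook-style illustration chosen for this sketch, not the one used in the lecture's examples, and the function and variable names are likewise assumptions.

```python
def cyk(word, unary, binary, start="S"):
    """Return True if `word` can be derived from `start` under a CNF grammar.

    unary:  list of (A, a) for rules A -> a
    binary: list of (A, (B, C)) for rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][length] holds the nonterminals that derive word[i:i + length].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = {lhs for lhs, terminal in unary if terminal == symbol}
    for length in range(2, n + 1):                 # substring length
        for i in range(n - length + 1):            # substring start
            for split in range(1, length):         # length of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (b, c) in binary:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]

# Illustrative CNF grammar (an assumption, not taken from the slides).
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY = [("S", ("A", "B")), ("S", ("B", "C")), ("A", ("B", "A")),
          ("B", ("C", "C")), ("C", ("A", "B"))]

print(cyk("baaba", UNARY, BINARY), cyk("bb", UNARY, BINARY))
```

The three nested loops over length, start, and split point give the cubic running time in the length of the string that CYK is known for. Each table cell is computed once from shorter substrings.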

The CYK Parsing Algorithm (cont.)
Example 1: [...]

The CYK Parsing Algorithm (cont.)

The CYK Parsing Algorithm (cont.)
Rules: [...]

The CYK Parsing Algorithm (cont.)
Example 2: [...]

The CYK Parsing Algorithm (cont.)
x = bbaab
x = cbab