Natural Language Processing (NLP)
Prof. Carolina Ruiz
Computer Science, WPI

References
• The Essence of Artificial Intelligence
  – By A. Cawsey
  – Prentice Hall Europe, 1998
• Artificial Intelligence: Theory and Practice
  – By T. Dean, J. Allen, and Y. Aloimonos
  – The Benjamin/Cummings Publishing Company, 1995
• Artificial Intelligence
  – By P. Winston
  – Addison Wesley, 1992
• Artificial Intelligence: A Modern Approach
  – By Russell and Norvig
  – Prentice Hall, 2003
NLP - Prof. Carolina Ruiz

Communication
Typical communication episode: S (speaker) wants to convey P (proposition) to H (hearer) using W (words in a formal or natural language)
1. Speaker
  • Intention: S wants H to believe P
  • Generation: S chooses words W
  • Synthesis: S utters words W
2. Hearer
  • Perception: H perceives words W'' (ideally W'' = W)
  • Analysis: H infers possible meanings P1, P2, …, Pn for W''
  • Disambiguation: H infers that S intended to convey Pi (ideally Pi = P)
  • Incorporation: H decides to believe or disbelieve Pi

Natural Language Processing (NLP)
1. Natural Language Understanding
  • Taking some spoken/typed sentence and working out what it means
2. Natural Language Generation
  • Taking some formal representation of what you want to say and working out a way to express it in a natural (human) language (e.g., English)

Applications of Nat. Lang. Processing
• Machine Translation
• Database Access
• Information Retrieval
  – Selecting from a set of documents the ones that are relevant to a query
• Text Categorization
  – Sorting text into fixed topic categories
• Extracting data from text
  – Converting unstructured text into structured data
• Spoken language control systems
• Spelling and grammar checkers

Natural language understanding
Raw speech signal
  • Speech recognition
Sequence of words spoken
  • Syntactic analysis using knowledge of the grammar
Structure of the sentence
  • Semantic analysis using info. about meaning of words
Partial representation of meaning of sentence
  • Pragmatic analysis using info. about context
Final representation of meaning of sentence

Natural Language Understanding

Input/Output data                     Processing stage     Other data used
Frequency spectrogram
                                      speech recognition   freq. of diff. sounds
Word sequence: “He loves Mary”
                                      syntactic analysis   grammar of language
Sentence structure: He loves Mary
                                      semantic analysis    meanings of words
Partial meaning: ∃x loves(x, mary)
                                      pragmatics           context of utterance
Sentence meaning: loves(john, mary)
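
The last three stages of this table can be sketched for the example sentence as a chain of small functions. This is only an illustration under strong assumptions: the three-word subject-verb-object pattern, the pronoun set, and the `current_topic` context entry are invented here and are not part of the slides.

```python
# Toy versions of syntactic, semantic and pragmatic analysis for "He loves Mary".
# The sentence pattern and the context dictionary are illustrative assumptions.

def syntactic_analysis(words):
    """Split a subject-verb-object sentence into its three parts."""
    subject, verb, obj = words
    return {"subject": subject, "verb": verb, "object": obj}

def semantic_analysis(structure):
    """Build a partial meaning; a pronoun subject stays an unresolved variable x."""
    pronouns = {"he", "she"}
    subject = structure["subject"].lower()
    arg = "x" if subject in pronouns else subject
    return (structure["verb"].lower(), arg, structure["object"].lower())

def pragmatic_analysis(partial, context):
    """Bind the variable x to the entity the context says is under discussion."""
    predicate, arg, obj = partial
    if arg == "x":
        arg = context["current_topic"]
    return (predicate, arg, obj)

def show(meaning):
    """Render a meaning tuple in the predicate notation used on the slide."""
    predicate, arg, obj = meaning
    prefix = "∃x " if arg == "x" else ""
    return f"{prefix}{predicate}({arg}, {obj})"

words = ["He", "loves", "Mary"]
partial = semantic_analysis(syntactic_analysis(words))
final = pragmatic_analysis(partial, {"current_topic": "john"})
print(show(partial))
print(show(final))
```

Each stage consumes only the output of the previous one plus its own extra knowledge, which mirrors the "other data used" column.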

Speech Recognition (1 of 3)
Input: analog signal (microphone records voice)
Output: freq. spectrogram (e.g., Fourier transform)
[Figure: spectrogram, frequency (Hz) against time]
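
As a rough sketch of how a spectrogram can be obtained with a Fourier transform, the signal can be cut into short frames and a discrete Fourier transform applied to each. The function names, the frame size, and the synthetic two-tone signal are assumptions made for this illustration. Real systems use overlapping windowed frames and a fast Fourier transform.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Return the magnitude of each DFT bin from 0 up to half the frame length."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes

def spectrogram(signal, frame_size):
    """Cut the signal into non-overlapping frames and transform each one."""
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, frame_size)]
    return [dft_magnitudes(frame) for frame in frames]

# Synthetic signal: 5 cycles per 32-sample frame, followed by 2 cycles per frame.
frame_size = 32
tone_a = [math.sin(2 * math.pi * 5 * t / frame_size) for t in range(frame_size)]
tone_b = [math.sin(2 * math.pi * 2 * t / frame_size) for t in range(frame_size)]
spec = spectrogram(tone_a + tone_b, frame_size)

# Strongest frequency bin in each time frame.
peaks = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(peaks)
```

Each row of `spec` is one time slice and each column one frequency bin, so the strongest bin moves from 5 to 2 as the tone changes.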

Speech Recognition (2 of 3)
• Frequency spectrogram
  – Basic sounds in the signal (40-50 phonemes) (e.g., “a” in “cat”)
• Template matching against a database of phonemes
  – Using dynamic time warping (speech speed)
  – Constructing words from phonemes (e.g., “th” + “i” + “ng” = thing)
    • Unreliable/probabilistic phonemes (e.g., “th” 50%, “f” 30%, …)
    • Non-unique pronunciations (e.g., tomato), statistics of transitions phonemes/words (hidden Markov models)
• Words
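
Dynamic time warping can be sketched as a small dynamic program that aligns two sequences while letting either one stretch in time. The one-dimensional "feature tracks" and the two phoneme templates below are made-up numbers used only to show the idea. Real recognizers compare vectors of spectral features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical phoneme templates and a slowly spoken input.
templates = {"a": [1, 3, 4, 3, 1], "i": [4, 2, 1, 2, 4]}
slow_a = [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]

best = min(templates, key=lambda name: dtw_distance(slow_a, templates[name]))
print(best)
```

Because repeated samples can be matched to a single template sample at no extra cost, the input spoken at half speed still matches its template exactly.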

Speech Recognition - Complications
• No simple mapping between sounds and words
  – Variance in pronunciation due to gender, dialect, …
    • Restriction to handle just one speaker
  – Same sound corresponding to diff. words
    • e.g., bear, bare
  – Finding gaps between words
    • “how to recognize speech”
    • “how to wreck a nice beach”
  – Noise

Syntactic Analysis
• Rules of syntax (grammar) specify the possible organization of words in sentences and allow us to determine a sentence’s structure(s)
  – “John saw Mary with a telescope”
    • John saw (Mary with a telescope)
    • John (saw Mary with a telescope)
• Parsing: given a sentence and a grammar
  – Checks that the sentence is correct according to the grammar and, if so, returns a parse tree representing the structure of the sentence
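
The two structures of the telescope sentence can be enumerated with a small chart-style parser. The grammar and lexicon below are assumptions written for this one sentence (they are not the grammar used later in the slides), and the sketch returns every parse rather than only the first.

```python
from functools import lru_cache

# Hypothetical lexicon and binary rules covering just the example sentence.
LEXICON = {"john": "NP", "mary": "NP", "saw": "V",
           "with": "P", "a": "Det", "telescope": "N"}
RULES = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
         ("NP", "NP", "PP"), ("NP", "Det", "N"), ("PP", "P", "NP")]

def all_parses(words, goal="S"):
    """Return every parse tree that covers all the words with the goal symbol."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def parses(symbol, i, j):
        trees = []
        if j - i == 1 and LEXICON.get(words[i]) == symbol:
            trees.append((symbol, words[i]))
        for parent, left, right in RULES:
            if parent != symbol:
                continue
            for k in range(i + 1, j):  # every split point inside the span
                for left_tree in parses(left, i, k):
                    for right_tree in parses(right, k, j):
                        trees.append((symbol, left_tree, right_tree))
        return tuple(trees)

    return list(parses(goal, 0, len(words)))

def bracket(tree):
    """Show only the grouping of words, dropping the category labels."""
    if len(tree) == 2:
        return tree[1]
    return "(" + bracket(tree[1]) + " " + bracket(tree[2]) + ")"

trees = all_parses("john saw mary with a telescope".split())
for tree in trees:
    print(bracket(tree))
```

One bracketing attaches "with a telescope" to "mary" and the other attaches it to "saw mary", which is the ambiguity the slide points out.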

Syntactic Analysis - Grammar
• sentence -> noun_phrase, verb_phrase
• noun_phrase -> proper_noun
• noun_phrase -> determiner, noun
• verb_phrase -> verb, noun_phrase
• proper_noun -> [mary]
• noun -> [apple]
• verb -> [ate]
• determiner -> [the]

Syntactic Analysis - Parsing
sentence
  noun_phrase
    proper_noun: “Mary”
  verb_phrase
    verb: “ate”
    noun_phrase
      determiner: “the”
      noun: “apple”
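
A minimal recursive-descent sketch can produce this tree from the grammar of the previous slide. The dictionary encoding and the function names are choices made for this illustration. The parser commits to the first expansion that succeeds for each symbol, which is enough for this small grammar but not for grammars that need full backtracking.

```python
# The slide grammar as data: each symbol maps to its alternative expansions.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["proper_noun"], ["determiner", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "proper_noun": [["mary"]],
    "noun": [["apple"]],
    "verb": [["ate"]],
    "determiner": [["the"]],
}

def parse(symbol, words, pos=0):
    """Try to parse `symbol` starting at `pos`; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:  # a terminal word
        if pos < len(words) and words[pos] == symbol:
            return symbol, pos + 1
        return None
    for expansion in GRAMMAR[symbol]:
        children, cursor = [], pos
        for part in expansion:
            result = parse(part, words, cursor)
            if result is None:
                break
            subtree, cursor = result
            children.append(subtree)
        else:  # every part of this expansion matched
            return (symbol, children), cursor
    return None

def parse_sentence(text):
    """Return the parse tree, or None if the text is not a sentence of the grammar."""
    words = text.lower().split()
    result = parse("sentence", words)
    if result is None or result[1] != len(words):
        return None
    return result[0]

tree = parse_sentence("Mary ate the apple")
print(tree)
print(parse_sentence("Mary ate"))
```

Note that the grammar also accepts "the apple ate Mary": syntax alone does not rule out sentences that make no sense.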

Syntactic Analysis – Complications (1)
• Number (singular vs. plural) and gender
  – sentence -> noun_phrase(n), verb_phrase(n)
  – proper_noun(s) -> [mary]
  – noun(p) -> [apples]
• Adjective
  – noun_phrase -> determiner, adjectives, noun
  – adjectives -> adjective, adjectives
  – adjective -> [ferocious]
• Adverbs, …
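
The number argument in `sentence -> noun_phrase(n), verb_phrase(n)` can be sketched by having each phrase report its number and requiring the two to match. The verb entries "eats" and "eat" and the singular noun "apple" are assumptions added for the example, since the slide lists only `mary` and `apples`.

```python
# Lexicon entries carry a category and a number: "s" singular, "p" plural.
LEXICON = {
    "mary": ("proper_noun", "s"),
    "apples": ("noun", "p"),
    "apple": ("noun", "s"),       # assumed entry
    "the": ("determiner", None),  # "the" fits either number
    "eats": ("verb", "s"),        # assumed entry
    "eat": ("verb", "p"),         # assumed entry
}

def noun_phrase(words, pos):
    """Return (number, next_pos) if a noun phrase starts at pos, else None."""
    if pos < len(words):
        category, number = LEXICON.get(words[pos], (None, None))
        if category == "proper_noun":
            return number, pos + 1
        if category == "determiner" and pos + 1 < len(words):
            next_category, next_number = LEXICON.get(words[pos + 1], (None, None))
            if next_category == "noun":
                return next_number, pos + 2
    return None

def verb_phrase(words, pos):
    """Return (number of the verb, next_pos) for verb + noun phrase, else None."""
    if pos < len(words):
        category, number = LEXICON.get(words[pos], (None, None))
        if category == "verb":
            obj = noun_phrase(words, pos + 1)
            if obj is not None:
                return number, obj[1]
    return None

def is_sentence(text):
    """Accept only if subject and verb phrase share the same number n."""
    words = text.lower().split()
    subject = noun_phrase(words, 0)
    if subject is None:
        return False
    predicate = verb_phrase(words, subject[1])
    if predicate is None:
        return False
    return predicate[1] == len(words) and subject[0] == predicate[0]

print(is_sentence("Mary eats the apples"))
print(is_sentence("Mary eat the apples"))
```

Only the subject and the verb must agree. The object noun phrase may have a different number, as in "Mary eats the apples".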

Syntactic Analysis – Complications (2)
• Handling ambiguity
  – Syntactic ambiguity: “fruit flies like a banana”
• Having to parse syntactically incorrect sentences

Semantic Analysis
• Generates (partial) meaning/representation of the sentence from its syntactic structure(s)
• Compositional semantics: meaning of the sentence from the meaning of its parts:
  – Sentence: A tall man likes Mary
  – Representation: ∃x man(x) & tall(x) & likes(x, mary)
• Grammar + Semantics
  – sentence(Smeaning) -> noun_phrase(NPmeaning), verb_phrase(VPmeaning), combine(NPmeaning, VPmeaning, Smeaning)
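
One way to picture `combine(NPmeaning, VPmeaning, Smeaning)` is to treat each phrase meaning as a function and obtain the sentence meaning by applying one to the other. The helper names below are invented for this sketch, and the meanings are built as plain strings purely for readability.

```python
def noun(predicate):
    """Meaning of a noun such as "man": a property of some individual x."""
    return lambda x: f"{predicate}({x})"

def adjective(predicate, noun_meaning):
    """Meaning of adjective + noun: both properties hold of the same x."""
    return lambda x: f"{noun_meaning(x)} & {predicate}({x})"

def transitive_verb(predicate, obj):
    """Meaning of verb + object such as "likes Mary": a property of the subject."""
    return lambda x: f"{predicate}({x}, {obj})"

def indefinite_np(property_meaning):
    """Meaning of "a ...": given a verb phrase meaning, yield the sentence meaning."""
    return lambda vp_meaning: f"∃x {property_meaning('x')} & {vp_meaning('x')}"

def combine(np_meaning, vp_meaning):
    """Sentence meaning = noun phrase meaning applied to verb phrase meaning."""
    return np_meaning(vp_meaning)

np_meaning = indefinite_np(adjective("tall", noun("man")))
vp_meaning = transitive_verb("likes", "mary")
print(combine(np_meaning, vp_meaning))
```

The sentence meaning is assembled only from the meanings of "a", "tall", "man", "likes" and "Mary", following the shape of the parse tree.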

Semantic Analysis – Complications
• Handling ambiguity
  – Semantic ambiguity: “I saw the Prudential building flying into Boston”

Pragmatics
• Uses context of utterance
  – Where, by whom, to whom, why, when it was said
  – Intentions: inform, request, promise, criticize, …
• Handling pronouns
  – “Mary eats apples. She likes them.”
    • She = “Mary”, them = “apples”
• Handling ambiguity
  – Pragmatic ambiguity: “you’re late”: What’s the speaker’s intention: informing or criticizing?
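
A very naive sketch of pronoun handling links each pronoun to the most recent earlier mention whose number and gender fit. The feature tables are hypothetical and cover only the example. Real pronoun resolution needs far more context than this heuristic uses.

```python
# Hypothetical feature tables; a real system would derive these from a lexicon.
ENTITIES = {
    "mary": {"number": "singular", "gender": "female"},
    "apples": {"number": "plural", "gender": None},
}
PRONOUNS = {
    "she": {"number": "singular", "gender": "female"},
    "them": {"number": "plural", "gender": None},  # no gender constraint
}

def resolve_pronouns(text):
    """Map each pronoun to the most recent earlier mention with matching features."""
    mentions = []  # entity words seen so far, oldest first
    resolved = {}
    for word in text.lower().replace(".", " ").split():
        if word in ENTITIES:
            mentions.append(word)
        elif word in PRONOUNS:
            wanted = PRONOUNS[word]
            for candidate in reversed(mentions):
                features = ENTITIES[candidate]
                same_number = features["number"] == wanted["number"]
                same_gender = (wanted["gender"] is None
                               or features["gender"] == wanted["gender"])
                if same_number and same_gender:
                    resolved[word] = candidate
                    break
    return resolved

print(resolve_pronouns("Mary eats apples. She likes them."))
```

Here "she" skips over the closer mention "apples" because the number does not match, and lands on "mary".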

Natural Language Generation
• Talking back!
• What to say, or text planning
  – flight(AA, london, boston, $560, 2 pm),
  – flight(BA, london, boston, $640, 10 am),
• How to say it
  – “There are two flights from London to Boston. The first one is with American Airlines, leaves at 2 pm, and costs $560 …”
• Speech synthesis
  – Simple: human recordings of basic templates
  – More complex: string together phonemes in phonetic spelling of each word
    • Difficult due to stress, intonation, timing, liaisons between words
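
The "how to say it" step can be sketched with fixed sentence templates filled in from the flight facts. The expansion of "BA" to British Airways, the word tables, and the template wording for the second flight are assumptions, since the slide spells out only the American Airlines sentence. The sketch also assumes at least two flights that share one origin and destination.

```python
# Lookup tables are illustrative; only "American Airlines" appears on the slide.
AIRLINES = {"AA": "American Airlines", "BA": "British Airways"}
NUMBER_WORDS = {2: "two", 3: "three"}
ORDINALS = ["first", "second", "third"]

def describe_flights(flights):
    """Render flight(code, origin, destination, price, time) facts as English text."""
    _, origin, destination, _, _ = flights[0]
    count = NUMBER_WORDS.get(len(flights), str(len(flights)))
    sentences = [
        f"There are {count} flights from {origin.title()} to {destination.title()}."
    ]
    for ordinal, (code, _, _, price, time) in zip(ORDINALS, flights):
        sentences.append(
            f"The {ordinal} one is with {AIRLINES[code]}, "
            f"leaves at {time}, and costs ${price}."
        )
    return " ".join(sentences)

flights = [
    ("AA", "london", "boston", 560, "2 pm"),
    ("BA", "london", "boston", 640, "10 am"),
]
text = describe_flights(flights)
print(text)
```

Templates like these are easy to control but rigid. Anything not anticipated by a template cannot be expressed.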