Natural language processing
Dr. Waheed ur Rehman
Language is the dress of thought
2/28/2021


NLP
• NLP refers to AI methods of communicating with a computer in natural language
• Why NLP?
  – Computers are considered “unfriendly”
  – One has to learn special languages and commands
  – Icons, menus and various input devices
  – Communicating with computers in natural language resolves all these issues

NLP
• Why is NLP categorized as AI?
  – NLP programs are simply another form of knowledge-based system
  – To understand an inquiry in natural language, a computer must analyze and interpret it
  – It must understand the grammar and the meaning of the words

The Vocabulary of Natural Language
1. Language
2. Linguistics
3. Vocabulary and lexicon
4. Grammar, syntax and semantics
5. Context and pragmatics

The Vocabulary of Natural Language
1. Language
  – A system for communication
  – Helps us communicate our thoughts and feelings
  – Uses a wide range of sounds, signs and symbols to create words, sentences and paragraphs
  – Whether written or spoken, language is the medium we use for expressing and organizing what we know, think and feel
  – Most languages were not deliberately invented or designed; they evolved with time
  – Formal languages, as opposed to natural languages, are designed for special purposes, e.g. computer languages

The Vocabulary of Natural Language
2. Linguistics
  – Linguistics is the study of how languages are structured and used
  – A linguist is a person who specializes in the study of languages
  – The term is also used for a person who speaks and uses more than one language fluently
  – Linguists conduct research to organize all the symbols, words, phrases, rules and procedures of a language and show how it should be used

The Vocabulary of Natural Language
3. Vocabulary and Lexicon
  – A vocabulary is all of the words and phrases in a particular language
  – A linguist identifies and arranges all commonly used words and phrases into a lexicon
  – A lexicon is nothing more than a dictionary giving the correct spelling and punctuation of the words, together with their definitions and pronunciation
  – A glossary is a subset of a dictionary which defines words, terms or phrases in a special field of interest

The Vocabulary of Natural Language
4. Grammar, Syntax and Semantics
  – Apart from knowing the meaning of words, we should also know how to assemble them into complete thoughts
  – The system of rules for putting words together to form complete sentences and thoughts is called grammar
  – Grammar is composed of syntax and semantics
  – Syntax is the way words are assembled to form phrases and sentences
  – Syntax is the method of putting words in a specific order so that they have a correct form
  – Semantics refers to meaning in language
  – It is the study of the relationships between words
  – It provides ways of analyzing and interpreting what is being said

The Vocabulary of Natural Language
5. Context and pragmatics
  – A sentence expresses one complete thought
  – A paragraph conveys a particular idea
  – A sentence in isolation, out of context, might be understood in one particular way
  – Context refers to the complete idea or thought surrounding any sentence in a paragraph
  – Together sentences make sense; alone each sentence contains only a piece of the whole
  – No interpretation is required when the sentences are looked at together
  – Context clarifies meaning by explaining the circumstances and relationships
  – Pragmatics refers to what people really mean by what they say or write; people often say or write one thing and mean another
  – Pragmatics is the study of how language is used and how language is integrated in context
  – Context and pragmatics fill in the gaps often left by syntax and semantics
  – Examples: "Kim has a knife", "Shall we?", "What time is it?"

How do NLP programs work?
• Two techniques are widely used
  – Key word search
  – Syntactical and semantic analysis
• The assumption is that the input comes from the user via the keyboard. The text is stored in an input buffer, which is then processed by an NLP program that analyzes it and tries to understand it.

Key word Analysis
• The first NLP programs used key word analysis
• Such a program searches through an input sentence looking for key words or phrases
• The program is able to identify, or “knows”, only selected words and phrases
• Once a key word or phrase is recognized, the program responds with a specific “canned” response or constructs a response
• One important point about key word matching programs is the size of their vocabularies
• The vocabulary is made up of all the key words and phrases which the program can identify
• E.g. ELIZA

Key word Analysis (flowchart)
START → Input message → Accept input and store → Scan input, search for key words → Key word found?
  – yes: Develop and output a response
  – no: Output suitable response
→ END
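The key word loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not ELIZA's actual script: the key words, the canned responses and the names `RESPONSES`, `DEFAULT_RESPONSE` and `respond` are all assumptions made up for the example.

```python
import re

# Hypothetical key word table: these entries are illustrative assumptions,
# not taken from ELIZA or from the slides.
RESPONSES = {
    "mother": "Tell me more about your family.",
    "computer": "Do machines worry you?",
    "always": "Can you think of a specific example?",
}
DEFAULT_RESPONSE = "Please go on."


def respond(message: str) -> str:
    """Scan the input for key words and return a canned response."""
    # Accept the input and store it as a list of lower-case words.
    words = re.findall(r"[a-z']+", message.lower())
    # Scan the input, searching for key words.
    for word in words:
        if word in RESPONSES:          # key word found: yes
            return RESPONSES[word]
    return DEFAULT_RESPONSE            # key word found: no


print(respond("My mother is always busy."))
print(respond("The weather is nice."))
```

The program only "knows" the three words in the table, so its usefulness is bounded by the size of that vocabulary; any sentence without a key word falls through to the default response.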

Key word Analysis
• Disadvantages
  – Vocabulary size
  – Usefulness is restricted
  – Cannot deal with large variation in language
  – Cannot comprehend the meaning of the sentence
  – Can only be useful in a limited domain

Syntactical and Semantic Analysis
• More sophisticated than key word analysis
• Detailed analysis of the syntax and semantics of an input statement
• The structure and meaning of the sentence are determined
• Easier said than done
• Words have many meanings and there is an enormous number of ways to put them together
• The basic building blocks are as follows

Syntactical and Semantic Analysis
• Basic language units
  – The basic unit of the English language is the sentence
  – A sentence is made up of words
  – Words fall into various categories, the parts of speech, such as noun, pronoun, adjective …
  – Every word falls into some part of speech
  – Some words may fall into multiple parts of speech, e.g. saw
  – Natural language processors are primarily designed to recognize complete sentences, but they must also deal with partial inputs
  – The use of sentence fragments, short phrases or single words in conversation is known as ellipsis

Syntactical and Semantic Analysis
• Morphemes
  – The sentence is usually broken down into words for the analysis
  – Syntactic and semantic analysis divides the input into smaller units called morphemes
  – A morpheme is the smallest meaningful unit of language
  – A morpheme may be a word itself: a free morpheme, e.g. computer
  – A morpheme that is only part of a word is called a bound morpheme, e.g. computers = computer + s; “s” is a bound morpheme which shows plurality, while computer is the root word
  – Bound morphemes are usually suffixes (-ing, -ed) or prefixes (un-)
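As a rough illustration of morpheme analysis, the sketch below strips the prefixes and suffixes mentioned above from a word and checks whether what remains is a known root. The affix lists, the `roots` set and the function name `split_morphemes` are assumptions for this example; real morphological analyzers also handle spelling changes (e.g. "running"), which this sketch does not.

```python
# Assumed, deliberately tiny affix lists based on the examples above.
SUFFIXES = ("ing", "ed", "s")
PREFIXES = ("un",)


def split_morphemes(word: str, roots: set) -> list:
    """Split a word into [prefix-, root, -suffix] using known roots only."""
    word = word.lower()
    if word in roots:                      # free morpheme: the word itself
        return [word]
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            if root in roots:              # root found: report bound morphemes
                parts = [root]
                if prefix:
                    parts.insert(0, prefix + "-")
                if suffix:
                    parts.append("-" + suffix)
                return parts
    return [word]                          # unknown word: leave it whole


roots = {"computer", "lock", "walk"}       # hypothetical root words
print(split_morphemes("computers", roots))
print(split_morphemes("unlocked", roots))
```

With this approach the lexicon only needs to store root words; the bound morphemes carry the extra information such as plurality or tense.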

A natural language understanding system
• Parser
• Lexicon
• Understander
• Knowledge base
• Generator

A Natural Language Understanding System
[Block diagram of a natural language understanding program of the syntax/semantic analysis type. Blocks: input text string, PARSER, LEXICON, UNDERSTANDER, KNOWLEDGE BASE, GENERATOR, output]

A natural language understanding system
• Parser
  – Software that analyzes the input sentence syntactically
  – Each word and its part of speech are identified
  – Maps the words into a parse tree
  – The parser identifies the noun phrase and verb phrase and further breaks them down into other elements

A natural language understanding system
• Parser (Example)
  – S = NP + VP
  – NP = ART + ADJ + N
  – PP = P + ART + N
  – VP = V + NP
  – VP = V + ADJ + N + PP
  – Joan drove the new car to Bloomingdale’s

Parse tree for: Joan drove the new car to Bloomingdale’s

S
  NP
    N: Joan
  VP
    V: drove
    NP
      NP
        Det: the
        ADJ: new
        N: car
      PP
        P: to
        N: Bloomingdale’s

Grammar used:
  S = NP + VP
  NP = N | Det + ADJ + N | NP + PP
  PP = P + N
  VP = V + NP
  Det = a | an | the
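The grammar on this slide is small enough to parse by hand-written recursive descent. The sketch below is one possible way to do it, not the method used in the lecture: the mini-lexicon `LEXICON` and the function `parse_sentence` are hypothetical, words are lower-cased before lookup, and the left-recursive rule NP = NP + PP is handled by first parsing a simple NP and then attaching any following PP to it.

```python
# Hypothetical mini-lexicon: just enough words for the example sentence.
LEXICON = {
    "joan": "N", "car": "N", "bloomingdale's": "N",
    "drove": "V", "new": "ADJ", "to": "P",
    "a": "Det", "an": "Det", "the": "Det",
}


def parse_sentence(sentence: str):
    """Return a nested-tuple parse tree for S = NP + VP, or raise ValueError."""
    tokens = sentence.lower().split()
    tags = []
    for tok in tokens:
        if tok not in LEXICON:
            raise ValueError(f"unknown word: {tok}")
        tags.append(LEXICON[tok])
    pos = 0                                        # index of the next word

    def take(tag):
        """Consume the next word if it has the given part of speech."""
        nonlocal pos
        if pos < len(tokens) and tags[pos] == tag:
            pos += 1
            return (tag, tokens[pos - 1])
        return None

    def expect(tag):
        node = take(tag)
        if node is None:
            raise ValueError(f"expected {tag} at position {pos}")
        return node

    def parse_np():
        det = take("Det")
        if det:                                    # NP = Det + ADJ + N
            np = ("NP", det, expect("ADJ"), expect("N"))
        else:                                      # NP = N
            np = ("NP", expect("N"))
        while pos < len(tokens) and tags[pos] == "P":
            pp = ("PP", expect("P"), expect("N"))  # PP = P + N
            np = ("NP", np, pp)                    # NP = NP + PP
        return np

    def parse_vp():
        return ("VP", expect("V"), parse_np())     # VP = V + NP

    tree = ("S", parse_np(), parse_vp())           # S = NP + VP
    if pos != len(tokens):
        raise ValueError("unexpected trailing words")
    return tree


print(parse_sentence("Joan drove the new car to Bloomingdale's"))
```

The nested tuples mirror the parse tree: each tuple starts with the phrase or part-of-speech label, followed by its children or the word itself.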

A natural language understanding system
• Parser
  – A widely used classical method of parsing is the augmented transition network (ATN)
  – An ATN is built on a finite state machine
[Diagram: states A1, A2 and A3 with arcs labelled ART, adj and noun; example phrase: The long, black car]
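The finite state machine underlying the diagram can be sketched as a transition table. This assumes the arcs are A1 to A2 on ART, A2 to A2 on adj (a loop) and A2 to A3 on noun, which is the usual noun-phrase network but is inferred here rather than stated in the text. The sketch is a plain transition network only; it leaves out the registers and recursive sub-networks that make an ATN "augmented". The names `CATEGORIES`, `TRANSITIONS` and `is_noun_phrase` are hypothetical.

```python
# Assumed word categories for the example phrase; a real system would
# take these from the lexicon.
CATEGORIES = {
    "the": "ART", "a": "ART",
    "long": "adj", "black": "adj",
    "car": "noun",
}

# Assumed transition table: (state, category) -> next state.
TRANSITIONS = {
    ("A1", "ART"): "A2",
    ("A2", "adj"): "A2",     # loop: any number of adjectives
    ("A2", "noun"): "A3",
}


def is_noun_phrase(phrase: str) -> bool:
    """Return True if the phrase drives the machine from A1 to A3."""
    state = "A1"
    for word in phrase.lower().replace(",", " ").split():
        category = CATEGORIES.get(word)
        state = TRANSITIONS.get((state, category))
        if state is None:                # no arc for this word: reject
            return False
    return state == "A3"                 # A3 is the only accepting state


print(is_noun_phrase("The long, black car"))
```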

A natural language understanding system
• The lexicon
  – The lexicon contains all of the words that the program is capable of recognizing
  – It also contains the correct spelling, meaning and part of speech of each word
  – Some parsers perform morpheme analysis
  – In this way, a more precise meaning can be determined
  – Other parsers store all variations of the root word in the lexicon
  – Overall the result is the same
  – When a word is recognized, it is looked up in the lexicon for its lexical information
  – Ultimately the parse tree is built
  – The parser can also carry out secondary activities, such as correcting spelling mistakes with the help of the lexicon
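A lexicon lookup with a simple spelling-correction fallback might look like the sketch below. The entries in `LEXICON` and the function `lookup` are assumptions for illustration; the fuzzy matching uses `difflib.get_close_matches` from the Python standard library, which is one convenient choice and not necessarily how a production parser would correct spelling.

```python
import difflib

# Hypothetical lexicon entries: spelling (the key), part of speech, meaning.
LEXICON = {
    "car": {"pos": "N", "meaning": "a road vehicle"},
    "drove": {"pos": "V", "meaning": "past tense of drive"},
    "new": {"pos": "ADJ", "meaning": "recently made"},
}


def lookup(word: str):
    """Return (spelling, entry), falling back to the closest known spelling.

    Returns None when nothing in the lexicon is similar enough.
    """
    word = word.lower()
    if word in LEXICON:
        return word, LEXICON[word]
    # Secondary activity: try to correct a spelling mistake.
    close = difflib.get_close_matches(word, list(LEXICON), n=1, cutoff=0.6)
    if close:
        return close[0], LEXICON[close[0]]
    return None


print(lookup("car"))
print(lookup("carr"))   # misspelling, resolved through the lexicon
```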

A natural language understanding system
• The Understander and Knowledge base
  – The understander works in conjunction with the knowledge base
  – The knowledge base is the primary means of understanding what has been said
  – The purpose of the understander is to use the parse tree to reference the knowledge base
  – If the input sentence is a statement, the understander will determine meaning by looking up words and phrases in the knowledge base

A natural language understanding system
• The generator
  – The generator uses the understood input to create usable output
  – The output depends upon the environment or application, e.g. a query for a DBMS
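For the DBMS case, a generator could turn the understood input into a query string. The sketch below assumes the understander hands over a small dictionary with `table`, `fields` and `filters` keys; that layout, the function `generate_query` and the sample table are all invented for this example. Filter values are returned separately as parameters rather than pasted into the SQL text.

```python
def generate_query(understood: dict):
    """Turn an 'understood' request into (SQL string, parameter list).

    The dict layout (table, fields, filters) is a hypothetical interface
    between understander and generator, invented for this sketch.
    """
    fields = ", ".join(understood.get("fields", [])) or "*"
    query = f"SELECT {fields} FROM {understood['table']}"
    filters = understood.get("filters", {})
    conditions = [f"{column} = ?" for column in filters]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query, list(filters.values())


# Hypothetical result of understanding "Show the names of employees in sales".
request = {"table": "employees", "fields": ["name"],
           "filters": {"department": "sales"}}
print(generate_query(request))
```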

Assignment on NLP
Deadline: 17 Dec, 2019
Total marks: 25 (5 marks for each question)
A. Answer the following questions
1. What is morpheme analysis? Give examples.
2. How is morpheme analysis better than word analysis? Give examples.
3. What are the different types of morphemes? Explain with examples.
4. Can a word and a morpheme be the same? Give examples.
5. Can we store all the possible variations of the root word in the lexicon to avoid morpheme analysis? Will the result be different? Explain your answer.

Practical Assignment
Total marks: 15 (3 marks for each functionality)
B. Write a computer program that accepts a number of sentences and counts the following
1. Number of words in the input text
2. Number of letters
3. Number of sentences
4. Number of punctuation marks
5. Number of sentences that are questions
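One possible starting point for the practical assignment is sketched below. It rests on simplifying assumptions that a full solution may want to revisit: a sentence is taken to end with one or more of '.', '!' or '?', a word is a run of letters or digits with an optional internal apostrophe, and punctuation means the characters in Python's `string.punctuation`. The function name `text_stats` is hypothetical.

```python
import re
import string


def text_stats(text: str) -> dict:
    """Count words, letters, sentences, punctuation marks and questions."""
    # Assumption: a sentence ends with one or more of '.', '!' or '?'.
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    return {
        # A word is a run of word characters, optionally with an apostrophe
        # inside it, so "don't" counts as one word.
        "words": len(re.findall(r"\w+(?:'\w+)?", text)),
        "letters": sum(ch.isalpha() for ch in text),
        "sentences": len(sentences),
        "punctuation": sum(ch in string.punctuation for ch in text),
        "questions": sum(s.endswith("?") for s in sentences),
    }


print(text_stats("What time is it? It is late. Shall we go?"))
```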