Introduction to NLP What is Natural Language Processing

Introduction to NLP
What is Natural Language Processing?
Many slides reused from Dan Jurafsky/Christopher Manning (Stanford)

Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!
• Clue: WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL
• Answer: Bram Stoker

Information Extraction
Source email:
• Subject: curriculum meeting
• Date: Aug. 22, 2017
• To: Susan Gauch
• Hi Susan, we’ve now scheduled the curriculum meeting. It will be in JBHT 532 tomorrow from 10:00-11:30. -Dave
Create new Calendar entry:
• Event: Curriculum mtg
• Date: Aug. 23, 2017
• Start: 10:00 am
• End: 11:30 am
• Where: JBHT 532

Information Extraction & Sentiment Analysis
Attributes:
• zoom
• affordability
• size and weight
• flash
• ease of use
Size and weight:
• ✓ nice and compact to carry!
• ✓ since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!
• ✗ the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera

Machine Translation
• Fully automatic
• Helping human translators
Enter Source Text: 这 不过 是 一 个 时间 的 问题.
Translation from Stanford’s Phrasal: This is only a matter of time.

Making progress on this problem…
• The task is difficult! What tools do we need?
  • Knowledge about language
  • Knowledge about the world
  • A way to combine knowledge sources
• How we generally do this:
  • probabilistic models built from language data
    • P(“maison” → “house”) high
    • P(“L’avocat général” → “the general avocado”) low
• Luckily, rough text features can often do half the job.
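To make "probabilistic models built from language data" concrete, here is a minimal sketch that estimates word translation probabilities by relative frequency. The aligned word pairs and the function name `translation_probs` are hypothetical, chosen only for illustration; real systems learn alignments from large parallel corpora and use richer models.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned (French, English) pairs, invented for this sketch.
aligned_pairs = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("avocat", "lawyer"),
    ("avocat", "lawyer"),
    ("avocat", "lawyer"),
    ("avocat", "avocado"),
]


def translation_probs(pairs):
    """Relative-frequency estimate of P(english | french) from aligned pairs."""
    counts = defaultdict(Counter)
    for french, english in pairs:
        counts[french][english] += 1
    probs = {}
    for french, english_counts in counts.items():
        total = sum(english_counts.values())
        probs[french] = {e: n / total for e, n in english_counts.items()}
    return probs


probs = translation_probs(aligned_pairs)
print(probs["maison"]["house"])    # 2 of the 3 "maison" alignments
print(probs["avocat"]["avocado"])  # 1 of the 4 "avocat" alignments
```

With counts like these, the frequent translation gets a high probability and the rare one a low probability, which is the behaviour the slide describes.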

Ambiguity makes NLP hard: “Crash blossoms”
• Violinist Linked to JAL Crash Blossoms
• Teacher Strikes Idle Kids
• Red Tape Holds Up New Bridges
• Hospitals Are Sued by 7 Foot Doctors
• Juvenile Court to Try Shooting Defendant
• Local High School Dropouts Cut in Half
100% REAL

A few of the 83+ parses for
The post office will hold out discounts and service concessions as incentives. [Shortened WSJ sentence.]
[Parse tree figure. Constituents shown: NP “The post office”, Aux “will”, V “hold out”, NP “discounts”, Conj “and”, N “service”, N “concessions”, PP “as incentives”]

[Parse tree figure. Constituents shown: NP “The post office”, Aux “will”, V “hold out”, NP “discounts”, Conj “and”, NP “service concessions”, PP “as incentives”]

[Parse tree figure with coordinated verb phrases (VP Conj VP). Constituents shown: NP “The post office”, Aux “will”, V “hold out”, NP “discounts”, Conj “and”, V “service”, NP “concessions”, PP “as incentives”]

[Parse tree figure with “out” as a preposition. Constituents shown: NP “The post office”, Aux “will”, V “hold”, P “out”, NP “discounts”, Conj “and”, NP “service concessions”, PP “as incentives”]

[Parse tree figure with coordinated verb phrases (VP Conj VP). Constituents shown: NP “The post office will hold”, V “out”, NP “discounts”, Conj “and”, V “service”, NP “concessions”, PP “as incentives”]

Famous Ambiguity Examples
• Time flies like an arrow
• Translation English -> Russian -> English:
  • The spirit is willing but the flesh is weak
  • Became “The vodka is good but the meat is rotten”
• Punctuation matters
  • Czarina saved a man’s life by changing:
    • Pardon impossible, to be sent to Siberia
    • Became “Pardon, impossible to be sent to Siberia”

Where do problems come in?
Syntax
• Part of speech ambiguities
• Attachment ambiguities
Semantics
• Word sense ambiguities
• (Semantic interpretation and scope ambiguities)

Why else is natural language understanding difficult?
• non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & yourself should never give up either♥
• segmentation issues: the New York-New Haven Railroad
• idioms: dark horse, get cold feet, lose face, throw in the towel
• neologisms: unfriend, Retweet, bromance
• world knowledge: Mary and Sue are sisters. Mary and Sue are mothers.
• tricky entity names: Where is A Bug’s Life playing … Let It Be was recorded … … a mutation on the for gene …
But that’s what makes it fun!

Language Technology
mostly solved
• Spam detection: ✓ Let’s go to Agra! ✗ Buy V1AGRA …
• Part-of-speech (POS) tagging: Colorless green ideas sleep furiously. (ADJ ADJ NOUN VERB ADV)
• Named entity recognition (NER): Einstein met with UN officials in Princeton (PERSON, ORG, LOC)
making good progress
• Sentiment analysis: Best roast chicken in San Francisco! The waiter ignored us for 20 minutes.
• Coreference resolution: Carter told Mubarak he shouldn’t run again.
• Word sense disambiguation (WSD): I need new batteries for my mouse.
• Parsing: I can see Alcatraz from the window!
• Machine translation (MT): 第13届上海国际电影节开幕 … The 13th Shanghai International Film Festival…
• Information extraction (IE): You’re invited to our dinner party, Friday May 27 at 8:30 (Party, May 27, add)
still really hard
• Question answering (QA): Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
• Paraphrase: XYZ acquired ABC yesterday. ABC has been taken over by XYZ.
• Summarization: The Dow Jones is up. The S&P 500 jumped. Housing prices rose. (Economy is good)
• Dialog: Where is Citizen Kane playing in SF? Castro Theatre at 7:30. Do you want a ticket?

Statistical NLP methods
• Involve deriving numerical data from text
• Are usually but not always probabilistic
• Many techniques are used:
  • n-grams, history-based models, decision trees / decision lists, memory-based learning, log-linear models, HMMs, neural networks, vector spaces, graphical models, …
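As an illustration of the first technique in the list, the sketch below trains a bigram (2-gram) model with unsmoothed maximum-likelihood estimates. The three-sentence corpus and the function names are assumptions made up for this example; a practical model would use far more data and some form of smoothing.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context tokens and bigrams, padding each sentence with <s> and </s>."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen context."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Toy corpus, invented for illustration only.
corpus = [
    "time flies like an arrow",
    "fruit flies like a banana",
    "time flies",
]
contexts, bigrams = train_bigram(corpus)
p_like = bigram_prob("flies", "like", contexts, bigrams)
p_arrow = bigram_prob("like", "arrow", contexts, bigrams)
print(p_like, p_arrow)
```

Here "flies" is followed by "like" in two of its three occurrences, while "like arrow" never occurs, so the second estimate is zero. That zero is why real n-gram models add smoothing.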

This class
• Teaches key theory and methods for statistical NLP:
  • Probability and information theory
  • Classifiers
  • N-gram language modeling
  • Statistical Parsing
  • Inverted index, tf-idf, vector models of meaning
• For practical, robust real-world applications
  • Information extraction
  • Spelling correction
  • Information retrieval
  • Sentiment analysis
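For a first feel of tf-idf and vector models of meaning, here is a small sketch that weights each term by raw term frequency times log(N / document frequency) and compares documents with cosine similarity. This is one common tf-idf variant among several; the documents and function names are assumptions invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document (tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents, invented for illustration only.
docs = [
    "the camera is small and light",
    "the camera feels flimsy and light",
    "the meeting is in the morning",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # two camera reviews
print(cosine(vecs[0], vecs[2]))  # camera review vs. meeting note
```

Because "the" appears in every document, its weight is zero and it contributes nothing to similarity. The two camera sentences share several informative terms, so they score higher than the camera sentence paired with the meeting sentence.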

Skills you’ll need
• Simple linear algebra (vectors, matrices)
• Basic probability theory
• C++ or Java or Python programming
• Knowledge of data structures
• User-level knowledge of Linux

Introduction to NLP What is Natural Language Processing?