COMP 791 A Statistical Language Processing Linguistic Essentials

COMP 791 A: Statistical Language Processing
Linguistic Essentials (Chap. 3)

Levels of study of NLP
• Lexical
  - Possible words in a given language
      rose
      ? gellapou
• Phonetics & phonology
  - How words are related to sounds
      rose [roz]
• Parts-of-speech & Morphology
  - How words are constructed from basic meaning units (morphemes)
      friend + ly --> friendly      rose + ly ≠ rosely
      friend + s --> friends        woman + s ≠ womans
• Phrase Structure and Syntax
  - How words can be ordered to form correct sentences
      ? Red the is rose / adj det verb noun
      The rose is red / det noun verb adj

Levels of study of NLP (cont'd)
• Semantics
  - What words mean (lexical semantics, word sense disambiguation)
      chair --> furniture / person
  - How word meanings are combined into the meaning of sentences
      The chair is broken.
      The chair is sick.
• Pragmatics
  - How language conventions affect the literal meaning (interpretation)
      Do you have the time?
      Do you have the children?
  - Discourse: how surrounding sentences affect interpretation
      The chair's leg is broken. He went skiing last week-end.
      The chair's leg is broken. Someone placed a 500 kg package on it.
  - World-Knowledge: how general knowledge about the world affects interpretation
      The prof sent the student to see the chair because he was fed up with his behavior.
      The prof sent the student to see the chair because he wanted to see him.
      The prof sent the student to see the chair because he was talking in class.

Levels of study of NLP
• Lexical
• Phonetics & phonology
• Parts-of-speech & Morphology
• Phrase Structure and Syntax
• Semantics
• Pragmatics
  - Discourse
  - World-Knowledge

Parts of Speech and Morphology
• Parts of Speech (POS)
  - also called word / lexical / syntactic / grammatical categories, tags or classes
  - Ex: noun, verb, adjective, preposition, …
• Morphology
  - study and description of word formation in a language
  - modification of a root form (stem) by affixes
  - affixes: prefixes, suffixes, infixes, circumfixes
  - and exceptions…
      thief --> thieves
      chief --> chiefs
• Word categories are systematically related by morphological processes

Morphological processes
• Inflection
  - to indicate case, gender, number, tense, person, mood, or voice
  - does not change the word's grammatical class or meaning significantly
      car --> cars
      talk --> talking
• Derivation
  - creation of a new word
  - may have different meaning and/or grammatical class
      infect --> disinfect
      grateful --> ungrateful
      wide (adjective) --> widely (adverb)
      teach (verb) --> teacher (noun)
• Compounding
  - merging 2 or more words into a single one
  - written as separate words but pronounced as a single word / denotes 1 single concept, so merits an entry in the lexicon
      tea kettle, disk drive, mad cow disease
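The inflection and derivation examples above can be sketched as affix rules that record the word class before and after the affix is attached. This is only an illustration: the `AffixRule` class and the `RULES` table are hypothetical names, and plain concatenation overgenerates (it would happily produce "womans"), which is why real morphological analysers also need exception lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffixRule:
    """One morphological rule (hypothetical structure, for illustration only)."""
    process: str    # "inflection" or "derivation"
    affix: str
    position: str   # "prefix" or "suffix"
    in_class: str   # grammatical class of the stem
    out_class: str  # grammatical class of the result

    def apply(self, stem: str) -> str:
        # Naive concatenation: no spelling changes, no irregular forms.
        if self.position == "prefix":
            return self.affix + stem
        return stem + self.affix

    def changes_class(self) -> bool:
        return self.in_class != self.out_class


# Rules covering the examples on the slide.
RULES = {
    "plural -s": AffixRule("inflection", "s", "suffix", "noun", "noun"),
    "progressive -ing": AffixRule("inflection", "ing", "suffix", "verb", "verb"),
    "dis-": AffixRule("derivation", "dis", "prefix", "verb", "verb"),
    "un-": AffixRule("derivation", "un", "prefix", "adjective", "adjective"),
    "-ly": AffixRule("derivation", "ly", "suffix", "adjective", "adverb"),
    "-er": AffixRule("derivation", "er", "suffix", "verb", "noun"),
}

if __name__ == "__main__":
    print(RULES["plural -s"].apply("car"))    # inflection, class unchanged
    print(RULES["-ly"].apply("wide"))         # derivation, adjective to adverb
    print(RULES["-er"].apply("teach"))        # derivation, verb to noun
    print(RULES["plural -s"].apply("woman"))  # overgeneration: not a real word
```

In this sketch no inflection rule changes the class, while some derivation rules do (-ly, -er) and some only change the meaning (dis-, un-).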

Classes of POS
• Open (lexical) class
  - things, actions, events, …
      ex. cat, John, eat
  - new words can be added easily
  - nouns, verbs, adjectives, adverbs
  - some languages do not have all these categories
• Closed (functional) class
  - generally function/grammatical words
      ex. the, in, and, for
  - relatively fixed membership
  - prepositions, determiners, pronouns, conjunctions, particles, numerals, auxiliary verbs

Main POS
• Open class
  - Noun: refers to entities like people, places, things or ideas.
  - Adjective: describes the properties of nouns or pronouns.
  - Verb: describes actions, activities and states.
  - Adverb: describes a verb, an adjective or another adverb.
• Closed class
  - Pronoun: word that takes the place of a noun or another expression.
  - Determiner: describes the particular reference of a noun.
  - Preposition: expresses spatial or time relationships.
  - …

Nouns (open)
• Entities like people, places, things or ideas
  - ex: dog, tree, Mary, idea
• Typical inflections:
  - number (singular, plural)
  - gender (masculine, feminine, neuter)
  - case (nominative, genitive, accusative, dative)
• Sub-categories:
  - proper nouns (John)
  - adverbial nouns (today, home)

Verbs (open)
• Actions, activities, and states
      The men work in the field.
      The men are working in the field.
      The men are in the field.
• Typical inflections:
  - tenses: present, past, future
  - aspect: progressive, perfective
  - voice: active, passive
  - other inflection: number, person
• Sub-categories:
  - auxiliaries (considered closed-class words)
      ex: be, do, will
  - modal verbs (considered closed-class words)
      ex: can, should, could
  - main verbs

Main verbs
• Transitive
  - requires a direct object (found with questions: what? or whom?)
      The child broke a glass.
• Intransitive
  - does not require a direct object.
      The train arrived.
• Some verbs can be both transitive and intransitive
  - The ship sailed the seas. (transitive)
  - The ship sails at noon. (intransitive)
  - I met my friend at the airport. (transitive)
  - The delegates met yesterday. (intransitive)

Adjectives (open)
• Properties and attributes
  - long road
  - rainy day
  - attractive hat
• Typical inflections:
  - number, gender, case
• Sub-categories:
  - comparative (richer)
  - superlative (richest)

Adverbs (open)
• words added to a verb, adjective, adverb or other to expand its meaning
  - You must set up the copy now.
  - Mary walks gracefully.
  - Sometimes I take a walk in the woods.
  - Jack usually leaves the house at seven.
  - I have always admired her.
• Sub-categories:
  - locative (here)
  - degree (very)
  - manner (slowly)
  - temporal (late, yesterday (noun?))

Closed class categories
• Determiners:
  - words that make specific the denotation of a noun phrase
      articles: the hat, a hat
      demonstrative: this hat, that hat
      possessive: John's hat, my hat, her book
      wh-determiner: which hat, whose hat
      quantifier: some hat, every hat
• Prepositions:
  - words that show the relationship between certain words in a sentence
      The accident occurred under the bridge.
  - by, to, at, …
• Conjunctions:
  - words used to join other words or groups of words
  - or, when, but, and, …
• Auxiliary & modal verbs:
  - be, do, can, may, should, …

Closed class categories (cont'd)
• Particles:
  - words that are added to main verbs to construct different verbs
  - check + out = check out, make + up = make up
  - Ex:
      She made up a story.
      She made it up.
  - particles vs. prepositions
      she <ran up> a bill / she <ran> <up> a hill
• Numerals:
  - one, third

Closed class categories (cont'd)
• Pronouns:
  - a word that replaces a noun or even another sentence
      ex: she, ourselves, mine, that
  - subcategories:
      Personal
        You are very nice.
      Possessive
        Mine is nicer.
      Interrogative: used to ask questions: who?, what?, which?
        Who is that girl?
      Demonstrative: points out definite persons, places or things: this, these, that
        This is my book.
        He said he was busy, but that was a lie.
      Relative: joins the clause which it introduces to its own antecedent: who, which, that
        She is the girl who won the race.

Other parts of speech
• Interjections:
  - Ouch!
• Negatives:
  - no, not
• Politeness markers:
  - Hello, bye
• Existential:
  - There are 3 students sleeping.

Summary
• Open class
  - nouns             cat, spirit
  - verbs             eat, cook
  - adjectives        slow, large
  - adverbs           slowly
• Closed class
  - prepositions      on, under, at
  - determiners       a, the, some
  - pronouns          she, who, I, other
  - conjunctions      and, but, or
  - auxiliary verbs   can, may, should
  - particles         up, on, off
  - numerals          one, two, first

The substitution test
• Basic test to determine if 2 words belong to the same POS class
      The { sad | intelligent | green | fat | … } one is in the corner.

POS Tagging
• Automatically assign POS tags to words in a text.
  - Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN
  - The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN
  - The/ARTICLE news/NOUN has/AUXILIARY been/MAIN VERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION fact/NOUN ./PERIOD
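The word/TAG notation above maps directly onto a list of (word, tag) pairs, which is the usual in-memory form of tagged text. A minimal sketch, assuming tags contain no spaces (so it fits the first two examples but not "MAIN VERB" as written); the function name `parse_tagged` is hypothetical.

```python
def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so words containing "/" survive.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


TAGGED = parse_tagged("The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN")
NOUNS = [word for word, tag in TAGGED if tag == "NOUN"]

if __name__ == "__main__":
    print(TAGGED)
    print(NOUNS)
```

Once text is in this form, later steps such as picking out the nouns reduce to a filter over the pairs.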

Why do POS Tagging?
• 1st step towards NLU
• easier than full NLU (results > 95% accuracy)
• Useful for:
  - speech recognition / synthesis (better accuracy)
      how to recognize/pronounce a word
      CONtent/noun vs. conTENT/adj
  - stemming in IR
      which morphological affixes the word can take
      adjective - ly = noun (friendly - ly = friend)
  - indexing in IR
      pick out nouns, which may be more important than other words in indexing documents

Tag Sets
• A tag indicates the various conventional parts of speech.
• Different tag sets have been used
  - Ex. Brown Tag Set, Penn Treebank Tag Set
  - Tag examples:
      NP    Proper noun
      NN    Singular noun
      AT    Article
      DET   Determiner
  - More on this later

Penn Treebank Tag Set

Tag  | Description                               | Examples
CC   | conjunction, coordinating                 | and but either et for less minus neither nor or plus so therefore
CD   | numeral, cardinal                         | mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one
DT   | determiner                                | all an another any both del each either every half la many much
IN   | preposition or subordinating conjunction  | astride among upon whether out inside pro despite on by throughout
JJ   | adjective or numeral, ordinal             | third ill-mannered pre-war regrettable oiled calamitous first
JJR  | adjective, comparative                    | bleaker braver breezier briefer brighter brisker broader bumper
NN   | noun, common, singular or mass            | common-carrier cabbage knuckle-duster Casino afghan shed
NNP  | noun, proper, singular                    | Motown Venneboerger Czestochwa Ranzer Conchita Trumplane
NNS  | noun, common, plural                      | undergraduates scotches bric-a-brac products bodyguards facets
PRP  | pronoun, personal                         | herself himself it itself me myself oneself ours
RB   | adverb                                    | occasionally unabatingly maddeningly adventurously professedly
RP   | particle                                  | aboard about across along apart around aside at away back
TO   | "to" as preposition or infinitive marker  | to
VB   | verb, base form                           | ask assemble assess assign assume atone attention avoid bake
VBD  | verb, past tense                          | dipped pleaded swiped wore soaked tidied convened halted
VBG  | verb, present participle or gerund        | telegraphing stirring focusing angering judging stalling lactating
VBN  | verb, past participle                     | imitated dilapidated aerosolized chaired languished panelized used
VBP  | verb, present tense, not 3rd p. singular  | predominate wrap resort sue twist spill cure lengthen brush
VBZ  | verb, present tense, 3rd p. singular      | bases reconstructs marks mixes displeases seals carps weaves
…

Ambiguities in POS tagging
• Children eat sweet candy/noun.
• Too much boiling will candy/verb the molasses.
• Fruit flies/? like/? a banana.
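To see why ambiguity is a problem for a tagger, one can enumerate every tag sequence that a lexicon allows for a sentence. The sketch below does this for the "Fruit flies like a banana" example; the candidate tags in `CANDIDATE_TAGS` are an assumption made for illustration and are not taken from the slides or from any tag set.

```python
from itertools import product

# Hypothetical candidate tags per word (assumed for this example only).
CANDIDATE_TAGS = {
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREPOSITION"],
    "a": ["ARTICLE"],
    "banana": ["NOUN"],
}


def tag_sequences(sentence: str) -> list[tuple[str, ...]]:
    """Return every tag sequence the lexicon allows for the sentence."""
    words = sentence.lower().rstrip(".").split()
    return list(product(*(CANDIDATE_TAGS[word] for word in words)))


SEQUENCES = tag_sequences("Fruit flies like a banana.")

if __name__ == "__main__":
    for sequence in SEQUENCES:
        print(" ".join(sequence))
```

Two ambiguous words already give four candidate sequences; a tagger has to choose among them, and the number of candidates grows multiplicatively with each additional ambiguous word.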

Syntax or Phrase Structure
• Syntax
  - study of the regularities and constraints of word order and phrase structure
      the book is red vs. red book is the
• Grammar
  - expresses the relations among the constituents of a sentence

Constituents
• also called syntactic structures
• Main constituents:
  - S: sentence
      The boy is happy.
  - NP: noun phrase
      the little boy
      Sam Smith
      I
      three boys from Montreal
  - VP: verb phrase
      eat an apple
      sing
      leave Boston in the morning
  - PP: prepositional phrase
      in the morning
      about my ticket
  - AdjP: adjective phrase
      really funny
      rather clear
      very large
  - AdvP: adverb phrase
      slowly
      really slowly

Sentence Moods/Types
• Declarative
  - Mary eats.
  - S --> NP VP
• Imperative
  - Eat!
  - S --> VP
• Yes-No Question
  - Did Mary eat?
  - S --> Aux NP VP
• Wh-Question
  - When did Mary eat?
  - S --> WH-pro Aux NP VP

Noun Phrases
• NP --> pre-modifiers head post-modifiers
• head: central noun in NP
  - the little boy, the boy from Montreal
• pre-modifiers:
  - determiners, cardinal, ordinal, quantifier
      the boy, two boys, first boy, several boys
  - AdjP
      funny boy, really funny boy
• post-modifiers:
  - PP
      flights from Montreal
  - non-finite clause
      gerundive (-ing): flights arriving from Montreal
      -ed: dinner served on board, jewels stolen from the queen
      infinitive form: flight to arrive from Montreal
  - relative clause
      flight that arrives from Montreal, girl who won the race

Verb Phrases
• VP --> head-verb complements adjuncts
• Some VPs:
  - Verb           eat.
  - Verb NP PP     leave Montreal in the morning.
  - Verb PP        leave in the morning.
  - Verb S         think I would like the fish.
  - Verb VP        want to leave Montreal in the morning.

Subcategorisation frames
• Some verbs can take complements that others cannot
      I want to fly.
      * I find to fly.
• Verbs are subcategorized according to the complements they can take --> subcategorisation frames
  - traditionally: transitive vs. intransitive
  - nowadays: up to 100 subcategories / frames

Prepositional phrases
• PP --> Preposition NP
  - from Japan
  - inside my blue bag

Adjective Phrases
• AdjP --> Adj Modifiers
  - tall
  - very tall
  - taller than Mary

Adverb Phrases
• AdvP --> Adv Modifiers
  - affirmatively
  - very graciously
  - rather secretively

Context Free Grammars
• set of non-terminal symbols
  - constituents & parts-of-speech
  - S, NP, VP, PP, Det, N, V, ...
• set of terminal symbols
  - lexicon of words & punctuation
  - cat, mouse, nurses, eat, ...
• a non-terminal designated as the starting symbol
  - sentence S
• a set of re-write rules
  - having a single non-terminal on the LHS and one or more terminals or non-terminals on the RHS
  - S --> NP VP
  - NP --> Pro | PN | Det Nominal

A simple context-free grammar
• The Grammar
      S --> NP VP
      NP --> AT NNS
      NP --> AT NN
      NP --> NP PP
      VP --> VBD NP
      PP --> IN NP
• The Lexicon
      NNS --> children
      NNS --> students
      NNS --> mountains
      VBD --> slept
      VBD --> ate
      VBD --> saw
      AT --> the
      IN --> in
      IN --> of
      NN --> cake
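Because every rule in this small grammar rewrites to exactly two symbols, it can be tried out with a CKY-style recognizer. The sketch below is one possible encoding, not the course's implementation: the names `GRAMMAR`, `LEXICON` and `recognize` are hypothetical, and the last rule is written as PP --> IN NP so that it matches the PP used in NP --> NP PP.

```python
from itertools import product

# Binary rules, indexed by their right-hand side: (B, C) -> {A} for A --> B C.
GRAMMAR = {
    ("NP", "VP"): {"S"},
    ("AT", "NNS"): {"NP"},
    ("AT", "NN"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VBD", "NP"): {"VP"},
    ("IN", "NP"): {"PP"},
}

# Lexicon: word -> set of parts of speech.
LEXICON = {
    "children": {"NNS"}, "students": {"NNS"}, "mountains": {"NNS"},
    "slept": {"VBD"}, "ate": {"VBD"}, "saw": {"VBD"},
    "the": {"AT"}, "in": {"IN"}, "of": {"IN"}, "cake": {"NN"},
}


def recognize(sentence: str) -> bool:
    """Return True if the grammar derives the sentence from S (CKY)."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every category that spans words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= GRAMMAR.get((left, right), set())
    return "S" in chart[0][n]


if __name__ == "__main__":
    print(recognize("The children ate the cake."))
    print(recognize("The children ate the cake in the mountains."))
    print(recognize("Ate the children cake the."))
```

Indexing the rules by their right-hand side keeps the inner loop to a dictionary lookup; a grammar with unary or longer rules would first need converting to this binary form.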

A parse tree
• a tree representation of the application of the grammar to a specific sentence.
      [S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
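A parse tree like this one is easy to hold in code as nested tuples of the form (label, child, child, ...), with (tag, word) at the leaves. The sketch below uses that representation, which is a choice made for illustration; `leaves` and `bracketed` are hypothetical helper names.

```python
# Parse of "The children ate the cake" as nested tuples.
TREE = (
    "S",
    ("NP", ("AT", "The"), ("NNS", "children")),
    ("VP", ("VBD", "ate"), ("NP", ("AT", "the"), ("NN", "cake"))),
)


def is_preterminal(tree) -> bool:
    """A pre-terminal node is a (tag, word) pair."""
    return len(tree) == 2 and isinstance(tree[1], str)


def leaves(tree) -> list[str]:
    """Read the words back off the tree, left to right."""
    if is_preterminal(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


def bracketed(tree) -> str:
    """Render the tree in labelled-bracket notation."""
    if is_preterminal(tree):
        return f"[{tree[0]} {tree[1]}]"
    return f"[{tree[0]} " + " ".join(bracketed(child) for child in tree[1:]) + "]"


if __name__ == "__main__":
    print(leaves(TREE))
    print(bracketed(TREE))
```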

Stochastic Grammars
• Grammars obtained by adding probabilities to "algebraic" (i.e., non-probabilistic) grammars.
      1     S --> NP VP
      0.4   NP --> AT NNS
      0.4   NP --> AT NN
      0.2   NP --> NP PP
      0.1   VP --> VBD
      0.8   VP --> VBD NP
      1     PP --> IN NP
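With rule probabilities attached, the probability of a parse is the product of the probabilities of the rules it uses. The sketch below computes this for the parse of "The children ate the cake"; it ignores lexical probabilities (the slide gives none, so each word is treated as probability 1), writes the last rule as PP --> IN NP, and uses the hypothetical names `RULE_PROB` and `tree_probability`.

```python
# Rule probabilities from the slide, keyed by (LHS, RHS).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.4,
    ("NP", ("AT", "NN")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VBD",)): 0.1,
    ("VP", ("VBD", "NP")): 0.8,
    ("PP", ("IN", "NP")): 1.0,
}

TREE = (
    "S",
    ("NP", ("AT", "The"), ("NNS", "children")),
    ("VP", ("VBD", "ate"), ("NP", ("AT", "the"), ("NN", "cake"))),
)


def tree_probability(tree) -> float:
    """Multiply the probabilities of all grammar rules used in the tree."""
    label, *children = tree
    # (tag, word) leaves: assumed probability 1, since no lexical
    # probabilities are given.
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0
    rhs = tuple(child[0] for child in children)
    probability = RULE_PROB[(label, rhs)]
    for child in children:
        probability *= tree_probability(child)
    return probability


if __name__ == "__main__":
    # Rules used: S --> NP VP, NP --> AT NNS, VP --> VBD NP, NP --> AT NN
    print(tree_probability(TREE))  # 1 * 0.4 * 0.8 * 0.4, about 0.128
```

When a sentence has several parses, comparing these products is what lets a stochastic grammar prefer one reading over another.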

Syntactic Dependencies
• Local dependency
  - dependency between two words expressed within the same syntactic rule.
      The 3/plural books/plural.
  - n-gram models capture this very well.
• Non-local dependency
  - two words can be syntactically dependent even though they occur far apart in a sentence
      Ex: subject-verb agreement
      The children who found a wallet on the street yesterday while walking their dog were given a reward.
  - challenge for certain statistical NLP approaches (ex. n-grams) that model only local dependencies.
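The difference between the two cases can be made concrete by asking how long an n-gram must be before a single window contains both dependent words. The sketch below does this for the two example sentences; the helper names `ngrams` and `smallest_covering_order` are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous windows of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def smallest_covering_order(tokens, first, second):
    """Smallest n for which some n-gram contains both words, or None."""
    for n in range(1, len(tokens) + 1):
        if any(first in gram and second in gram for gram in ngrams(tokens, n)):
            return n
    return None


LOCAL = "The 3 books".lower().split()
NON_LOCAL = (
    "The children who found a wallet on the street yesterday "
    "while walking their dog were given a reward"
).lower().split()

if __name__ == "__main__":
    # Number agreement between "3" and "books" fits inside a bigram.
    print(smallest_covering_order(LOCAL, "3", "books"))            # 2
    # Agreement between "children" and "were" needs a 14-word window.
    print(smallest_covering_order(NON_LOCAL, "children", "were"))  # 14
```

A bigram or trigram model never sees "children" and "were" together, which is why the agreement is out of its reach.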

Difficulties in parsing
• Attachment ambiguity
  - The children ate the cake with a spoon.
      The children ate (the cake with a spoon). ??
      The children (ate with a spoon). ??

Other difficulties
• NP bracketing
  - plastic cat food can cover
      --> ? (plastic cat) (food can) cover
      --> ? plastic (cat food can) cover
      --> ? (plastic cat food) (can cover)
• Conjunctions and appositives
  - Maddy, my dog, and Samy
      --> ? (Maddy, my dog), and (Samy)
      --> ? (Maddy), (my dog), and (Samy)
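To get a feel for how quickly NP bracketing ambiguity grows, one can enumerate every way of grouping the words two constituents at a time. The sketch below counts only strictly binary bracketings, which is a simplification (the groupings shown above are not all binary); `bracketings` is a hypothetical helper name.

```python
def bracketings(words: list[str]) -> list[str]:
    """Every binary bracketing of a word sequence, as strings."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every split point, then combine each left with each right grouping.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


COMPOUND = "plastic cat food can cover".split()
ALL_BRACKETINGS = bracketings(COMPOUND)

if __name__ == "__main__":
    print(len(ALL_BRACKETINGS))  # 14 binary bracketings for 5 words
    for grouping in ALL_BRACKETINGS:
        print(grouping)
```

The counts follow the Catalan numbers (1, 2, 5, 14, 42, ... for 2, 3, 4, 5, 6 words), so a parser cannot rely on syntax alone to pick the intended grouping.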

Another Ambiguity: Garden-Path Sentences
• well-studied class of syntactic ambiguity
• the sentence is re-analysed when the last word is encountered
• humans have difficulty analysing such sentences
• Example:
      The horse raced past the barn fell.
      (the horse that was raced past the barn) fell.

Garden Path: Wrong Parse
      [S [NP The horse] [VP raced past the barn]] fell
  dt: determiner, n: noun, v: verb, p: preposition
  S: sentence, NP: noun phrase, VP: verb phrase, PP: prepositional phrase

Garden Path: Right Parse
      [S [NP The horse [PAP raced past the barn]] [VP fell]]
  dt: determiner, n: noun, v: verb, p: preposition
  S: sentence, NP: noun phrase, VP: verb phrase, PP: prepositional phrase, PAP: passive phrase

Semantics
• the study of the meaning of words, constructions, and utterances
• can be divided into two parts:
  - lexical semantics
      meaning of words
  - compositional semantics
      meaning of sentences and discourse
      the meaning of the whole often differs from the meaning of the parts.

Lexical Semantics
• Meaning of individual words
  - I went to the bank of Montreal and deposited 50$.
  - I went to the bank of the river and dangled my feet.
• Word Sense Disambiguation
  - Determining which sense of a word is used in a specific sentence
• Semantic relations between words:
  - hypernymy, hyponymy, synonymy, antonymy, meronymy, holonymy, polysemy, homonymy and homophony.

Meaning of sentences
• The cat eats the mouse = The mouse is eaten by the cat.
• Goal:
  - build a representation of the meaning of the sentence
  - attach semantic roles to constituents
• Some characteristics of a sentence that influence semantic interpretation:
  - Type         declarative, interrogative, imperative, exclamatory
  - Polarity     positive, negative
  - Tense        past, present, future
  - Voice        active, passive
• Some semantic roles (different from syntactic roles):
  - Agent        the doer of a volitional act
  - Patient      the thing that is affected by an act
  - Recipient    the receiver of an object
  - Instrument   the instrument used to perform an act
  - Time         the time the act is performed
  - Location     the location of an act or object
  - …

Semantic Roles
• Ex:
  - John [AGENT] hit Peter [PATIENT] with a ball [INSTRUMENT].
  - I ate spaghetti with meatballs [INGREDIENT_OF_SPAGHETTI].
  - I ate spaghetti with salad [SIDE_DISH_OF_SPAGHETTI].
  - I ate spaghetti with a fork [INSTRUMENT].
  - I ate spaghetti with a friend [ACCOMPANIER_OF_EATING].
• Important for machine translation…
  - I [AGENT: PERSON_LACKING_SOMEONE] miss you [PATIENT: PERSON_MISSED].
  - ? Je [PATIENT: PERSON_MISSED] te [AGENT: PERSON_LACKING_SOMEONE] manque.
  - Tu [PATIENT: PERSON_MISSED] me [AGENT: PERSON_LACKING_SOMEONE] manques.
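The "miss" / "manquer" example shows that semantic roles stay the same across languages while the syntactic positions that carry them can swap. A minimal sketch of that idea follows; the table `ROLE_TO_POSITION`, the verb keys and the function `positions` are all hypothetical names introduced for illustration, and the role labels follow the slide.

```python
# For each verb, which syntactic position expresses which semantic role.
ROLE_TO_POSITION = {
    "en:miss": {"AGENT": "subject", "PATIENT": "object"},
    "fr:manquer": {"AGENT": "object", "PATIENT": "subject"},
}


def positions(verb: str, roles: dict[str, str]) -> dict[str, str]:
    """Map role fillers onto the syntactic positions the verb requires."""
    mapping = ROLE_TO_POSITION[verb]
    return {mapping[role]: filler for role, filler in roles.items()}


# The same situation for both languages: the speaker lacks the hearer.
ROLES = {"AGENT": "speaker", "PATIENT": "hearer"}

if __name__ == "__main__":
    print(positions("en:miss", ROLES))     # speaker is the subject: "I miss you"
    print(positions("fr:manquer", ROLES))  # hearer is the subject: "Tu me manques"
```

A translation that copies syntactic positions word for word yields the questionable "Je te manque"; going through the roles yields "Tu me manques".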

Pragmatics
• goes beyond the study of the meaning of a sentence
• tries to explain what the speaker is really expressing
• understanding how people use language socially (ex. figures of speech, speech acts, discourse analysis, …)
  - Ex: Could you spare some change?

Discourse Analysis
• In logic: A B C = C B A (the order of the statements does not change the meaning)
• Not in NL:
  - John visited Paris. He bought Mary some expensive cologne. Then he flew home. He went to Kmart. He bought some underwear.
  - John visited Paris. Then he flew home. He went to Kmart. He bought Mary some expensive cologne. He bought some underwear.
• NL text must be coherent
  - ? Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power.

Using world knowledge
• Using our general knowledge of the world to interpret a sentence/discourse
• Ex: A man was killed yesterday because a jealous husband returned home earlier than usual.
• Ex: Silence of the lambs…