Word classes and part of speech tagging Chapter 5
Outline
• Why part of speech tagging?
• Word classes
• Tag sets and problem definition
• Automatic approaches 1: rule-based tagging
• Automatic approaches 2: stochastic tagging
• In Part 2: finish stochastic tagging, and continue on to evaluation
Slide 1
Definition
“The process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)
WORDS: the girl kissed the boy on the cheek
TAGS: N, V, P, DET
Slide 2
An Example
WORD     LEMMA    TAG
the      the      +DET
girl     girl     +NOUN
kissed   kiss     +VPAST
the      the      +DET
boy      boy      +NOUN
on       on       +PREP
the      the      +DET
cheek    cheek    +NOUN
Slide 3
Motivation
• Speech synthesis: pronunciation
• Speech recognition: class-based N-grams
• Information retrieval: stemming, selection of high-content words
• Word-sense disambiguation
• Corpus analysis of language & lexicography
Slide 4
Word Classes
• Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, …
• Open vs. closed classes
– Open: nouns, verbs, adjectives, adverbs. Why “open”?
– Closed:
    determiners: a, an, the
    pronouns: she, I
    prepositions: on, under, over, near, by, …
Slide 5
Open Class Words
• Every known human language has nouns and verbs
• Nouns: people, places, things
– Classes of nouns: proper vs. common, count vs. mass
• Verbs: actions and processes
• Adjectives: properties, qualities
• Adverbs: hodgepodge!
– Unfortunately, John walked home extremely slowly yesterday
• Numerals: one, two, three, third, …
Slide 6
Closed Class Words
• Differ more from language to language than open class words
• Examples:
– prepositions: on, under, over, …
– particles: up, down, off, …
– determiners: a, an, the, …
– pronouns: she, who, I, …
– conjunctions: and, but, or, …
– auxiliary verbs: can, may, should, …
Slide 7
Word Classes: Tag Sets
• Vary in number of tags: a dozen to over 200
• Size of tag sets depends on language, objectives and purpose
– Some tagging approaches (e.g., constraint grammar based) make fewer distinctions, e.g., conflating prepositions, conjunctions, particles
– Simple morphology = more ambiguity = fewer tags
Slide 8
Word Classes: Tag set example
[Table of example tags; only the entry PRP$ is recoverable]
Slide 9
Example of Penn Treebank Tagging of Brown Corpus Sentence
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

VB    DT    NN      .
Book  that  flight  .

VBZ   DT    NN      VB     NN      ?
Does  that  flight  serve  dinner  ?

See http://www.infogistics.com/posdemo.htm
Buffalo buffalo Buffalo buffalo
Slide 10
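The word/TAG notation used in the Brown Corpus sentence above is easy to read programmatically. The following is a minimal sketch, not part of the original slides; the function name `parse_tagged` is a hypothetical helper introduced for illustration.

```python
def parse_tagged(sentence: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST slash, so only the tag is cut off
        # even if the word itself contains a slash.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# The sentence from the slide, in word/TAG form.
tagged = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
          "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(tagged)

for word, tag in pairs:
    print(f"{word:10s} {tag}")
```

Splitting on the last slash rather than the first is a design choice that keeps the punctuation token `./.` intact as the pair `(".", ".")`.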
The Problem
Words often have more than one word class: this
• This is a nice day = PRP
• This day is nice = DT
• You can go this far = RB
Slide 11
Word Class Ambiguity (in the Brown Corpus)
Unambiguous (1 tag):     35,340
Ambiguous (2-7 tags):     4,100
    2 tags                3,760
    3 tags                  264
    4 tags                   61
    5 tags                   12
    6 tags                    2
    7 tags                    1
(DeRose, 1988)
Slide 12
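As a quick check on the table above, the per-tag counts can be summed to confirm the ambiguous total and to see what share of word types is ambiguous. This is a small arithmetic sketch using only the figures from the slide; the variable names are illustrative.

```python
# Number of word types by how many tags each can take (DeRose, 1988, as cited on the slide).
tag_counts = {1: 35340, 2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

ambiguous = sum(n for k, n in tag_counts.items() if k >= 2)
total = sum(tag_counts.values())
share = ambiguous / total

print(f"{ambiguous} of {total} word types are ambiguous ({share:.1%})")
# prints: 4100 of 39440 word types are ambiguous (10.4%)
```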
Part-of-Speech Tagging
• Rule-Based Tagger: ENGTWOL (ENGlish TWO Level analysis)
• Stochastic Tagger: HMM-based
• Transformation-Based Tagger (Brill) (we won’t cover this)
Slide 13
Rule-Based Tagging
• Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules of the type:
    if word+1 is an adj, adv, or quantifier
    and the following is a sentence boundary
    and word-1 is not a verb like “consider”
    then eliminate non-adv
    else eliminate adv.
– Typically more than 1000 hand-written rules
Slide 14
Sample ENGTWOL Lexicon
Demo: http://www2.lingsoft.fi/cgi-bin/engtwol
Slide 15
Stage 1 of ENGTWOL Tagging
First Stage: Run words through a morphological analyzer to get all parts of speech.
Example: Pavlov had shown that salivation …
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG
Slide 16
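The first stage can be pictured as a lexicon lookup that returns every candidate reading for each word. The sketch below is a toy stand-in, not the ENGTWOL analyzer itself: `LEXICON` holds only the readings listed on the slide, and `analyze` is a hypothetical helper name.

```python
# Toy lexicon: only the readings shown on the slide. A real two-level
# morphological analyzer derives these from a much larger lexicon and rules.
LEXICON = {
    "pavlov": ["PAVLOV N NOM SG PROPER"],
    "had": ["HAVE V PAST VFIN SVO", "HAVE PCP2 SVO"],
    "shown": ["SHOW PCP2 SVOO SV"],
    "that": ["ADV", "PRON DEM SG", "DET CENTRAL DEM SG", "CS"],
    "salivation": ["N NOM SG"],
}


def analyze(words):
    """Return (word, candidate readings) for each word; unknown words get []."""
    return [(w, list(LEXICON.get(w.lower(), []))) for w in words]


analysis = analyze("Pavlov had shown that salivation".split())
for word, readings in analysis:
    print(word, readings)
```

After this stage "had" still carries two readings and "that" carries four, which is the ambiguity the second stage is meant to reduce.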
Stage 2 of ENGTWOL Tagging
Second Stage: Apply constraints. Constraints are used negatively, to eliminate tags.
Example: Adverbial “that” rule
Given input: “that”
If
    (+1 A/ADV/QUANT)
    (+2 SENT-LIM)
    (NOT -1 SVOC/A)
Then eliminate non-ADV tags
Else eliminate ADV
Slide 17
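The adverbial “that” rule can be sketched in a few lines of code. This is an illustrative simplification, not ENGTWOL's actual constraint formalism: each word is represented as a set of coarse tags, positions outside the sentence are treated as SENT-LIM, and the names `adverbial_that` and `THAT_READINGS` are hypothetical.

```python
SENT_LIM = "SENT-LIM"
THAT_READINGS = {"ADV", "PRON", "DET", "CS"}


def adverbial_that(tags, i):
    """Apply the adverbial 'that' rule at position i.

    tags: list of sets of candidate tags, one set per word.
    Returns the reduced tag set for the word at position i.
    """
    def at(j):
        # Assumption: anything outside the sentence counts as a sentence limit.
        return tags[j] if 0 <= j < len(tags) else {SENT_LIM}

    next_is_modifier = bool(at(i + 1) & {"A", "ADV", "QUANT"})  # (+1 A/ADV/QUANT)
    then_boundary = SENT_LIM in at(i + 2)                       # (+2 SENT-LIM)
    prev_is_svoc = "SVOC/A" in at(i - 1)                        # (-1 SVOC/A)

    if next_is_modifier and then_boundary and not prev_is_svoc:
        reduced = tags[i] & {"ADV"}   # eliminate non-ADV tags
    else:
        reduced = tags[i] - {"ADV"}   # eliminate ADV
    # Never delete the last remaining reading.
    return reduced if reduced else set(tags[i])


# "It isn't that odd": 'that' is followed by an adjective and the sentence end.
isnt_that_odd = [{"PRON"}, {"V"}, set(THAT_READINGS), {"A"}]
# "... shown that salivation": 'that' is followed by a noun.
shown_that_salivation = [{"PCP2"}, set(THAT_READINGS), {"N"}]
# "I consider that odd": the preceding verb is of the SVOC/A type.
consider_that_odd = [{"PRON"}, {"V", "SVOC/A"}, set(THAT_READINGS), {"A"}]

print(adverbial_that(isnt_that_odd, 2))
print(adverbial_that(shown_that_salivation, 1))
print(adverbial_that(consider_that_odd, 2))
```

The guard on the last line of the function reflects the negative use of constraints: a rule may discard readings, but here it is not allowed to leave a word with none.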