Midterm Review
CS 4705 Natural Language Processing

• Statistical v. Symbolic Processing
  – 80/20 Rule
• Regular Expressions
• Finite State Automata
  – Determinism v. non-determinism
  – (Weighted) Finite State Transducers
• Morphology
  – Word Classes
  – Inflectional v. Derivational
  – Affixation, infixation, concatenation
  – Morphotactics
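To connect regular expressions and deterministic finite-state automata, here is a rough sketch. It assumes a toy language, b a a+ ! ("baa!", "baaa!", ...), which is chosen here for illustration and is not taken from the slides. The recognizer is a transition table in which each (state, symbol) pair has at most one next state, which is what makes it deterministic. It is checked against the equivalent regular expression via Python's `re` module. `TRANSITIONS`, `dfa_accepts` and `SHEEP_REGEX` are hypothetical names.

```python
import re

# Toy language b a a+ ! ("baa!", "baaa!", ...), chosen for illustration only.
# Deterministic: each (state, symbol) pair maps to at most one next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop accepts any number of extra "a"s
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}

# The same language written as a regular expression.
SHEEP_REGEX = re.compile(r"baa+!")


def dfa_accepts(tape: str) -> bool:
    """Run the DFA over tape; reject as soon as no transition is defined."""
    state = START_STATE
    for symbol in tape:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            return False
        state = next_state
    return state in ACCEPT_STATES


if __name__ == "__main__":
    for s in ["baa!", "baaaa!", "ba!", "baa", "abaa!"]:
        # The DFA and the regex should agree on every input.
        print(s, dfa_accepts(s), SHEEP_REGEX.fullmatch(s) is not None)
```

A non-deterministic automaton would instead map a (state, symbol) pair to a set of possible next states. Its recognizer then has to track all of them at once or backtrack.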

• Morphological parsing
  – Koskenniemi’s two-level morphology
  – Porter stemmer
• Minimum Edit Distance (Levenshtein)
• N-grams
  – Markov assumption
  – Chain Rule
  – Language Modeling
    • Simple, Adaptive, Class-based (syntax-based), bursty
  – Smoothing
    • Add-one, Witten-Bell, Good-Turing
  – Back-off
  – Perplexity, Entropy
• Maximum Likelihood Estimation
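Minimum edit distance is usually computed by dynamic programming over a table. Cell (i, j) holds the cheapest way to turn the first i characters of the source into the first j characters of the target. The sketch below assumes unit insertion and deletion costs. Substitution cost is left as a parameter because some presentations charge 2 for a substitution (the same as a deletion plus an insertion), while plain Levenshtein distance charges 1. `min_edit_distance` is a hypothetical name.

```python
def min_edit_distance(source: str, target: str,
                      ins_cost: int = 1, del_cost: int = 1,
                      sub_cost: int = 1) -> int:
    """Minimum edit distance between source and target by dynamic programming."""
    n, m = len(source), len(target)
    # d[i][j] = cheapest cost of turning source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost   # delete every character of source[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost   # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                        # deletion
                d[i][j - 1] + ins_cost,                        # insertion
                d[i - 1][j - 1] + (0 if match else sub_cost),  # copy or substitution
            )
    return d[n][m]
```

For example, `min_edit_distance("kitten", "sitting")` is 3 with unit costs. `min_edit_distance("intention", "execution", sub_cost=2)` is 8.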

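The n-gram topics fit together as follows:
• The chain rule writes P(w_1 ... w_n) as the product of P(w_i | w_1 ... w_{i-1}).
• The first-order Markov assumption shortens each history to the previous word.
• Maximum likelihood estimation sets P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}).

The bigram sketch below makes several assumptions: whitespace tokenization, `<s>`/`</s>` boundary markers, and a hypothetical three-sentence toy corpus. It computes both the MLE and the add-one (Laplace) estimate (C(w_{i-1} w_i) + 1) / (C(w_{i-1}) + V). It also reports perplexity, the exponentiated average negative log probability per predicted token. All function names are illustrative.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers (an assumed convention)


def pad(sentence: str) -> list:
    """Whitespace-tokenize a sentence and add boundary markers."""
    return [BOS] + sentence.split() + [EOS]


def train_bigrams(sentences):
    """Count unigrams and bigrams over the padded training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = pad(s)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size, add_one=True):
    """P(word | prev): add-one smoothed by default, plain MLE if add_one=False."""
    if add_one:
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return bigrams[(prev, word)] / unigrams[prev]


def perplexity(sentences, unigrams, bigrams, vocab_size):
    """exp(-(1/N) * sum log P(w_i | w_{i-1})) over N predicted tokens (incl. </s>)."""
    log_prob, n_tokens = 0.0, 0
    for s in sentences:
        tokens = pad(s)
        # Chain rule + Markov assumption: P(w_1..w_n) ~= prod_i P(w_i | w_{i-1})
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log(bigram_prob(prev, word, unigrams, bigrams, vocab_size))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


if __name__ == "__main__":
    corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
    uni, bi = train_bigrams(corpus)
    V = len(set(uni) - {BOS})  # types that can be predicted, including </s>
    print("MLE     P(am | I):", bigram_prob("I", "am", uni, bi, V, add_one=False))
    print("add-one P(am | I):", bigram_prob("I", "am", uni, bi, V))
    print("perplexity:", perplexity(["I am Sam"], uni, bi, V))
```

On this toy corpus, C(I) = 3 and C(I am) = 2. The MLE estimate of P(am | I) is therefore 2/3, while add-one smoothing with V = 11 predictable types gives 3/14. Unsmoothed MLE would assign zero probability, and hence infinite perplexity, to any test bigram never seen in training. That is the motivation for smoothing and back-off.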

• Syntax
  – Chomsky’s view: syntax is a cognitive reality
  – Parse Trees
    • Dependency Structure
  – Part-of-Speech Tagging
    • Hand-written Rules v. Statistical v. Hybrid
    • Brill Tagging
  – Types of Ambiguity
• Context-Free Grammars
  – Top-down v. Bottom-up Derivations
    • Left Corners
  – Grammar Equivalence
  – Normal Forms (CNF)
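CYK is a bottom-up chart algorithm that requires the grammar to be in Chomsky Normal Form, where every rule is either A → B C or A → w. The recognizer below is a sketch over a hypothetical toy grammar. `BINARY_RULES`, `LEXICAL_RULES`, `cyk_recognize` and the example sentence are illustrative and are not from the slides. Cell (i, j) collects every nonterminal that can derive words i through j - 1, and the cells are filled from shorter spans to longer ones.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form.
# Binary rules A -> B C, indexed by the right-hand side (B, C).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules A -> w, indexed by the word w.
LEXICAL_RULES = {
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the CNF grammar derives the word sequence from start."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] = set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL_RULES.get(word, ()))
    for length in range(2, n + 1):            # span length, shortest first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # split point between i and j
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]
```

For instance, `cyk_recognize("the dog saw a cat".split())` returns True. A probabilistic variant (pCYK) would instead keep, for each nonterminal in a cell, the probability of its best derivation plus a backpointer, rather than a plain set.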

• Probabilistic Parsing
  – (p)CYK, Earley Parsing
  – Derivational Probability
  – Lexicalization
  – Classification
  – Supertagging
• Machine Learning
  – Dependent v. Independent variables
  – Training v. Development Test v. Test sets
  – Feature Vectors
  – Metrics
    • Accuracy
    • Precision, Recall, F-Measure
  – Gold Standards
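Against a gold standard, the metrics are defined as follows:
• Precision is the fraction of predicted items that appear in the gold standard.
• Recall is the fraction of gold items that were predicted.
• F-measure F_β = (1 + β²)PR / (β²P + R) combines the two; β = 1 gives their harmonic mean.

In the sketch below, the items are labeled constituents written as hypothetical (label, start, end) tuples, as in bracket-based parser evaluation. Accuracy is shown separately for per-token labels such as POS tags. Function names and data are illustrative.

```python
def precision_recall_f(gold, predicted, beta=1.0):
    """Set-based precision, recall and F-beta of predicted items against gold items."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def accuracy(gold_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels) or not gold_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


if __name__ == "__main__":
    # Hypothetical gold and predicted constituents: (label, start, end).
    gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
    pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
    print(precision_recall_f(gold, pred))
    # Hypothetical POS tags for a three-word sentence.
    print(accuracy(["DT", "NN", "VBD"], ["DT", "NN", "NN"]))
```

Here three of the four predicted constituents match, so precision, recall and F1 are all 0.75. In the second call, the tagger gets 2 of 3 tags right.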