Midterm Review CS 4705 Natural Language Processing
• Statistical v. Symbolic Processing
  – 80/20 Rule
• Regular Expressions
• Finite State Automata
  – Determinism v. non-determinism
  – (Weighted) Finite State Transducers
• Morphology
  – Word Classes and p.o.s.
  – Inflectional v. Derivational
  – Affixation, infixation, concatenation
  – Morphotactics
  – Different languages, different morphologies
  – Evidence from human performance
• Morphological parsing
  – Koskenniemi’s two-level morphology
  – FSAs vs. FSTs
  – Porter stemmer
• Noisy channel model
  – Bayesian inference
• Spelling correction
  – Bayesian approach
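As a review aid for the Bayesian approach to spelling correction, here is a minimal sketch of noisy-channel decoding: pick the candidate word w that maximizes P(observed | w) · P(w). The function name `correct`, the candidate list, and all probability values are invented for illustration and are not taken from the course materials.

```python
def correct(observed, candidates, prior, channel):
    """Return the candidate w maximizing P(observed | w) * P(w).

    prior maps word -> P(w); channel maps (observed, word) -> P(observed | w).
    Missing entries are treated as probability 0.
    """
    def score(w):
        return channel.get((observed, w), 0.0) * prior.get(w, 0.0)
    return max(candidates, key=score)


# Toy numbers, made up purely for illustration.
prior = {"actress": 0.00002, "across": 0.0003, "acres": 0.00004}
channel = {
    ("acress", "actress"): 0.0001,
    ("acress", "across"): 0.00001,
    ("acress", "acres"): 0.00003,
}
best = correct("acress", ["actress", "across", "acres"], prior, channel)
print(best)
```

With these toy values the high prior of "across" outweighs its lower channel probability, which is the kind of trade-off Bayesian inference makes explicit.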
• Creating and using n-gram LMs
  – Corpora
  – Maximum Likelihood Estimation
• Syntax
  – Chomsky’s view: Syntax is cognitive reality
  – Parse Trees
    • Dependency Structure
  – Part-of-Speech Tagging
    • Hand Written Rules v. Statistical v. Hybrid
    • Brill Tagging
  – Types of Ambiguity
• Context Free Grammars
  – Top-down v. Bottom-up Derivations
    • Left Corners
  – Grammar Equivalence
  – Normal Forms (CNF)
• Probabilistic Parsing
  – CYK parser
  – Derivational Probability
  – Lexicalization
• Machine Learning
  – Dependent v. Independent variables
  – Training v. Development Test v. Test sets
  – Feature Vectors
  – Metrics
    • Accuracy
    • Precision, Recall, F-Measure
  – Gold Standards
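To make the metrics concrete, the following is a small sketch that computes accuracy, precision, recall, and the balanced F-measure (F1) of predicted labels against a gold standard. The function names and the toy tag sequences are hypothetical, chosen only for illustration.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for one label treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy gold-standard and system tags (invented for illustration).
gold = ["NN", "VB", "NN", "NN", "VB"]
pred = ["NN", "NN", "NN", "VB", "VB"]
acc = accuracy(gold, pred)
p, r, f = precision_recall_f1(gold, pred, positive="NN")
print(acc, p, r, f)
```

Here the system gets 3 of 5 tags right overall, and for the "NN" class it has 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3.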