Midterm Review CS 4705 Natural Language Processing

• Statistical v. Symbolic Processing
  – 80/20 Rule
• Regular Expressions
• Finite State Automata
  – Determinism v. non-determinism
  – (Weighted) Finite State Transducers
• Morphology
  – Word Classes and p.o.s.
  – Inflectional v. Derivational
  – Affixation, infixation, concatenation
  – Morphotactics

  – Different languages, different morphologies
  – Evidence from human performance
• Morphological parsing
  – Koskenniemi’s two-level morphology
  – FSAs vs. FSTs
  – Porter stemmer
• Noisy channel model
  – Bayesian inference
• Spelling correction
  – Bayesian approach

  – Minimum Edit Distance (Levenshtein distance)
    • Dynamic Programming
• N-grams
  – Markov assumption
  – Chain Rule
  – Language Modeling
    • Simple, Adaptive, Class-based (syntax-based)
• Smoothing
  – Add-one, Witten-Bell, Good-Turing
• Back-off models
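As a review aid for the N-gram and smoothing items above, here is a minimal sketch of a bigram model under the Markov assumption, comparing the unsmoothed relative-frequency estimate with add-one smoothing. The function names, the `<s>` / `</s>` boundary markers, and the toy corpus are illustrative assumptions, not something specified in the slides; counting the boundary markers in the vocabulary size is a simplification.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts/ends are modeled.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def mle_prob(prev, word, unigrams, bigrams):
    """Unsmoothed estimate: count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def add_one_prob(prev, word, unigrams, bigrams):
    """Add-one estimate: (count(prev, word) + 1) / (count(prev) + V)."""
    vocab_size = len(unigrams)  # assumption: V includes <s> and </s>
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy corpus, for illustration only.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigrams(corpus)
print(mle_prob("I", "am", unigrams, bigrams))       # seen bigram
print(mle_prob("I", "ham", unigrams, bigrams))      # unseen bigram gets zero
print(add_one_prob("I", "ham", unigrams, bigrams))  # smoothing gives it some mass
```

The unseen bigram receives zero probability without smoothing, which is the problem add-one, Witten-Bell, Good-Turing, and back-off each address in a different way.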

• Creating and using ngram LMs
  – Corpora
  – Maximum Likelihood Estimation
• Syntax
  – Chomsky’s view: Syntax is cognitive reality
  – Parse Trees
    • Dependency Structure
  – Part-of-Speech Tagging
    • Hand Written Rules v. Statistical v. Hybrid
    • Brill Tagging

  – Types of Ambiguity
• Context Free Grammars
  – Top-down v. Bottom-up Derivations
    • Left Corners
  – Grammar Equivalence
  – Normal Forms (CNF)
• Probabilistic Parsing
  – CYK parser
  – Derivational Probability
  – Lexicalization
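To make the CNF and CYK items above concrete, the following is a rough sketch of a CYK recognizer for a grammar in Chomsky Normal Form. It is the plain, non-probabilistic version (a probabilistic parser would store a best probability per nonterminal in each cell instead of a set). The grammar, the rule-table layout, and the function name are hypothetical choices made for this illustration.

```python
from itertools import product


def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives `words` from `start`.

    lexical_rules: dict mapping word -> set of A, for rules A -> word
    binary_rules:  dict mapping (B, C) -> set of A, for rules A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical_rules.get(word, ()))
    # Fill longer spans bottom-up from shorter ones (dynamic programming).
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= binary_rules.get((left, right), set())
    return start in table[0][n]


# Toy CNF grammar, for illustration only.
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}

print(cyk_recognize("the dog chased the cat".split(), LEXICAL, BINARY))
print(cyk_recognize("dog the chased".split(), LEXICAL, BINARY))
```

The table is filled bottom-up, which is why the grammar must first be converted to CNF: every rule then combines exactly two adjacent spans or rewrites to a single word.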

• Machine Learning
  – Dependent v. Independent variables
  – Training v. Development Test v. Test sets
  – Feature Vectors
  – Metrics
    • Accuracy
    • Precision, Recall, F-Measure
  – Gold Standards
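The metrics listed above can be sketched as follows, scoring predicted labels against a gold standard. The function names, the `beta` parameter, and the toy tag sequences are assumptions made for this illustration; precision and recall are computed for a single class treated as the positive one.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def precision_recall_f(gold, predicted, positive, beta=1.0):
    """Precision, recall, and F-measure for one class of interest."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # beta weights recall against precision; beta = 1 gives the usual F1.
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Toy gold-standard and system tags, for illustration only.
gold = ["N", "V", "N", "N", "V"]
pred = ["N", "N", "N", "V", "V"]
print(accuracy(gold, pred))
print(precision_recall_f(gold, pred, positive="N"))
```

Accuracy looks at all items together, while precision and recall focus on one class, which matters when that class is rare in the test set.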