Dan Jurafsky Natural Language Processing NLP Lecture No
- Slides: 28

Natural Language Processing (NLP), Lecture No. 4
Institute of Southern Punjab, Multan
Department of Computer Science

Basic Text Processing: Regular Expressions

Basic Process of NLU
[Figure: processing pipeline. Spoken input → phonological / morphological analyser (for speech understanding; phonological & morphological rules) → sequence of words → SYNTACTIC COMPONENT (grammatical knowledge) → syntactic structure (parse tree), indicating relations (e.g., mod) between words → SEMANTIC INTERPRETER (semantic rules, lexical semantics) → logical form (thematic roles, selectional restrictions) → CONTEXTUAL REASONER (pragmatic & world knowledge) → meaning representation]

NLP Representations
• State Machines
  • FSAs, FSTs, HMMs, ATNs, RTNs
• Rule Systems
  • CFGs, Unification Grammars, Probabilistic CFGs
• Logic-based Formalisms
  • 1st Order Predicate Calculus, Temporal and other Higher Order Logics
• Models of Uncertainty
  • Bayesian Probability Theory

Regular Expressions
• Can be viewed as a way to specify:
  • Search patterns over text strings
  • Design of a particular kind of machine, called a Finite State Automaton (FSA)
• These are really equivalent

Regular Expressions
• Regular Expression: Formula in algebraic notation for specifying a set of strings
• String: Any sequence of alphanumeric characters
  • Letters, numbers, spaces, tabs, punctuation marks
• Regular Expression Search
  • Pattern: specifying the set of strings we want to search for
  • Corpus: the texts we want to search through

Regular Expressions
The “foundational” operations

Operation        Pattern      Matches
Concatenation    abc          abc
Disjunction      a|b          a, b
                 (a|bb)d      ad, bbd
Kleene star      a*           ε (the empty string), a, aa, aaa, ...
                 c(a|bb)*     ca, cbba

Regular expressions / Finite-state automata are “closed under these operations”
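
The three foundational operations can be checked directly with Python's `re` module. The following is a minimal sketch, not part of the original slides; the `cases` table and the `all_match` helper are names introduced here for illustration.

```python
import re

# Each pattern is paired with strings the slide lists as members of its language.
cases = {
    r"abc": ["abc"],                 # concatenation
    r"a|b": ["a", "b"],              # disjunction
    r"(a|bb)d": ["ad", "bbd"],       # disjunction inside a concatenation
    r"a*": ["", "a", "aa", "aaa"],   # Kleene star, including the empty string
    r"c(a|bb)*": ["ca", "cbba"],     # Kleene star over a group
}

def all_match(cases):
    """True if every listed string is matched in full by its pattern."""
    return all(
        re.fullmatch(pattern, s) is not None
        for pattern, strings in cases.items()
        for s in strings
    )

print(all_match(cases))
```

`re.fullmatch` is used rather than `re.search` because the table describes which whole strings belong to each pattern's language, not where a pattern occurs inside a longer text.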

Practical Applications of Regular Expressions
• Web search
• Word processing: find, substitute
• Validating fields in a database (dates, email addresses, URLs)
• Searching a corpus for linguistic patterns and gathering statistics

Regular expressions
• A formal language for specifying text strings
• How can we search for any of these?
  • woodchucks
  • Woodchucks

Regular Expressions: Disjunctions
• Letters inside square brackets []

Pattern          Matches
[wW]oodchuck     Woodchuck, woodchuck
[1234567890]     Any digit

• Ranges [A-Z]

Pattern    Matches                 Example
[A-Z]      An upper case letter    Drenched Blossoms
[a-z]      A lower case letter     my beans were impatient
[0-9]      A single digit          Chapter 1: Down the Rabbit Hole

Regular Expressions: Negation in Disjunction
• Negations [^Ss]
  • Caret means negation only when first in []

Pattern    Matches                     Example
[^A-Z]     Not an upper case letter    Oyfn pripetchik
[^Ss]      Neither ‘S’ nor ‘s’         I have no exquisite reason
[^e^]      Neither e nor ^             Look here
a\^b       The pattern a caret b       Look up a^b now
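
As a rough illustration of bracket classes and their negation, the sketch below runs the slide's patterns against the slide's example strings. The `first_match` helper is a name introduced here, not something from the lecture.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Positive classes and ranges
print(first_match(r"[A-Z]", "Drenched Blossoms"))
print(first_match(r"[0-9]", "Chapter 1: Down the Rabbit Hole"))

# Negated classes: the caret negates only when it is the first character in []
print(first_match(r"[^A-Z]", "Oyfn pripetchik"))
print(first_match(r"[^Ss]", "I have no exquisite reason"))
print(first_match(r"[^e^]", "Look here"))      # the second ^ is a literal caret

# Outside brackets, a literal caret has to be escaped
print(first_match(r"a\^b", "Look up a^b now"))
```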

Regular Expressions: More Disjunction
• Woodchuck is another name for groundhog!
• The pipe | for disjunction

Pattern                        Matches
groundhog|woodchuck
yours|mine                     yours, mine
a|b|c                          = [abc]
[gG]roundhog|[Ww]oodchuck

[Photo: D. Fletcher]
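
A short sketch of the pipe operator in Python follows. The sample sentence is made up for this example; the last check illustrates the slide's claim that `a|b|c` and `[abc]` are equivalent.

```python
import re

# Hypothetical sample text, not from the lecture
text = "The groundhog, or Woodchuck, is not yours or mine."

animals = re.findall(r"[gG]roundhog|[Ww]oodchuck", text)
owners = re.findall(r"yours|mine", text)

# A disjunction of single characters finds the same matches as a bracket class
same = re.findall(r"a|b|c", "abcabd") == re.findall(r"[abc]", "abcabd")

print(animals, owners, same)
```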

Regular Expressions: ? * + .

Pattern    Meaning                       Matches
colou?r    Optional previous char        color, colour
oo*h!      0 or more of previous char    oh!, oooh!, ooooh!
o+h!       1 or more of previous char    oh!, oooh!, ooooh!
baa+                                     baaaaa
beg.n                                    begin, begun, beg3n

Stephen C Kleene: Kleene *, Kleene +
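
The operators above can be tried out with a small filter over candidate words. This is only a sketch: the `matches` helper and the non-matching candidates (such as `colouur` and `begn`) are additions made here to show where each pattern stops matching.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(matches(r"colou?r", ["color", "colour", "colouur"]))   # ? = optional previous char
print(matches(r"oo*h!", ["h!", "oh!", "oooh!"]))             # * = 0 or more of previous char
print(matches(r"o+h!", ["h!", "oh!", "oooh!"]))              # + = 1 or more of previous char
print(matches(r"baa+", ["ba", "baa", "baaaaa"]))
print(matches(r"beg.n", ["begin", "begun", "beg3n", "begn"]))  # . = exactly one character
```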

Regular Expressions: Anchors ^ $

Pattern       Matches
^[A-Z]        Palo Alto
^[^A-Za-z]    1 “Hello”
.$            The end? The end!
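
The anchors can be exercised as in the sketch below. The `found` helper and the escaped pattern `\.$` (a literal period at the end of the string) are additions made here for contrast with the unescaped `.$`.

```python
import re

def found(pattern, text):
    """True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r"^[A-Z]", "Palo Alto"))        # upper case letter at the start
print(found(r"^[A-Z]", "in Palo Alto"))     # the start is lower case
print(found(r"^[^A-Za-z]", '1 "Hello"'))    # a non-letter at the start
print(found(r".$", "The end?"))             # any character at the end
print(found(r"\.$", "The end."))            # a literal period at the end
print(found(r"\.$", "The end?"))            # the end is not a period
```

Note the two roles of the caret in `^[^A-Za-z]`: outside the brackets it anchors the match to the start, inside them it negates the class.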

Some Examples

RE                     Description                          Uses?
/a*/                   Zero or more a’s                     Optional doubled modifiers
/a+/                   One or more a’s                      Non-optional...
/a?/                   Zero or one a’s                      Optional...
/cat|dog/              ‘cat’ or ‘dog’                       Words modifying pets
/^cat\.$/              A line that contains only “cat.”     ??
/\bun\B/               Beginnings of longer strings         Words prefixed by ‘un’
/pupp(y|ies)/          Morphological variants of ‘puppy’
/(.+)ier and \1ier/    happier and happier, fuzzier and fuzzier

^ anchors the beginning of a line; $ anchors the end of a line.
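
Three of the less obvious rows (the backreference `\1`, the boundary pair `\b`/`\B`, and the grouped alternation) are sketched below. The sample strings are made up for this example, and the trailing `\w*` is added here only so that the whole word is returned rather than just `un`.

```python
import re

# \1 must repeat exactly what group 1 captured
text = "happier and happier, fuzzier and fuzzier, sillier and funnier"
repeated = [m.group() for m in re.finditer(r"(.+)ier and \1ier", text)]

# \b requires a word boundary before "un"; \B forbids one after it
un_words = re.findall(r"\bun\B\w*", "unhappy sun under un tuna")

# finditer is used because findall would return only the group contents
variants = [m.group() for m in re.finditer(r"pupp(y|ies)", "a puppy and two puppies")]

print(repeated, un_words, variants)
```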

Example
• Find me all instances of the word “the” in a text.

Pattern                      Problem
the                          Misses capitalized examples
[tT]he                       Incorrectly returns other or theology
[^a-zA-Z][tT]he[^a-zA-Z]

Optionality and Repetition
• /[Ww]oodchucks?/ matches woodchucks, Woodchucks, woodchuck, Woodchuck
• /colou?r/ matches color or colour
• /he{3}/ matches heee
• /(he){3}/ matches hehehe
• /(he){3,}/ matches a sequence of at least 3 he’s
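
The counters can be checked in a few lines. This is a sketch; the `full` helper is a name introduced here. Note how `{3}` applies to the single preceding character unless parentheses group a longer unit.

```python
import re

def full(pattern, s):
    """True if the pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

print(full(r"he{3}", "heee"))          # counter applies to "e" only
print(full(r"he{3}", "hehehe"))
print(full(r"(he){3}", "hehehe"))      # counter applies to the group "he"
print(full(r"(he){3,}", "hehe"))       # fewer than 3 repetitions
print(full(r"(he){3,}", "hehehehe"))   # 4 repetitions
print(re.findall(r"[Ww]oodchucks?", "woodchucks Woodchuck"))
```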

Operator Precedence Hierarchy
1. Parentheses: ()
2. Counters: * + ? {}
3. Sequences and anchors: the ^my end$
4. Disjunction: |
Examples: /moo+/ /try|ies/ /and|or/
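
The practical effect of the hierarchy shows up when a pattern is compared with its explicitly parenthesized alternative. The sketch below uses the slide's three examples; the parenthesized variants and the `full` helper are additions made here for contrast.

```python
import re

def full(pattern, s):
    """True if the pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

# Counters bind tighter than sequence: + applies to the last "o" only
print(full(r"moo+", "mooo"), full(r"moo+", "moomoo"), full(r"(moo)+", "moomoo"))

# Sequence binds tighter than disjunction: /try|ies/ means "try" or "ies"
print(full(r"try|ies", "ies"), full(r"try|ies", "tries"), full(r"tr(y|ies)", "tries"))

# Likewise /and|or/ means "and" or "or", not "an" + ("d" or "o") + "r"
print(full(r"and|or", "or"), full(r"and|or", "anor"), full(r"an(d|o)r", "anor"))
```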

A Simple Exercise
• Write a regular expression to find all instances of the determiner “the”:
  • /the/
  • /[tT]he/
  • /\b[tT]he\b/
  • /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/

The recent attempt by the police to retain their current rates of pay has not gathered much favor with the southern factions.
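
The four candidate patterns can be compared by counting their matches in the exercise sentence. This is a minimal sketch; `count_matches` is a helper name introduced here.

```python
import re

sentence = (
    "The recent attempt by the police to retain their current rates of pay "
    "has not gathered much favor with the southern factions."
)

patterns = [
    r"the",                            # misses "The", also hits their/gathered/southern
    r"[tT]he",                         # finds "The", still hits inside other words
    r"\b[tT]he\b",                     # word boundaries on both sides
    r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",   # the same idea without \b
]

def count_matches(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))

counts = [count_matches(p, sentence) for p in patterns]
print(counts)
```

The last pattern consumes the characters on either side of the word, so it can miss a second “the” that directly follows the first; the `\b` version does not consume anything and avoids that edge case.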

Errors
• The process we just went through was based on fixing two kinds of errors
  • Matching strings that we should not have matched (there, then, other)
    • False positives (Type I)
  • Not matching things that we should have matched (The)
    • False negatives (Type II)

Errors cont.
• In NLP we are always dealing with these kinds of errors.
• Reducing the error rate for an application often involves two antagonistic efforts:
  • Increasing accuracy or precision (minimizing false positives)
  • Increasing coverage or recall (minimizing false negatives).
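
The trade-off can be made concrete by scoring a pattern against hand-labelled data. The sketch below is a toy illustration: the token list, the `gold` labels, and the `precision_recall` helper are all assumptions introduced here, not material from the lecture.

```python
import re

def precision_recall(pattern, tokens, gold):
    """Score a pattern against gold labels.

    tokens: list of words; gold: set of indices that truly are the target word.
    precision = TP / (TP + FP); recall = TP / (TP + FN).
    """
    predicted = {i for i, tok in enumerate(tokens) if re.search(pattern, tok)}
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Toy example: indices 0 and 3 are the determiner "the" (hand-labelled assumption)
tokens = "The other day the theology students met there".split()
gold = {0, 3}

for pattern in [r"the", r"[tT]he", r"^[tT]he$"]:
    print(pattern, precision_recall(pattern, tokens, gold))
```

Moving from `the` to `[tT]he` removes a false negative (recall rises), while anchoring the pattern removes the false positives (precision rises).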

Summary
• Regular expressions play a surprisingly large role
  • Sophisticated sequences of regular expressions are often the first model for any text processing task
• For many hard tasks, we use machine learning classifiers
  • But regular expressions are used as features in the classifiers
  • Can be very useful in capturing generalizations
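
One way regular expressions can serve as classifier features is to turn each pattern into a binary indicator. The sketch below assumes a hypothetical feature set (`FEATURE_PATTERNS`) chosen only for illustration; the resulting dictionary could be fed to any classifier that accepts binary features.

```python
import re

# Hypothetical feature set: each name maps to a pattern whose presence is one feature
FEATURE_PATTERNS = {
    "has_digit": r"[0-9]",
    "starts_upper": r"^[A-Z]",
    "ends_question": r"\?$",
    "mentions_the": r"\b[tT]he\b",
}

def regex_features(text):
    """Map a text to a dict of 0/1 features, one per pattern."""
    return {
        name: int(re.search(pattern, text) is not None)
        for name, pattern in FEATURE_PATTERNS.items()
    }

print(regex_features("Is the end near?"))
```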