REGULAR EXPRESSION & BASIC REGULAR EXPRESSION
Name: Kainat Abro
Roll no: 19 MESE-01
Teacher: Dr. Qasim Ali
Regular Languages
• There are several formalisms for specifying tokens
• Regular languages are the most popular
  • Simple and useful theory
  • Easy to understand
  • Efficient implementations
Languages
Definition: Let Σ be a set of characters. A language over Σ is a set of strings of characters drawn from Σ.
Examples of Languages
• Alphabet = English characters; Language = English sentences
  Note: not every string of English characters is an English sentence.
• Alphabet = ASCII; Language = C programs
  Note: the ASCII character set is different from the English character set.
Notation
Languages are sets of strings, so we need some notation for specifying which sets we want.
• Three equivalent formal ways to specify regular languages:
  • Regular expressions
  • Finite state automata
  • Regular grammars
Regular Expression
In computing, a regular expression, also referred to as "regex" or "regexp", provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor. It is a way of representing regular languages.
Regular Expressions
The regular expressions over Σ are the smallest set of expressions including:
• Kleene closure [a*]: a* = {ε, a, aa, aaa, ...}
• Positive closure [a+]: a+ = {a, aa, aaa, ...}
• Concatenation [a.b]: a.b = {ab}
• Union [a+b]: a+b = {a, b}
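The four operators above can be tried out with Python's `re` module. This is a minimal sketch, not part of the slides: the helper name `matches` is hypothetical, and note that Python writes union as `|` rather than `+`, and concatenation by simple juxtaposition.

```python
import re

# Slide notation -> Python `re` syntax (mapping assumed for illustration):
#   a*   -> a*     Kleene closure
#   a+   -> a+     positive closure
#   a.b  -> ab     concatenation
#   a+b  -> a|b    union (Python uses "|" for union)

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None

candidates = ["", "a", "aa", "b", "ab", "ba"]

kleene = [s for s in candidates if matches(r"a*", s)]    # includes the empty string
positive = [s for s in candidates if matches(r"a+", s)]  # excludes the empty string
concat = [s for s in candidates if matches(r"ab", s)]
union = [s for s in candidates if matches(r"a|b", s)]
```

`re.fullmatch` is used instead of `re.search` so that each pattern is tested against the entire string, which corresponds to language membership.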
Keyword
Keyword: "else" or "if" or "begin" or ...
'else' + 'if' + 'begin' + ...
Note: 'else' abbreviates 'e' 'l' 's' 'e'
Integers
Integer: a non-empty string of digits
digit = '0' + '1' + '2' + '3' + '4' + '5' + '6' + '7' + '8' + '9'
integer = digit digit*
Abbreviation: A+ = AA*
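Because an integer is a non-empty string of digits, the expression needs one digit followed by zero or more further digits. A small sketch in Python, where the names `DIGIT`, `INTEGER`, and `is_integer` are chosen here for illustration:

```python
import re

DIGIT = r"[0-9]"               # '0' + '1' + ... + '9'
INTEGER = DIGIT + DIGIT + "*"  # digit digit*, which abbreviates to digit+

def is_integer(text: str) -> bool:
    """True when `text` is a non-empty string of digits."""
    return re.fullmatch(INTEGER, text) is not None

def is_integer_abbrev(text: str) -> bool:
    """Same language written with the abbreviation A+ = AA*."""
    return re.fullmatch(DIGIT + "+", text) is not None
```

Writing only `digit*` would also accept the empty string, which is why the leading `digit` matters.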
Identifier
Identifier: a string of letters or digits, starting with a letter
letter = 'A' + ... + 'Z' + 'a' + ... + 'z'
identifier = letter (letter + digit)*
Is (letter* + digit*) the same?
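One way to explore the question on this slide is to test both expressions on a few strings. The sketch below assumes Python's `re` syntax (`|` for union); the names `IDENTIFIER`, `ALTERNATIVE`, and `in_language` are illustrative only.

```python
import re

LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"

IDENTIFIER = f"{LETTER}({LETTER}|{DIGIT})*"  # letter (letter + digit)*
ALTERNATIVE = f"({LETTER}*|{DIGIT}*)"        # (letter* + digit*)

def in_language(pattern: str, text: str) -> bool:
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# "x1" mixes letters and digits: only IDENTIFIER accepts it.
# "123" and "" contain no leading letter: only ALTERNATIVE accepts them.
samples = ["x1", "123", "", "abc"]
comparison = {s: (in_language(IDENTIFIER, s), in_language(ALTERNATIVE, s)) for s in samples}
```

The two expressions therefore describe different languages: `(letter* + digit*)` accepts all-letter or all-digit strings (including the empty string) but never a mix.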
Whitespace
Whitespace: a non-empty sequence of blanks, newlines, and tabs
(' ' + '\n' + '\t')+
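The token classes described so far (keywords, integers, identifiers, whitespace) can be combined into a small scanner. The following is a sketch under stated assumptions, not a method given in the slides: the keyword set contains only the three keywords named earlier, keywords are recognized by first matching an identifier and then looking it up, and the function name `tokenize` is hypothetical.

```python
import re

KEYWORDS = {"else", "if", "begin"}  # only the keywords named on the slides

# One named group per token class; re.VERBOSE lets the pattern span lines.
TOKEN_RE = re.compile(r"""
    (?P<integer>[0-9]+)
  | (?P<identifier>[A-Za-z][A-Za-z0-9]*)
  | (?P<whitespace>[ \n\t]+)
""", re.VERBOSE)

def tokenize(text: str) -> list:
    """Split `text` into (kind, lexeme) pairs, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        # A keyword looks like an identifier, so reclassify after matching.
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        if kind != "whitespace":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

example_tokens = tokenize("if x1\n\telse 42")
```

Matching identifiers first and then checking the keyword set avoids treating a word such as `iffy` as the keyword `if` followed by `fy`.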
Atomic Regular Expressions
• Single character: 'c' = {"c"}
• Epsilon: ε = {""}
Compound Regular Expressions
• Union: A + B = {s | s ∈ A or s ∈ B}
• Concatenation: AB = {ab | a ∈ A and b ∈ B}
• Iteration: A* = ⋃_{i ≥ 0} A^i, where A^i = A...A (i times)
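The set definitions of the compound operations can be checked on small finite languages. This is an illustrative sketch: the function names are hypothetical, and because A* is infinite in general, `star_up_to` only collects the powers A^0 through A^n.

```python
from itertools import product

def union(A: set, B: set) -> set:
    """A + B = {s | s in A or s in B}"""
    return A | B

def concat(A: set, B: set) -> set:
    """AB = {ab | a in A and b in B}"""
    return {a + b for a, b in product(A, B)}

def power(A: set, i: int) -> set:
    """A^i: A concatenated with itself i times; A^0 is {""} (just epsilon)."""
    result = {""}
    for _ in range(i):
        result = concat(result, A)
    return result

def star_up_to(A: set, n: int) -> set:
    """Finite approximation of A*: the union of A^0, A^1, ..., A^n."""
    result = set()
    for i in range(n + 1):
        result |= power(A, i)
    return result
```

For example, `star_up_to({"a"}, 3)` yields the empty string together with `a`, `aa`, and `aaa`.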
Regular Expression Quick Guide
^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end
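A few entries of the quick guide in action, using Python's `re` module. The sample strings below are made up for illustration and do not come from the slides.

```python
import re

line = "From: alice@example.org Sat Jan  5 09:14:16 2008"  # hypothetical input

# ^ anchors the match at the beginning of the line.
starts_with_from = re.search(r"^From", line) is not None

# \S+ is one or more non-whitespace characters on each side of the "@".
addresses = re.findall(r"\S+@\S+", line)

# [0-9]+ is one or more characters from the range 0-9.
numbers = re.findall(r"[0-9]+", line)

# Parentheses mark which part of the match is extracted.
domains = re.findall(r"@(\S+)", line)

# Greedy + takes as much as possible; non-greedy +? takes as little as possible.
text = "From: Using the : character"
greedy = re.search(r"^F.+:", text).group()
non_greedy = re.search(r"^F.+?:", text).group()
```

Here `greedy` runs to the last colon in `text`, while `non_greedy` stops at the first one.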
Finite Automata
• Regular expressions = specification
• Finite automata = implementation
A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.
A finite automaton consists of:
• An input alphabet Σ
• A set of states S
• A start state n
• A set of accepting states F ⊆ S
• A set of transitions state →input state
Finite Automata
• Transition: s1 →a s2
• Is read: in state s1 on input "a" go to state s2
• If end of input and in accepting state => accept
• Otherwise => reject
Finite Automata State Graphs • A state • The start state • An accepting state • A transition
A Simple Example A finite automaton that accepts only “ 1”
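The five components of a finite automaton, together with the accept/reject rule, can be sketched as a small Python class. The class name, the state names `A` and `B`, and the choice of alphabet {0, 1} are assumptions made for this illustration; the state graph on the slide is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class FiniteAutomaton:
    alphabet: set      # input alphabet Σ
    states: set        # set of states S
    start: str         # start state n
    accepting: set     # accepting states F, a subset of S
    transitions: dict  # maps (state, input symbol) -> next state

    def accepts(self, text: str) -> bool:
        state = self.start
        for symbol in text:
            # No transition for this input: the automaton is stuck, so reject.
            if (state, symbol) not in self.transitions:
                return False
            state = self.transitions[(state, symbol)]
        # End of input: accept only when in an accepting state.
        return state in self.accepting

# An automaton that accepts only the string "1" (state names are hypothetical).
only_one = FiniteAutomaton(
    alphabet={"0", "1"},
    states={"A", "B"},
    start="A",
    accepting={"B"},
    transitions={("A", "1"): "B"},
)
```

With this definition, `only_one.accepts("1")` is true, while the empty string, `"11"`, and `"0"` are all rejected.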
Another Simple Example
• A finite automaton accepting any number of 1's followed by a single 0
• Alphabet: {0, 1}
Epsilon Moves Another kind of transition: ε-moves Machine can move from state A to state B without reading input
Deterministic and Nondeterministic Automata
▶ Deterministic Finite Automata (DFA)
  ▶ One transition per input per state
  ▶ No ε-moves
▶ Nondeterministic Finite Automata (NFA)
  ▶ Can have multiple transitions for one input in a given state
  ▶ Can have ε-moves
Execution of Finite Automata
• A DFA can take only one path through the state graph
  • Completely determined by input
• NFAs can choose
  • Whether to make ε-moves
  • Which of multiple transitions for a single input to take
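Since an NFA can "choose", a common way to execute one is to track every state it could be in at once. The sketch below assumes an encoding that is not in the slides: transitions are a dict from (state, symbol) to a set of states, and ε-moves use the empty string as their label. The example NFA (strings over {0, 1} that end in 1) is hypothetical.

```python
EPSILON = ""  # assumed label for ε-moves

def epsilon_closure(states: set, delta: dict) -> set:
    """All states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(text: str, start: str, accepting: set, delta: dict) -> bool:
    """Accept if at least one sequence of choices ends in an accepting state."""
    current = epsilon_closure({start}, delta)
    for symbol in text:
        moved = set()
        for s in current:
            moved |= delta.get((s, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & accepting)

# Hypothetical NFA: q0 loops on 0 and 1, may also go to q1 on 1,
# and q1 reaches the accepting state q2 by an ε-move.
delta = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},  # two transitions for one input
    ("q1", EPSILON): {"q2"},    # ε-move
}
accepting = {"q2"}
```

Tracking the whole set of possible states removes the need to guess which choice the NFA should make.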
NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of languages (regular languages)
• DFAs are faster to execute
  • There are no choices to consider
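One standard argument for why NFAs and DFAs recognize the same languages is the subset construction, which builds a DFA whose states are sets of NFA states. The slides do not spell this out; the following is a sketch under the assumptions that transitions map (state, symbol) to a set of states and that ε-moves are labelled with the empty string. All names are illustrative.

```python
from collections import deque

EPSILON = ""  # assumed label for ε-moves

def epsilon_closure(states, delta):
    """All NFA states reachable from `states` by ε-moves, as a frozenset."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(start, accepting, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta, dfa_accepting = {}, set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        if subset & accepting:
            dfa_accepting.add(subset)
        for symbol in alphabet:
            moved = set()
            for s in subset:
                moved |= delta.get((s, symbol), set())
            target = epsilon_closure(moved, delta)
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_accepting, dfa_delta

def dfa_accepts(text, start, accepting, delta):
    """Run the DFA: exactly one transition per input symbol, no choices."""
    state = start
    for symbol in text:
        state = delta[(state, symbol)]  # assumes every symbol is in the alphabet
    return state in accepting

# Hypothetical NFA for strings over {0, 1} that end in 1.
nfa_delta = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
d_start, d_accepting, d_delta = nfa_to_dfa("q0", {"q1"}, nfa_delta, ["0", "1"])
```

The resulting DFA needs only a single table lookup per input symbol, which is the sense in which DFAs are faster to execute.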
Regular Expressions to Finite Automata High-level sketch
Regular Expressions to NFA (1)
• For each kind of rexp, define an NFA
  • Notation: NFA for rexp M
• For ε
• For input a
Regular Expressions to NFA (2)
• For AB
• For A + B
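The four cases listed (ε, a single input a, AB, and A + B) can be sketched in the style of Thompson's construction, where each expression becomes an NFA fragment with one start and one accepting state. This is one common construction and may differ in detail from the diagrams on the slides; every name below is hypothetical.

```python
import itertools

EPSILON = ""                # assumed label for ε-moves
_ids = itertools.count()    # source of fresh state numbers

class Fragment:
    """An NFA with a single start state and a single accepting state."""
    def __init__(self, start, accept, delta):
        self.start, self.accept, self.delta = start, accept, delta

def _add(delta, src, label, dst):
    delta.setdefault((src, label), set()).add(dst)

def _merge(*deltas):
    merged = {}
    for d in deltas:
        for key, targets in d.items():
            merged.setdefault(key, set()).update(targets)
    return merged

def epsilon():
    """NFA for ε: start --ε--> accept."""
    s, f = next(_ids), next(_ids)
    delta = {}
    _add(delta, s, EPSILON, f)
    return Fragment(s, f, delta)

def symbol(a):
    """NFA for input a: start --a--> accept."""
    s, f = next(_ids), next(_ids)
    delta = {}
    _add(delta, s, a, f)
    return Fragment(s, f, delta)

def concat(A, B):
    """NFA for AB: link A's accepting state to B's start with an ε-move."""
    delta = _merge(A.delta, B.delta)
    _add(delta, A.accept, EPSILON, B.start)
    return Fragment(A.start, B.accept, delta)

def union(A, B):
    """NFA for A + B: new start and accept states joined to both by ε-moves."""
    s, f = next(_ids), next(_ids)
    delta = _merge(A.delta, B.delta)
    for frag in (A, B):
        _add(delta, s, EPSILON, frag.start)
        _add(delta, frag.accept, EPSILON, f)
    return Fragment(s, f, delta)

def accepts(frag, text):
    """Simulate the fragment by tracking the set of reachable states."""
    def closure(states):
        result, stack = set(states), list(states)
        while stack:
            for t in frag.delta.get((stack.pop(), EPSILON), set()):
                if t not in result:
                    result.add(t)
                    stack.append(t)
        return result
    current = closure({frag.start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= frag.delta.get((s, ch), set())
        current = closure(moved)
    return frag.accept in current

# NFA for the expression ab + c. Each fragment should be used only once,
# because fragments built from the same object would share states.
ab_or_c = union(concat(symbol("a"), symbol("b")), symbol("c"))
```

Because every fragment has exactly one start and one accepting state, larger expressions can be assembled from smaller ones without inspecting their internals.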
Implementation
• A DFA can be implemented by a 2D table T
  • One dimension is "states"
  • Other dimension is "input symbol"
  • For every transition Si →a Sk define T[i, a] = k
• DFA "execution"
  • If in state Si and input a, read T[i, a] = k and skip to state Sk
  • Very efficient
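A table-driven DFA can be sketched with a list of rows, one row per state and one column per input symbol. The example uses the automaton that accepts any number of 1's followed by a single 0; the state numbering and the extra "dead" state S2, added so that every table entry is defined, are assumptions made here.

```python
# Column index for each input symbol.
SYMBOL_INDEX = {"0": 0, "1": 1}

# T[i][a] = k means: in state Si on input a, go to state Sk.
T = [
    [1, 0],  # S0 (start): on 0 -> S1, on 1 -> stay in S0
    [2, 2],  # S1 (accepting): any further input -> S2
    [2, 2],  # S2 (dead state): never leaves
]
START = 0
ACCEPTING = {1}

def run_dfa(text: str) -> bool:
    """Execute the DFA with one table lookup per input symbol."""
    state = START
    for ch in text:
        if ch not in SYMBOL_INDEX:
            return False  # symbol outside the alphabet {0, 1}
        state = T[state][SYMBOL_INDEX[ch]]
    return state in ACCEPTING
```

Each input symbol costs a single indexed lookup, independent of how many states the automaton has.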
Table Implementation of a DFA
A Simple Example
Write a regular expression to find all instances of the determiner "the":
The recent attempt by the police to retain their current rates of pay has not gathered much favor with the southern factions.
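One possible answer, sketched with Python's `re` module: allow either case for the first letter and require word boundaries on both sides so that words such as "their" or "southern" are not matched. The pattern below is a suggestion, not the slide's official solution.

```python
import re

sentence = (
    "The recent attempt by the police to retain their current rates of pay "
    "has not gathered much favor with the southern factions."
)

# Naive attempt: also matches inside "their", "gathered" and "southern",
# and misses the capitalized "The" at the start.
naive = re.findall(r"the", sentence)

# [Tt] accepts either case; \b requires a word boundary on each side.
determiners = re.findall(r"\b[Tt]he\b", sentence)
```

The naive pattern finds five occurrences, only two of which are the determiner, while the refined pattern finds exactly the three determiners.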
Summary
• Regular expressions are a cryptic but powerful language for matching strings and extracting elements from those strings
• Regular expressions have special characters that indicate intent