– 2 – Lexical Analysis

Objectives
• To understand:
  1. The role of a lexical analyzer
  2. Lexical analysis using formal language definitions with finite automata
  3. Specification and recognition of tokens
  4. A language for specifying lexical analyzers
www.Bookspar.com | Website for Students | VTU - Notes - Question Papers

Programming Language Structure
• Recall that a programming language is defined by:
  1. SYNTAX – decides whether a sentence in the language is well-formed
  2. SEMANTICS – determines the meaning, if any, of a syntactically well-formed sentence
  3. GRAMMAR

Syntax of a Programming Language
• Describes the structure of programs without any consideration of their meaning.
• The syntactic elements of a programming language are determined by the computation model and pragmatic concerns.
• Well-developed tools (regular, context-free and attribute grammars) are available for describing the syntax of a programming language.
• The lexical analyzer and the parser of a compiler handle the syntax of the programming language.

Some Basic Definitions
• lex-i-cal: Of or relating to words or the vocabulary of a language as distinguished from its grammar and construction.
• lexical analysis: The task concerned with breaking an input into its smallest meaningful units, called tokens.
• syntax analysis: The task concerned with fitting a sequence of tokens into a specified syntax.
• parsing: To break a sentence down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.

Lexical Analyzer (a.k.a. Scanner)
• The only part of a compiler that looks at each character of the source text and does a linear analysis.
• Reads source text and produces TOKENS.
• Also keeps track of the source coordinates of each token: file name, line number and position. (This is useful for debugging and error indication purposes.)
• Advantages of a separate lexical analyzer:
  – Keeps compiler design simple
  – Improves efficiency
  – Increases portability

The Role of a Lexical Analyzer
[Figure: Source Program → Lexical Analyzer → Syntax Analyzer. The lexical analyzer requests "get next char" from the source program and receives the next char; the syntax analyzer requests "get next token" from the lexical analyzer and receives the next token. A symbol table, containing a record for each identifier, is attached.]

Tokens, Patterns and Lexemes
• What are tokens?
  – The basic lexical units of the language
  – A sequence of abstract characters that can be treated as a unit in the grammar of the language
  – A programming language classifies the tokens into a finite set of token types
• Some tokens may have attributes:
  – An integer constant token will have the actual integer (17, 42) as an attribute
  – Identifiers will have a string with the actual id
• A note on terminology: some texts refer to
  – token types as tokens, and
  – tokens as lexemes.
  We will stick to the terms Tokens and Token Types.

Tokens Example
• Let us consider the program segment:
  void main() { printf("Hello World\n"); }
• The tokens of this program segment are:
  1. void
  2. main
  3. (
  4. )
  5. {
  6. printf
  7. (
  8. "Hello World\n"
  9. )
  10. ;
  11. }

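The token list above can be reproduced with a few regular expressions. The following is a minimal sketch in Python, not a full C lexer; the token class names (`STRING`, `ID`, `PUNCT`, `SKIP`) and the function `tokenize` are assumptions made for this illustration.

```python
import re

# Hypothetical token classes, just enough for the program segment above.
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),      # string literal, escapes allowed
    ("ID",     r"[A-Za-z_][A-Za-z_0-9]*"),  # keywords and identifiers
    ("PUNCT",  r"[(){};]"),                 # single-character punctuation
    ("SKIP",   r"\s+"),                     # white space separates tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return the list of lexemes in text, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.group())
        pos = match.end()
    return tokens


source = 'void main() { printf("Hello World\\n"); }'
for number, lexeme in enumerate(tokenize(source), start=1):
    print(number, lexeme)
```

The string literal is kept as one token, which is why the segment yields eleven tokens rather than splitting at the space inside the quotes.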
Specifications of Tokens: Strings, Words and Sentences
1. Prefix of s: a string obtained by deleting zero or more trailing symbols of s
2. Suffix of s: a string obtained by deleting zero or more leading symbols of s
3. Substring of s: a string obtained by deleting a prefix and a suffix of s
4. Proper prefix, suffix or substring of s: a prefix, suffix or substring x that is nonempty s.t. s ≠ x
5. Subsequence of s: a string obtained by deleting zero or more symbols of s, not necessarily contiguous

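These definitions translate almost directly into code. The sketch below uses the string "banana" as an example; the function names are assumptions chosen for this illustration.

```python
def prefixes(s):
    """All strings obtained by deleting zero or more trailing symbols."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All strings obtained by deleting zero or more leading symbols."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All strings obtained by deleting a prefix and a suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def is_proper(x, s):
    """A prefix, suffix or substring x of s is proper if x is nonempty and x != s."""
    return x != "" and x != s


def is_subsequence(x, s):
    """True if x can be obtained from s by deleting symbols anywhere in s."""
    remaining = iter(s)
    return all(symbol in remaining for symbol in x)


print(prefixes("banana"))
print(suffixes("banana"))
print(is_subsequence("baaa", "banana"))
```

Note that every substring is a subsequence, but not the other way round: "baaa" is a subsequence of "banana" without being a substring.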
The Principle of Longest Match
• In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice.
• Example:
  return foobar != hohum;
  should be recognized as 5 tokens
  RETURN ID(foobar) NEQ ID(hohum) SCOLON
  and not more (i.e., not parts of words or identifiers, or ! and = as separate tokens).

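A minimal sketch of longest match in Python follows. At each position every pattern is tried and the longest lexeme wins; on a tie the rule listed first wins, which is how the keyword `return` beats the identifier rule. The rule list and the names `PATTERNS` and `scan` are assumptions made for this illustration.

```python
import re

# Hypothetical rules, listed in priority order (earlier wins on a tie).
PATTERNS = [
    ("RETURN", re.compile(r"return")),
    ("ID",     re.compile(r"[a-z][a-z0-9]*")),
    ("NEQ",    re.compile(r"!=")),
    ("NOT",    re.compile(r"!")),
    ("ASSIGN", re.compile(r"=")),
    ("SCOLON", re.compile(r";")),
]


def scan(text):
    """Split text into token types using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_end = None, pos
        for name, pattern in PATTERNS:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}")
        lexeme = text[pos:best_end]
        tokens.append(f"ID({lexeme})" if best_name == "ID" else best_name)
        pos = best_end
    return tokens


print(scan("return foobar != hohum;"))
print(scan("returned"))
```

The second call shows the other side of the rule: `returned` is a single identifier, not the keyword `return` followed by `ed`.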
Typical Tokens in Programming Languages
• Operators & punctuation
  – + - * / ( ) { } [ ] ; : :: < <= == = != ! …
  – Each of these is a distinct lexical class (or token type)
• Keywords
  – if while for goto return switch void …
  – Each of these is also a distinct lexical class (not a string)
• Identifiers
  – A single ID lexical class, but parameterized by the actual id
• Integer constants
  – A single INT lexical class, but parameterized by the int value
• Other constants, etc.

Tokens of a Typical Language
  TYPE        EXAMPLE
  ID          foo, n14, a, temp, …
  NUM         73, 00, 515, +2, …
  REAL        66.1, .5, 10., 1e67, 5.5e-10, …
  KEYWORDS    IF, DO, WHILE, INT, …
  SYMBOLS     , (Comma)   != (Noteq)   ( (Lparen) …

Formal Theory of Languages
• A language in real life is made up of
  1. Words made up of alphabets, and
  2. Sentences made up of words arranged according to the grammar of that language
• Natural languages display an amazing variety of expressions with explicit and implicit meanings, and variations in meaning as well as grammars.
• Computer languages, on the contrary, focus on
  – The limited set of tasks to be performed
  – Hence mathematical precision is essential in defining their structure and grammar

Formal Definition of Languages
• Alphabet
  – A finite (non-empty) set of symbols, denoted by Σ
• String
  – A finite sequence of symbols from an alphabet, which includes even the empty sequence (denoted by λ)
• Language
  – A set (often infinite) of finite strings
  – The set of all possible finite strings of elements of alphabet Σ (including λ) is denoted by Σ*
• Finite specification of (possibly infinite) languages is possible with
  1. Automaton – a recognizer; a machine that accepts all strings in a language (and rejects all other strings)
  2. Grammar – a generator; a system for producing all strings in the language (and no other strings)

Formal Language Definition (Contd.)
• As already defined, a language L over an alphabet Σ is a collection of strings of elements of Σ.
  – The PASCAL language is the set of all strings that constitute legal PASCAL programs (infinite set)
  – The language of primes is the set of all decimal digit strings that constitute prime numbers (infinite set)
  – The language of C reserved words is the set of all alphabetic strings that cannot be used as identifiers in the C programming language (finite set)
• To specify some of these (possibly infinite) languages with a finite description we use the notation of Regular Expressions.

Regular Expressions
• A regular expression is always defined over some alphabet Σ (for programming languages, it is commonly ASCII or Unicode).
• If E is a regular expression, L(E) is the "language" (set of strings) generated by E.
• For example:
  – For each symbol a in the alphabet of the language, the regular expression a denotes the language {a}, containing just the string a (known as a symbol)
• The regular expression for the language containing only the empty sequence λ is denoted by ε.

Operations with Regular Expressions
• Given two regular expressions M and N:
• Alternation (denoted by |) makes a new regular expression M | N denoting the UNION of the languages L(M) and L(N), i.e. L(M) ∪ L(N).
• Concatenation (denoted by . or by writing the expressions side by side) makes a new regular expression MN denoting the language of strings from L(M) followed by strings from L(N).
• Repetition (denoted by *) makes a new expression M* denoting the language that has 0 or more occurrences (Kleene closure) of strings from L(M).

Regular Expression Example
  Expression          Language              Example Words
  a | b               {a, b}                a, b
  ab*a                {a}{b}*{a}            aa, abba, abbba, …
  (ab)*               {ab}*                 ε, ababab, …
  abba                {abba}                abba
  (0 | 1)*0           {{0} ∪ {1}}*{0}       0, 00, 10, 010, 110, … (all even binary numbers)
  b*(abb*)*(a | ε)    Strings of a and b with NO consecutive a
• Similarly, using the symbols |, ., * and ε, we can specify the regular expressions corresponding to the lexical tokens of a programming language using rules (a.k.a. …

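The table can be checked mechanically with Python's `re` module. In this sketch `re.fullmatch` plays the role of "the whole word belongs to the language"; since `re` has no ε symbol, `(a | ε)` is written as `a?`. The helper name `accepts` and the `EXAMPLES` table are assumptions for this illustration.

```python
import re


def accepts(pattern, word):
    """True if the whole word is in the language of the pattern."""
    return re.fullmatch(pattern, word) is not None


# Pattern -> a few words that should be in its language.
EXAMPLES = {
    r"a|b": ["a", "b"],
    r"ab*a": ["aa", "abba", "abbba"],
    r"(ab)*": ["", "ababab"],
    r"abba": ["abba"],
    r"(0|1)*0": ["0", "00", "10", "010", "110"],
    r"b*(abb*)*a?": ["", "b", "abab", "bba"],
}

for pattern, words in EXAMPLES.items():
    for word in words:
        print(pattern, repr(word), accepts(pattern, word))

# Words outside the languages: an odd binary number, and consecutive a's.
print(accepts(r"(0|1)*0", "01"))
print(accepts(r"b*(abb*)*a?", "baab"))
```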
Table of Operators & Abbreviations
  Notation      Description
  a             An ordinary character that stands for itself
  ε             The empty string
  M | N         Alternation; choosing from M OR N
  MN            Concatenation: an M followed by an N
  M*            Repetition (zero or more times)
  M+            Repetition (one or more times)
  M?            Optional (zero or one occurrence of M)
  [a–zA–Z]      Character set alternation
  [abxyz]       One of the given characters (a|b|x|y|z)
  .             Stands for a single character (except newline)
  'a.+*'        Quotation: a string in quotes stands for itself

Regular Expression Construction
• Problem: Specify the set of unsigned numbers as a regular expression. (Examples: 1997, 19.97)
• Observations on numbers:
  1. A number could be made up of one or more digits from the set (0–9): (0–9)+
  2. Optionally, it can have a decimal point at the end followed by 0 or more digits: "." (0–9)*
  3. A number can also start with a point followed by one or more digits: "." (0–9)+
• Resulting expression:
  [ (0–9)+ [ "." (0–9)* ]? ] | [ "." (0–9)+ ]

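The three observations can be tried out in Python's regular-expression syntax. This is a sketch; the names `UNSIGNED` and `is_unsigned_number` are assumptions for this illustration.

```python
import re

# digits, optionally followed by "." and more digits; or "." followed by digits
UNSIGNED = re.compile(r"[0-9]+(\.[0-9]*)?|\.[0-9]+")


def is_unsigned_number(text):
    """True if the whole text is an unsigned number as described above."""
    return UNSIGNED.fullmatch(text) is not None


for candidate in ["1997", "19.97", "19.", ".97", ".", "", "1.2.3"]:
    print(repr(candidate), is_unsigned_number(candidate))
```

A lone "." and the empty string are rejected because each alternative requires at least one digit.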
Regular Expressions for Some Tokens of a Programming Language
  Regular Expression                                  Token Type
  if                                                  return IF
  [a–z][a–z0–9]*                                      return ID
  [0–9]+                                              return NUM
  ([0–9]+ '.' [0–9]*) | ('.' [0–9]+)                  return REAL
  ('*' [a–z]* '\n') | (' ' | '\n' | '*/')+            Comment
  .                                                   return ERROR

A Regular Expression Recognizer
• Given an input string, the function of a "regular expression analyzer" is to say:
  – "YES, the input is part of the language generated from the regular expression", or
  – "NO, the input isn't part of the language generated from the regular expression"
• Using results from finite automata theory and the theory of algorithms, we can automate the construction of such recognizers from regular expressions.

Finite Automata
• A finite automaton is a transition graph that has:
  – A finite set of states S (represented by nodes) with edges leading from one state to another
  – Each edge is labeled with the symbol (from the set Σ) that causes the transition (could be ε also!)
  – One state is denoted as the start state s0, and certain of the states are distinguished as final states (normally denoted with two concentric circles)
• Mathematically, it can be represented as:
  A = {S, Σ, s0, F, move}

Recognizing Expressions as Tokens with a Finite State Automaton
• Operate by reading input symbols (usually characters)
  – A transition can be taken if it is labeled with the current symbol
  – An ε-transition can be taken at any time
• Accept when a final state is reached and there is no more input
  – A scanner is slightly different: it accepts the longest match even if there is more input
• Reject if no transition is possible, or if there is no more input and the automaton is not in a final state (DFA)

Finite Automata Examples
[Figure: three transition diagrams]
• if: start → 1 –i→ 2 –f→ 3 (return IF)
• [a–z][a–z0–9]*: start → 1 –a–z→ 2, with a loop on state 2 labeled a–z, 0–9 (return ID)
• [0–9]+: start → 1 –0–9→ 2, with a loop on state 2 labeled 0–9 (return NUM)

Finite Automata Examples (Contd.)
• ([0–9]+ '.' [0–9]*) | ('.' [0–9]+)
[Figure: transition diagram with states 1–5 and edges labeled 0–9 and "." for the expression above; the accepting states return REAL]

Deterministic Finite Automata
• A finite automaton is deterministic (DFA) if
  1. It has no edges/transitions labeled with epsilon.
  2. For each state and for each symbol in the alphabet, there is exactly one edge labeled with that symbol.
• Such a transition graph is called a state graph.
• A Deterministic Finite Automaton (DFA) for b*abb:
[Figure: start → 0, loop on 0 labeled b, 0 –a→ 1 –b→ 2 –b→ 3, with 3 the final state]

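A DFA is simple to simulate because there is always at most one state to track. The sketch below encodes the b*abb automaton from the figure as a transition table; the edges that the figure does not draw are treated as a transition to an implicit dead state, which is an assumption of this sketch, as are the names `DFA` and `dfa_accepts`.

```python
# Transition table for the b*abb automaton: (state, symbol) -> next state.
DFA = {
    (0, "b"): 0,
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "b"): 3,
}
START = 0
FINAL = {3}


def dfa_accepts(word):
    """Run the DFA over word; accept only if it ends in a final state."""
    state = START
    for symbol in word:
        state = DFA.get((state, symbol))
        if state is None:  # missing edge: implicit dead state, reject
            return False
    return state in FINAL


for word in ["abb", "bbabb", "ab", "abba", ""]:
    print(repr(word), dfa_accepts(word))
```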
Non-deterministic Finite Automata (NFA)
• In a non-deterministic finite automaton:
  1. From a state (node), there may be more than one edge labeled with the same symbol, and there may be no edge from a node labeled with a given input symbol.
  2. An edge can be labeled by the empty symbol (ε) too.
• A Non-deterministic Finite Automaton (NFA) for (a|b)*abb:
[Figure: start → 0, loops on 0 labeled a and b, 0 –a→ 1 –b→ 2 –b→ 3, with 3 the final state]

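Because an NFA may have several edges for the same symbol, a simulation has to track the set of all states the automaton could be in. The sketch below does this for the (a|b)*abb automaton from the figure, which has no ε-edges; the names `NFA` and `nfa_accepts` are assumptions for this illustration.

```python
# (state, symbol) -> set of possible next states for (a|b)*abb.
NFA = {
    (0, "a"): {0, 1},  # two edges labeled a leave state 0
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
FINAL = {3}


def nfa_accepts(word):
    """Track every state the NFA could be in after each symbol."""
    current = {START}
    for symbol in word:
        current = set().union(*(NFA.get((s, symbol), set()) for s in current))
    return bool(current & FINAL)


for word in ["abb", "aabb", "babb", "ab", "abba"]:
    print(repr(word), nfa_accepts(word))
```

Each step costs time proportional to the number of live states, which is the "slower to simulate" side of the NFA/DFA trade-off.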
Another NFA
[Figure: an NFA whose start state has ε-transitions into two branches, one with edges labeled a and one with edges labeled b]
• An ε-transition is taken without consuming any character from the input.
• What does the above NFA accept? aa* | bb*

NFA and DFA – A Comparison
• NFA
  – May have edges/transitions labeled with epsilon
  – From a state (node), there may be more than one edge labeled with the same symbol, and there may be no edge from a node labeled with a given input symbol
  – Quicker to build but slower to simulate
• DFA
  – No edges/transitions labeled with epsilon
  – For each state and for each symbol in the alphabet, there is exactly one edge labeled with that symbol
  – Slower to build but quicker to simulate

Relationship between DFA & NFA
• It is obvious that a DFA can be simulated with an NFA.
• What is not so obvious is that an NFA can be simulated with a DFA!
• How?
  – Simulate sets of possible states
  – Possible exponential blowup in the state space
  – Still, maintain one state per character in the …

Automating a RE Recognizer Construction
• To convert a specification into code:
  1. Write down the RE for the input language
  2. Build a big NFA
  3. Build the DFA that simulates the NFA
  4. Systematically shrink the DFA
  5. Turn it into code
• Note: The DFA construction is done automatically by a tool such as lex.

Building NFA From Regular Expression
• Remember that a regular expression is formed by the use of:
  – Basic symbols, and
  – Their alternation, concatenation, and repetition.
• Hence, all we need to know is:
  – How to build the NFA for the above (symbols and operations), and
  – How to assemble the NFAs corresponding to these symbols into a composite NFA for the expression

Building NFA for Symbols & Operations
1. Building the NFA for a basic symbol a:
   1. Start with an initial state i,
   2. Draw an edge/transition labeled with the symbol (this could be the epsilon symbol too!)
   3. to the final state f.
[Figure: start → i –a→ f, and start → i –ε→ f]

Building NFA for Symbols & Operations
2. Building the NFA for alternation N(s | t):
   – Given two NFAs N(s) and N(t),
   1. Construct a new start state i and a new final state f.
   2. Add a transition from the start state i to the start states of N(s) and N(t), and label them with the epsilon symbol.
   3. Add a transition from the final states of N(s) and N(t) to the final state f, and label them with the epsilon symbol.
[Figure: start → i, with ε-edges from i into N(s) and N(t), and ε-edges from their final states to f]

Building NFA for Symbols & Operations
3. Building the NFA for concatenation N(s.t) or N(st):
   – Given two NFAs N(s) and N(t),
   1. Construct a new start state i and a new final state f.
   2. Overlap the start state of the latter [N(t)] with the final state of the former [N(s)].
   3. From the start state, add an edge labeled with epsilon to the start state of N(s).
   4. From the final state of N(s), add an epsilon transition to the start state of N(t).
[Figure: start → i → N(s) → N(t) → f]

Building NFA for Symbols & Operations
4. Building the NFA for repetition N(s*):
   1. Construct a new start state and a new final state.
   2. Add an epsilon transition from the new start state to the new final state.
   3. Add an epsilon transition from the new final state to the start state of N(s).
   4. Add another epsilon transition from the final state of N(s) to the constructed final state.
[Figure: start → i → f, with N(s) attached to f by the ε-edges described above]

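The four construction rules can be sketched as small functions that each return an NFA fragment with one start and one final state. This is an illustrative sketch, not a definitive implementation: the `Fragment` class, the use of `None` as the ε label, and the helper names are all assumptions. Concatenation here simply links the final state of N(s) to the start state of N(t) with an ε-edge, and repetition follows the three ε-edges listed above.

```python
from itertools import count

EPS = None      # label used for epsilon edges in this sketch
_ids = count()  # generator of fresh state numbers


class Fragment:
    """An NFA fragment: start state, final state, list of (src, label, dst)."""

    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges


def symbol(a):
    i, f = next(_ids), next(_ids)
    return Fragment(i, f, [(i, a, f)])


def alternation(s, t):
    i, f = next(_ids), next(_ids)
    extra = [(i, EPS, s.start), (i, EPS, t.start),
             (s.final, EPS, f), (t.final, EPS, f)]
    return Fragment(i, f, s.edges + t.edges + extra)


def concatenation(s, t):
    # epsilon edge from the final state of N(s) to the start state of N(t)
    return Fragment(s.start, t.final, s.edges + t.edges + [(s.final, EPS, t.start)])


def repetition(s):
    i, f = next(_ids), next(_ids)
    extra = [(i, EPS, f), (f, EPS, s.start), (s.final, EPS, f)]
    return Fragment(i, f, s.edges + extra)


def closure(states, edges):
    """All states reachable from states through epsilon edges only."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, word):
    current = closure({nfa.start}, nfa.edges)
    for ch in word:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.final in current


# (a|b)*abb assembled from the pieces
a_or_b_star = repetition(alternation(symbol("a"), symbol("b")))
nfa = concatenation(concatenation(concatenation(a_or_b_star, symbol("a")),
                                  symbol("b")), symbol("b"))
print(accepts(nfa, "babb"), accepts(nfa, "ab"))
```

Every operator adds at most two states and four edges, so the NFA grows linearly with the size of the regular expression.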
Construction of NFA – Examples
• (a|b).(a|b)
[Figure: (a) the NFA for a; (b) the NFA for a | b; and the NFA for (a|b).(a|b) obtained by concatenating two copies of the NFA for a | b]

Construction of NFA – Examples (Contd.)
• [a–z][a–z0–9]* : a symbol followed by a repetition (return ID)
• [0–9]+ = [0–9][0–9]* : a symbol followed by a repetition (return NUM)
[Figure: the NFAs for the two expressions above, with edges labeled a–z and 0–9]

Combining Several NFAs
[Figure: the NFAs for IF (edges i, f), ID (edges a–z and 0–9), NUM (edges 0–9) and ERROR (any character) combined into a single NFA with states 1–15]

Automating a RE Recognizer Construction
• To convert a specification into code:
  1. Write down the RE for the input language
  2. Build a big NFA
  3. Build the DFA that simulates the NFA
  4. Systematically shrink the DFA
  5. Turn it into code
• Note: The DFA construction is done automatically by a tool such as lex.

Conversion of NFA to DFA
• A DFA can be constructed from the NFA, where each DFA state represents a set of NFA states.
• Key idea: the state of the DFA after reading some input is the set of all states the NFA could have reached after reading the same input.
• If the NFA has n states, the DFA will have at most 2^n states.
• The resulting DFA may have more states than needed.
• Let us study the conversion with an example.

Converting NFA to DFA
[Figure: the combined NFA with states 1–15 for IF, ID, NUM and ERROR]
• What states can be reached from state 1 without consuming a character?
  {1, 4, 9, 14} form the ε-closure of state 1.
• Defn: Given a set of NFA states T, ε-closure(T) is the set of states that are reachable through ε-transitions.

Converting NFA to DFA
[Figure: the combined NFA with states 1–15 for IF, ID, NUM and ERROR]
• What are ALL the state closures in this NFA?
  closure(1) = {1, 4, 9, 14}
  closure(5) = {5, 6, 8}
  closure(7) = {6, 7, 8}
  closure(8) = {6, 8}
  closure(10) = {10, 11, 13}
  closure(12) = {11, 12, 13}
  closure(13) = {11, 13}

Converting NFA to DFA
• We already know that, given a set of NFA states T, ε-closure(T) is the set of states that are reachable through ε-transitions from any state s ∈ T.
• We now define: given a set of NFA states T, move(T, a) is the set of states that are reachable on input a from any state s ∈ T.
• Now the problem definition: given an NFA, find the DFA with the minimum number of states that has the same behavior as the NFA for all inputs.

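Both definitions are short graph searches. The sketch below applies them to the combined IF/ID/NUM/ERROR automaton; its edge lists are an assumption, reconstructed from the closures and move results quoted on these slides, and the names `EPS_EDGES`, `SYM_EDGES`, `eps_closure` and `move` are chosen for this illustration.

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)
ANY = None  # marks the "any character" edge into the ERROR state

# Assumed edges of the combined NFA (states 1-15).
EPS_EDGES = {1: {4, 9, 14}, 5: {8}, 8: {6}, 7: {8}, 10: {13}, 13: {11}, 12: {13}}
SYM_EDGES = [
    (1, {"i"}, 2), (2, {"f"}, 3),                  # IF
    (4, LETTERS, 5), (6, LETTERS | DIGITS, 7),     # ID
    (9, DIGITS, 10), (11, DIGITS, 12),             # NUM
    (14, ANY, 15),                                 # ERROR
]


def eps_closure(states):
    """States reachable from any state in states through epsilon edges."""
    result, stack = set(states), list(states)
    while stack:
        for nxt in EPS_EDGES.get(stack.pop(), ()):
            if nxt not in result:
                result.add(nxt)
                stack.append(nxt)
    return result


def move(states, ch):
    """States reachable on input ch from any state in states."""
    return {dst for src, chars, dst in SYM_EDGES
            if src in states and (chars is ANY or ch in chars)}


start = eps_closure({1})
print(sorted(start))
print(sorted(move(start, "i")), sorted(eps_closure(move(start, "i"))))
print(sorted(move(start, "7")), sorted(eps_closure(move(start, "7"))))
```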
Converting NFA to DFA
[Figure: the combined NFA with states 1–15 for IF, ID, NUM and ERROR]
1. Start with the initial state of the NFA (s0), and work out the set of states in the DFA, Dstates, initialized with a state representing ε-closure(s0).
   Dstates = {1-4-9-14}

Converting NFA to DFA
[Figure: the combined NFA, and the DFA so far: 1-4-9-14 –a–h→ 5-6-8-15]
• Now we need to compute:
  move(1-4-9-14, a–h) = {5, 15}
  Then, ε-closure({5, 15}) = {5, 6, 8, 15}

Converting NFA to DFA
[Figure: the combined NFA, and the DFA so far: 1-4-9-14 –a–h→ 5-6-8-15 and 1-4-9-14 –i→ 2-5-6-8-15]
• Next we need to compute:
  move(1-4-9-14, i) = {2, 5, 15}
  Then, ε-closure({2, 5, 15}) = {2, 5, 6, 8, 15}

Converting NFA to DFA
[Figure: the combined NFA, and the DFA so far: 1-4-9-14 –a–h, j–z→ 5-6-8-15 and 1-4-9-14 –i→ 2-5-6-8-15]
• Next we need to compute:
  move(1-4-9-14, j–z) = {5, 15}
  Then, ε-closure({5, 15}) = {5, 6, 8, 15}

Converting NFA to DFA
[Figure: the combined NFA, and the DFA so far, now with the edge 1-4-9-14 –0–9→ 10-11-13-15]
• Next we need to compute:
  move(1-4-9-14, 0–9) = {10, 15}
  Then, ε-closure({10, 15}) = {10, 11, 13, 15}

Converting NFA to DFA
[Figure: the combined NFA, and the DFA so far, now with the edge 1-4-9-14 –other→ 15]
• Next we need to compute:
  move(1-4-9-14, other) = {15}
  Then, ε-closure({15}) = {15}

Converting NFA to DFA
[Figure: the combined NFA, and the DFA so far with start state 1-4-9-14 and its successors 5-6-8-15, 2-5-6-8-15, 10-11-13-15 and 15]
• The analysis for 1-4-9-14 is complete. We mark it and pick another state in the DFA to analyze.

Converted DFA
[Figure: the resulting DFA]
  State          Token     Transitions
  1-4-9-14       (start)   i → 2-5-6-8-15;  a–h, j–z → 5-6-8-15;  0–9 → 10-11-13-15;  other → 15
  2-5-6-8-15     ID        f → 3-6-7-8;  a–e, g–z, 0–9 → 6-7-8
  5-6-8-15       ID        a–z, 0–9 → 6-7-8
  3-6-7-8        IF        a–z, 0–9 → 6-7-8
  6-7-8          ID        a–z, 0–9 → 6-7-8
  10-11-13-15    NUM       0–9 → 11-12-13
  11-12-13       NUM       0–9 → 11-12-13
  15             error

Another Example of Conversion
[Figure: an NFA with states S0–S11 and edges labeled a and b]
• The above NFA would result in the DFA below:
[Figure: a DFA with states {s0, s1, s2}, {s3, s5, s6, s7, s8}, {s4, s5, s6, s7, s8}, {s9, s11} and {s10, s11}, and edges labeled a and b]

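The whole conversion procedure can be sketched as a worklist loop: start from the ε-closure of the NFA start state, and for every unmarked DFA state and every symbol compute ε-closure(move(T, a)). The sketch below runs it on the small (a|b)*abb automaton used earlier rather than on the NFAs in the figures; the dictionary encoding and all names are assumptions for this illustration.

```python
from collections import deque

EPS = ""  # key used for epsilon edges in this sketch
# NFA for (a|b)*abb: (state, symbol) -> set of next states.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
NFA_START, NFA_FINAL, ALPHABET = 0, {3}, "ab"


def eps_closure(states):
    result, stack = set(states), list(states)
    while stack:
        for nxt in NFA.get((stack.pop(), EPS), ()):
            if nxt not in result:
                result.add(nxt)
                stack.append(nxt)
    return frozenset(result)


def move(states, symbol):
    return {t for s in states for t in NFA.get((s, symbol), ())}


def subset_construction():
    """Return (start, Dstates, Dtran, finals) of the equivalent DFA."""
    start = eps_closure({NFA_START})
    dstates, dtran = {start}, {}
    work = deque([start])  # unmarked DFA states
    while work:
        current = work.popleft()  # "mark" this state
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue
            if target not in dstates:
                dstates.add(target)
                work.append(target)
            dtran[(current, symbol)] = target
    finals = {state for state in dstates if state & NFA_FINAL}
    return start, dstates, dtran, finals


start, dstates, dtran, finals = subset_construction()
for (state, symbol), target in sorted(dtran.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(state), symbol, "->", sorted(target))
```

Only the subsets that are actually reachable are created, so in practice the DFA is usually far smaller than the 2^n worst case.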
Systematically Shrink the DFA
• The big picture
  – Discover sets of equivalent states
  – Represent each such set with just one state
• Two states are equivalent if and only if:
  – The sets of paths leading to them are equivalent
  – ∀ α ∈ Σ, transitions on α lead to equivalent states (DFA)
  – α-transitions to distinct sets ⇒ the states must be in distinct sets
• A partition P of S
  – A collection of sets P s.t. each s ∈ S is in exactly one pi ∈ P

Minimization
[Figure: a DFA with states p0, p1, p2, p3, p4 and edges labeled a and b]
• {p0, p1, p2, p3, p4}: group all the states together.
• Separate states according to available exit transitions.
• Separate a set into two if from some of its states one can reach another set and from the others one cannot.

Minimization
[Figure: the DFA with states p0, p1, p2, p3, p4 and edges labeled a and b]
• The above DFA can now be minimized as:
[Figure: the minimized DFA, with edges labeled a and b]

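The splitting procedure can be sketched as partition refinement. The version below starts from the common two-block partition (final versus non-final states) rather than a single block, and repeatedly splits any block whose members disagree on which block each symbol leads to. The example DFA (five states for (a|b)*abb) and the names `DELTA` and `minimize` are assumptions for this illustration, not the automaton in the figure.

```python
def minimize(states, alphabet, delta, finals):
    """Return the blocks of equivalent states of a complete DFA."""
    finals = set(finals)
    partition = [block for block in (finals, set(states) - finals) if block]
    while True:
        def block_of(state):
            return next(i for i, block in enumerate(partition) if state in block)

        refined = []
        for block in partition:
            buckets = {}
            for state in sorted(block):
                # Signature: which block each symbol leads to.
                signature = tuple(block_of(delta[(state, a)]) for a in alphabet)
                buckets.setdefault(signature, set()).add(state)
            refined.extend(buckets.values())
        if len(refined) == len(partition):  # nothing was split
            return refined
        partition = refined


# A complete DFA for (a|b)*abb with one redundant state.
DELTA = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}

blocks = minimize("ABCDE", "ab", DELTA, {"E"})
print(sorted(sorted(block) for block in blocks))
```

States A and C end up in the same block, so the minimized DFA has four states instead of five.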
Pseudo Code for Lexical Analyzer
  function lexan: integer;
  var lexbuf: array [0..100] of char;
      C: char;
  begin
    loop begin
      read a character into C;
      if C is a blank or a tab then
        do nothing
      else if C is a newline then
        increment lineno
      else if C is a digit then begin
        set tokenval to the value of this and following digits;
        return NUM
      end
      else if C is a letter then begin
        place C and successive letters and digits into lexbuf;
        p := lookup(lexbuf);
        tokenval := p;
        return the token field of table entry p
      end
      else begin  /* token is a single character */
        set tokenval to NONE;  /* no attribute */
        return integer encoding of character C
      end
    end
  end

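The pseudo code maps closely onto a small hand-written scanner. The following Python sketch mirrors it branch by branch; the token codes `NUM`, `ID` and `NONE`, the `Lexer` class, and the list-based symbol table are assumptions of this sketch, and as a simplification every word is returned as `ID` instead of looking up a token field in the table entry.

```python
NUM, ID, NONE = 256, 257, -1  # hypothetical token codes, above the character range


class Lexer:
    def __init__(self, text):
        self.text, self.pos = text, 0
        self.lineno = 1
        self.tokenval = NONE
        self.symtable = []  # the index of a lexeme plays the role of p

    def lookup(self, lexeme):
        """Return the table index of lexeme, inserting it if it is new."""
        if lexeme not in self.symtable:
            self.symtable.append(lexeme)
        return self.symtable.index(lexeme)

    def lexan(self):
        """Return the next token code, or None at the end of the input."""
        while self.pos < len(self.text):
            c = self.text[self.pos]
            if c in " \t":
                self.pos += 1                      # do nothing
            elif c == "\n":
                self.lineno += 1                   # increment lineno
                self.pos += 1
            elif c.isdigit():
                begin = self.pos
                while self.pos < len(self.text) and self.text[self.pos].isdigit():
                    self.pos += 1
                self.tokenval = int(self.text[begin:self.pos])
                return NUM
            elif c.isalpha():
                begin = self.pos
                while self.pos < len(self.text) and self.text[self.pos].isalnum():
                    self.pos += 1
                self.tokenval = self.lookup(self.text[begin:self.pos])
                return ID
            else:                                  # token is a single character
                self.pos += 1
                self.tokenval = NONE               # no attribute
                return ord(c)
        return None


lexer = Lexer("x1 = 42\n+ y")
token = lexer.lexan()
while token is not None:
    print(token, lexer.tokenval, lexer.lineno)
    token = lexer.lexan()
```

The caller reads the attribute from `tokenval` right after each call, in the same way a parser would consult `tokenval` after calling `lexan`.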
Building Lexical Analyzers Automatically
• The point to note is: the process studied so far is well suited for automation.
  1. The implementer writes down the regular expressions
  2. The scanner generator builds the NFA, DFA and minimal DFA, and then writes out the (table-driven or direct-coded) code
  3. This process reliably produces fast, robust …

Lex – A Tool for Generating Scanners
• A widely used tool for specifying lexical analyzers for a wide variety of languages.
• How does it work?
  1. The specification of a lexical analyzer is prepared by creating a program lex.l (containing REs) in the Lex language.
  2. Then lex.l is run through the Lex compiler to produce a program lex.yy.c (contains a tabular representation of the state transition diagram).
  3. lex.yy.c is run through the C compiler …
[Figure: Lex source program lex.l → Lex compiler → lex.yy.c → C compiler → a.out; input stream → a.out → sequence of tokens]

Lex Functions
1. Translates the definitions into an automaton.
2. The automaton looks for the longest matching string.
3. Either returns some value to the reading program (parser), or looks for the next token.
4. Lookahead operator: x/y allows the token x only if y follows it (but y is not part of the token).

Lex Program Structure
• A Lex program (nothing but the specifications in lex.l) consists of THREE parts.
  1. Declarations
     – This section includes declarations of variables and manifest constants.
  2. Translation rules
     – This section includes patterns (REs) and the corresponding actions to be taken.
  3. Auxiliary procedures
     – This section includes whatever auxiliary procedures are needed.
• The three sections are separated by lines beginning …

A Sample Lex Program
1)
%{
/* Remove uppercase letters. Commands to execute are
   lex test.l and gcc lex.yy.c -ll -o test */
%}
%%
[A-Z]+ ;

2)
%{
/* Line numbering */
%}
%%
^.*\n printf("%d\t%s", yylineno- …

Any Questions?
Thank you

Regular Expression Construction
• Problem: Specify the set of unsigned numbers as a regular expression. (Examples: 1997, 19.97)
• Solution: Start with a symbol and keep defining regular sub-expressions till the final expression is achieved.
  RULE 1. digit → 0 | 1 | 2 | 3 | … | 9
  RULE 2. digits → digit digit* (or digit+) [closure meaning 1 or more digits]
  RULE 3. optional_fraction → '.' digits | epsilon
  RULE 4. Num → digits optional_fraction

Unsigned Number Validation Using Rules
• Let us derive the number from these rules:
  RULE 1. digit → 0 | 1 | 2 | 3 | … | 9
  RULE 2. digits → digit digit* (or digit+) [closure meaning 1 or more digits]
  RULE 3. optional_fraction → '.' digits | epsilon
  RULE 4. Num → digits optional_fraction
[Figure: step-by-step derivations of sample numbers using Rules 1–4]

Regular Expression Construction
• Qn: How to write a regular expression for identifiers? (An identifier is a letter followed by zero or more letters or digits.)
• Answer:
  1. Letter → a | A | b | B | … | z | Z
  2. Digit → 0 | 1 | 2 | 3 | … | 9
  3. Letter_or_Digit → Letter | Digit
  4. Identifier → Letter Letter_or_Digit*
• One can define similar regular expressions for comments, strings, operators and delimiters (the different tokens of a language).

Grammar for a Tiny Language
• program ::= statement | program statement
• statement ::= assignStmt | ifStmt
• assignStmt ::= id = expr ;
• ifStmt ::= if ( expr ) stmt
• expr ::= id | int | expr + expr
• id ::= a | b | c | i | j | k | n | x | y | z
• int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The rules of a grammar are also known as Productions.