Specification of Tokens

  • Slides: 22

Specification of Tokens
• Lexemes are simple
• Tokens are sets of lexemes
• So: tokens form a REGULAR LANGUAGE
• Use a REGULAR EXPRESSION to describe precisely which strings each type of token matches

Learn by Example:
• Token to be specified = identifier of C (simplified: real C also treats the underscore as a letter)
  • letter → A | B | C | … | Z | a | b | … | z
  • digit → 0 | 1 | 2 | … | 9
  • identifier → letter ( letter | digit )*
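The three definitions on this slide translate almost symbol for symbol into a library regular expression. The following is a minimal sketch using Python's standard `re` module; the names `LETTER`, `DIGIT`, `IDENTIFIER`, and `is_identifier` are illustrative choices and do not come from the slides.

```python
import re

# letter -> A | ... | Z | a | ... | z   and   digit -> 0 | ... | 9
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"

# identifier -> letter ( letter | digit )*
IDENTIFIER = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT})*")


def is_identifier(text: str) -> bool:
    """Return True when the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(text) is not None
```

`fullmatch` is used so that the entire lexeme must fit the pattern; a string such as `1x` fails because its first character is not a letter.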

Another Example:
• What are the patterns (Regular Expressions) for the following tokens?
  • if → if
  • then → then
  • else → else
  • relop → < | <= | >= | <> (not equal)
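These four patterns can be combined into one scanner by giving each token its own named group. The sketch below assumes Python's `re` module; the group names, the `tokenize` helper, and the extra whitespace rule are assumptions added for illustration. Python tries alternatives from left to right, so the two-character operators are listed before `<`.

```python
import re

# One named group per token from the slide, plus a whitespace group that is
# an assumption added here so that lexemes can be separated by spaces.
TOKEN_PATTERN = re.compile(
    r"(?P<IF>if)|(?P<THEN>then)|(?P<ELSE>else)"
    r"|(?P<RELOP><=|<>|>=|<)"  # longer operators first: alternation is ordered
    r"|(?P<WS>\s+)"
)


def tokenize(source: str) -> list:
    """Split source into (token, lexeme) pairs; raise on unexpected input."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "WS":  # whitespace is skipped, not reported
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("if <= then")` yields the pairs `("IF", "if")`, `("RELOP", "<=")`, and `("THEN", "then")`.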

Token Recognition
• Tokens can be recognized using a transition diagram
• Token to be specified: >=
[Transition diagram; accepting-state labels: Return (relop, GE) and * Return (relop, GT)]
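A transition diagram of this kind can be coded directly as a few state tests. The sketch below is one possible reading of the diagram, not a transcription of it: the function name `recognize_gt` and the return shape are hypothetical, and the `*` is interpreted in the usual textbook way, as a marker that the lookahead character is given back to the input.

```python
def recognize_gt(text: str, start: int = 0):
    """Follow the diagram from `start`; return (token, next_position) or None."""
    pos = start
    # Start state: the diagram is entered only on '>'.
    if pos >= len(text) or text[pos] != ">":
        return None
    pos += 1
    # State after '>': one character of lookahead decides the token.
    if pos < len(text) and text[pos] == "=":
        return ("relop", "GE"), pos + 1
    # Any other character (or end of input): the lookahead is not consumed,
    # which is how the * on the accepting state is interpreted here.
    return ("relop", "GT"), pos
```

The returned position tells the caller where the next token starts, so after `>x` the scanner resumes at `x`.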

Learn by Example
• Relational Operators in Java
• Specification of token
  relop → < | <= | >= | <>
• Recognition of token relop
  • <=
  • >=
  • <>

Learn by Doing
• Pattern for all strings that start with “tab” or end with “bat”?
• Answer
  tab {A, …, Z, a, …, z}* | {A, …, Z, a, …, z}* bat
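The answer can be checked mechanically by writing the letter set as a character class. This is a small sketch assuming Python's `re` module; `TAB_OR_BAT` and `starts_tab_or_ends_bat` are names made up for the example.

```python
import re

# {A, ..., Z, a, ..., z}* written as a character class with a Kleene star
LETTERS_STAR = r"[A-Za-z]*"

# tab {letters}*  |  {letters}* bat
TAB_OR_BAT = re.compile(rf"tab{LETTERS_STAR}|{LETTERS_STAR}bat")


def starts_tab_or_ends_bat(text: str) -> bool:
    """True when the whole string fits one of the two alternatives."""
    return TAB_OR_BAT.fullmatch(text) is not None
```

Strings such as `table` and `combat` are accepted, while `stable` is rejected because it neither starts with `tab` nor ends with `bat`.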

Learn by Doing
• Identifiers in Java
• Specification of token identifier
  • position
  • Sal 123
  • ab
  • x
  identifier → letter ( letter | digit )*
• Recognition of token identifier?
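One way to answer the recognition question is a two-state transition diagram: a start state that accepts only a letter, and an accepting state that loops on letters and digits. The sketch below encodes that diagram by hand; the state names `start` and `in_id` and the function `recognize_identifier` are assumptions for illustration, since the slide's own diagram is not available.

```python
import string

LETTERS = set(string.ascii_letters)  # A..Z and a..z
DIGITS = set(string.digits)          # 0..9


def recognize_identifier(text: str) -> bool:
    """Run the two-state diagram for letter ( letter | digit )* over text."""
    state = "start"
    for ch in text:
        if state == "start" and ch in LETTERS:
            state = "in_id"          # first character must be a letter
        elif state == "in_id" and (ch in LETTERS or ch in DIGITS):
            state = "in_id"          # loop on letters and digits
        else:
            return False             # no arc for this character
    return state == "in_id"          # only "in_id" is accepting
```

Because no arc exists for a space, a string containing one is rejected as a single identifier.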

Terminologies: Automata & Language Theory
• Finite State Automata (FSA)
  • A recognizer that takes an input string and determines whether it’s a valid string of the language
• Non-Deterministic FSA (NFA)
  • May have several alternative actions for the same state and input symbol
• Deterministic FSA (DFA)
  • Has 1 action for any given state and input symbol

Representing NFA
1) Transition Diagrams: numbered states (circles), arcs, final states, …
• What language is defined? (a|b)*abb

Representing NFA
2) Transition Tables: more suitable for representation within a computer
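In code, a transition table is naturally a nested mapping from state and input symbol to a set of next states. The sketch below shows such a table for (a|b)*abb. The original table image is not available, so the entries are reconstructed from the moves used in the worked example of this deck and should be read as an assumption; the names `NFA_TABLE`, `START`, `FINAL`, and `move` are illustrative.

```python
# Rows are states, columns are input symbols, entries are SETS of states
# because an NFA may offer several choices for one (state, symbol) pair.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},  # state 0 loops on a and b, or starts "abb"
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},                       # final state, no outgoing arcs
}
START = 0
FINAL = {3}


def move(state: int, symbol: str) -> set:
    """Look up the table; an empty set means the move is undefined."""
    return NFA_TABLE.get(state, {}).get(symbol, set())
```

With this layout, `move(0, "a")` returns two states, which is exactly the non-determinism the diagram expresses with two arcs labeled a.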

Learn by Example
• Given the regular expression: (a (b*c)) | (a (b | c+)?)
• Find a transition diagram (NFA) that recognizes it

Learn by Example – NFA construction
Step 1: (a (b*c)) | (a (b | c+)?)
• Sub-expression handled in this step: (a (b*c))

Learn by Example – NFA construction
Step 2: (a (b*c)) | (a (b | c+)?)

Learn by Example – NFA construction
Step 3: (a (b*c)) | (a (b | c+)?)
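Since the diagrams for these steps are not available, the following is one possible NFA for the expression, built by hand rather than taken from the slides. It joins the two alternatives with epsilon moves from a new start state. The state numbering, the `EPS` label, and the helper names are all assumptions made for this sketch.

```python
EPS = ""  # label used in this sketch for epsilon (empty-string) moves

# Branch 1 (states 1-3) recognizes a (b*c); branch 2 (states 4-7)
# recognizes a (b | c+)?. State 0 chooses a branch via epsilon moves.
NFA = {
    0: {EPS: {1, 4}},
    1: {"a": {2}},
    2: {"b": {2}, "c": {3}},   # loop on b, then a single c
    3: {},
    4: {"a": {5}},
    5: {"b": {6}, "c": {7}},   # optional part: one b, or one or more c
    6: {},
    7: {"c": {7}},
}
START = 0
FINAL = {3, 5, 6, 7}           # state 5 is final because (b | c+) is optional


def epsilon_closure(states: set) -> set:
    """All states reachable from `states` using only epsilon moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        state = stack.pop()
        for target in NFA[state].get(EPS, set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(text: str) -> bool:
    """Simulate the NFA by tracking every state it could be in."""
    current = epsilon_closure({START})
    for symbol in text:
        moved = set()
        for state in current:
            moved |= NFA[state].get(symbol, set())
        current = epsilon_closure(moved)
    return bool(current & FINAL)
```

Under this construction the input `abc` is accepted through the first branch, and `acc` through the second.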

Working of NFA
Learn by Example: input ababb
• One sequence of moves:
  1. move(0, a) = 1
  2. move(1, b) = 2
  3. move(2, a) = ? (undefined)
  REJECT!
• OR another sequence of moves:
  move(0, a) = 0
  move(0, b) = 0
  move(0, a) = 1
  move(1, b) = 2
  move(2, b) = 3
  ACCEPT!
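The two outcomes for `ababb` come from following a single path at a time. A common alternative is to follow all paths together by keeping the set of states the NFA could currently be in. The sketch below does this for a table consistent with the moves listed above; the table layout and the names `NFA_TABLE` and `simulate` are assumptions for illustration.

```python
# Transition table consistent with the moves in the example:
# move(0, a) can be 0 or 1, move(0, b) = 0, move(1, b) = 2, move(2, b) = 3.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
FINAL = {3}


def simulate(text: str):
    """Return (accepted, trace), where trace lists the state set per step."""
    current = {START}
    trace = [set(current)]
    for symbol in text:
        following = set()
        for state in current:
            # An undefined move contributes nothing instead of rejecting.
            following |= NFA_TABLE[state].get(symbol, set())
        current = following
        trace.append(set(current))
    return bool(current & FINAL), trace
```

For `ababb` the state sets evolve as {0}, {0, 1}, {0, 2}, {0, 1}, {0, 2}, {0, 3}; the input is accepted because the last set contains the final state 3, regardless of which single path would have been guessed.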

The NFA Problem
• Two problems
  – Valid input may not be accepted
  – Non-deterministic behavior from run to run…
• Solution?

The DFA Saves The Day
• A DFA is an NFA with a few restrictions
• No epsilon transitions
• For every state s, there is only one transition (s, x) from s for any symbol x in Σ
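Both restrictions can be tested mechanically on a transition table. The sketch below reads "only one transition" as "exactly one", which is an interpretation (some texts allow a missing transition). The table layout, the epsilon label, and the two example automata for (a|b)*abb are assumptions for illustration.

```python
def is_dfa(table: dict, alphabet: str, epsilon: str = "") -> bool:
    """Check the two DFA restrictions on a state -> symbol -> set table."""
    for row in table.values():
        if epsilon in row:                       # restriction 1: no epsilon moves
            return False
        for symbol in alphabet:
            if len(row.get(symbol, set())) != 1:  # restriction 2: one target
                return False
    return True


# Example tables (hypothetical, both intended to recognize (a|b)*abb).
NFA_EXAMPLE = {
    0: {"a": {0, 1}, "b": {0}},  # two targets on a: not deterministic
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
DFA_EXAMPLE = {
    0: {"a": {1}, "b": {0}},
    1: {"a": {1}, "b": {2}},
    2: {"a": {1}, "b": {3}},
    3: {"a": {1}, "b": {0}},
}
```

Here `is_dfa(NFA_EXAMPLE, "ab")` fails on state 0, while `is_dfa(DFA_EXAMPLE, "ab")` succeeds.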

NFA-DFA comparison

How does this all fit together?
1. Reg. Expr. → NFA construction
2. NFA → DFA conversion
3. DFA simulation for lexical analyzer
• Point to Remember
  • Both NFAs and DFAs can be used to recognize tokens, but DFAs are faster and more optimizable than NFAs
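The conversion and simulation steps of this pipeline can be sketched with the subset construction, a standard technique in which each DFA state is a set of NFA states. The example below starts from a hand-written NFA table for (a|b)*abb, so the regular-expression-to-NFA step is skipped. The NFA has no epsilon moves, which lets the sketch omit epsilon closures. All names are illustrative assumptions.

```python
from collections import deque

# Hand-written NFA for (a|b)*abb (assumed input to the conversion step).
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
NFA_START = 0
NFA_FINAL = {3}
ALPHABET = "ab"


def nfa_to_dfa():
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start = frozenset({NFA_START})
    dfa = {}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        if subset in dfa:
            continue                      # already expanded
        dfa[subset] = {}
        for symbol in ALPHABET:
            target = frozenset(
                t for s in subset for t in NFA_TABLE[s].get(symbol, set())
            )
            dfa[subset][symbol] = target  # exactly one target per symbol
            if target not in dfa:
                queue.append(target)
    finals = {subset for subset in dfa if subset & NFA_FINAL}
    return dfa, start, finals


def dfa_accepts(text: str, dfa: dict, start, finals) -> bool:
    """DFA simulation: one table lookup per input character."""
    state = start
    for symbol in text:
        if symbol not in dfa[state]:
            return False                  # symbol outside the alphabet
        state = dfa[state][symbol]
    return state in finals
```

For this NFA the construction produces four DFA states, and `dfa_accepts("ababb", *nfa_to_dfa())` follows a single deterministic path to the accepting subset.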

• Tokens can be specified by using regular expressions
• Tokens can be recognized through transition diagrams generated from regular expressions
• A transition diagram may be an NFA or a DFA, but a DFA is preferable because it is faster to simulate and easier to optimize