Finite Automata & Language Theory: Finite Automata (CSE 244)

Finite Automata
A finite automaton is a recognizer: it takes an input string and determines whether it is a valid sentence of the language.
• Non-Deterministic: has more than one alternative action for the same input symbol.
• Deterministic: has at most one action for a given input symbol.
Both types recognize the languages described by regular expressions.

NFAs & DFAs
Non-Deterministic Finite Automata (NFAs) easily represent regular expressions, but are somewhat less precise: the same input can allow several alternative moves.
Deterministic Finite Automata (DFAs) require more complexity to represent regular expressions, but offer more precision: each input symbol determines a single move.
We’ll review both, plus the conversion algorithms, i.e., NFA → DFA and DFA → NFA.

Non-Deterministic Finite Automata
An NFA is a mathematical model that consists of:
• S, a set of states
• Σ, the symbols of the input alphabet
• move, a transition function:
    move(state, symbol) → set of states
    move : S × (Σ ∪ {ε}) → Pow(S)
• A state s0 ∈ S, the start state
• F ⊆ S, a set of final or accepting states.

Representing NFAs
Transition Diagrams: numbered states (circles), arcs, final states, …
Transition Tables: more suitable for representation within a computer.
We’ll see examples of both!

Example NFA
S = { 0, 1, 2, 3 }, s0 = 0, F = { 3 }, Σ = { a, b }
[Transition diagram: start → 0; 0 loops on a and b; 0 → 1 on a; 1 → 2 on b; 2 → 3 on b]
What language is defined? What is the transition table?

  state | input a  | input b
  0     | { 0, 1 } | { 0 }
  1     | --       | { 2 }
  2     | --       | { 3 }

ε (null) moves are also possible: an arc labeled ε from state i to state j switches state but does not use any input symbol.
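A transition table like the one above maps directly onto a dictionary. The sketch below is one possible encoding, not part of the original slides; the names MOVE and move are chosen here for illustration, and missing keys stand for the "--" cells.

```python
# Example NFA from the slide: S = {0, 1, 2, 3}, alphabet {a, b}, start 0, F = {3}.
STATES = {0, 1, 2, 3}
ALPHABET = {"a", "b"}
START = 0
FINAL = {3}

# Transition table: (state, symbol) -> set of next states.
# Entries that are "--" in the table are simply absent.
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}


def move(state, symbol):
    """Return the set of states reachable from `state` on `symbol` (possibly empty)."""
    return MOVE.get((state, symbol), set())
```

Returning an empty set for an undefined entry keeps the function total, which makes later set-based algorithms simpler to write.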

How Does An NFA Work?
[Transition diagram: the example NFA; start → 0; 0 loops on a and b; 0 → 1 on a; 1 → 2 on b; 2 → 3 on b]
• Given an input string, we trace moves
• If no more input & in final state, ACCEPT
EXAMPLE: Input: ababb
    move(0, a) = 1
    move(1, b) = 2
    move(2, a) = ? (undefined)    REJECT!
  -OR-
    move(0, a) = 0
    move(0, b) = 0
    move(0, a) = 1
    move(1, b) = 2
    move(2, b) = 3                ACCEPT!
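The two traces above are two of the choices available on input ababb. As a rough sketch (a plain backtracking search written for illustration, not an algorithm from the slides), the following enumerates every complete path and accepts if any of them ends in a final state. The table MOVE is the example NFA's transition table.

```python
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
FINAL = {3}


def paths(state, text):
    """Yield every path (list of states) that consumes all of `text` from `state`."""
    if not text:
        yield [state]
        return
    for nxt in sorted(MOVE.get((state, text[0]), ())):
        for rest in paths(nxt, text[1:]):
            yield [state] + rest


def accepts(text, start=0):
    """True if at least one complete path ends in a final state."""
    return any(p[-1] in FINAL for p in paths(start, text))
```

For ababb there are two complete paths: one that stays in state 0 throughout and one that ends in state 3. The choice that moves to state 1 on the first a dies at move(2, a) and so yields no complete path.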

Handling Undefined Transitions
We can handle undefined transitions by defining one more state, a “death” state, and sending every previously undefined transition to this death state.
[Transition diagram: the example NFA extended with a death state 4 that receives the previously undefined transitions]
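One way to picture the death state is as a table-completion step. This is a minimal sketch under the assumption that the NFA is stored as a dictionary keyed by (state, symbol); the helper name add_dead_state is hypothetical.

```python
def add_dead_state(states, alphabet, move, dead):
    """Return a copy of `move` where every undefined (state, symbol) goes to `dead`.

    The dead state also loops to itself on every symbol, so once entered
    it can never be left.
    """
    total = {key: set(targets) for key, targets in move.items()}
    for s in sorted(states) + [dead]:
        for a in sorted(alphabet):
            total.setdefault((s, a), {dead})
    return total


MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
TOTAL = add_dead_state({0, 1, 2, 3}, {"a", "b"}, MOVE, dead=4)
```

After completion every (state, symbol) pair has an entry, so a simulator never needs a special case for a missing transition.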

NFA - Regular Expressions & Compilation
Problems with NFAs for Regular Expressions:
1. Valid input might not be accepted (along a particular sequence of choices)
2. NFA may behave differently on the same input
Relationship of NFAs to Compilation:
1. Regular expression “recognized” by NFA
2. Regular expression is “pattern” for a “token”
3. Tokens are building blocks for lexical analysis
4. Lexical analyzer can be described by a collection of NFAs. Each NFA is for a language token.

Second NFA Example
Given the regular expression: (a (b*c)) | (a (b | c+)?)
Find a transition diagram NFA that recognizes it.

Second NFA Example - Solution
Given the regular expression: (a (b*c)) | (a (b | c+)?)
Find a transition diagram NFA that recognizes it.
[Transition diagram: solution NFA with states 0 to 5; start → 0; 0 → 1 on a; remaining edges labeled b and c]
String abbc can be accepted.

Alternative Solution Strategy
[Transition diagram for a (b*c): states 1, 2, 3 with edges labeled a, b, c]
[Transition diagram for a (b | c+)?: states 4, 5, 6, 7 with edges labeled a, b, c, c]
Now that you have the individual diagrams, “or” them as follows:

Using Null Transitions to “OR” NFAs
[Transition diagram: a new start state 0 joined by ε-moves to the two diagrams above (states 1, 2, 3 for a (b*c) and states 4, 5, 6, 7 for a (b | c+)?)]

Other Concepts
Not all paths may result in acceptance.
[Transition diagram: the example NFA; start → 0; 0 loops on a and b; 0 → 1 on a; 1 → 2 on b; 2 → 3 on b]
aabb is accepted along path: 0 → 0 → 1 → 2 → 3
BUT… it is not accepted along the valid path: 0 → 0 → 0 → 0 → 0

Deterministic Finite Automata
A DFA is an NFA with the following restrictions:
• ε-moves are not allowed
• For every state s ∈ S, there is one and only one path from s for every input symbol a ∈ Σ.
Since transition tables don’t have any alternative options, DFAs are easily simulated via an algorithm:
    s ← s0
    c ← nextchar;
    while c ≠ eof do
        s ← move(s, c);
        c ← nextchar;
    end;
    if s is in F then return “yes” else return “no”
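The simulation loop above translates almost line for line into Python. The transition table used here is an assumption made for illustration: a common four-state DFA for (a | b)*abb, not copied from any diagram in the slides.

```python
# Assumed DFA for (a|b)*abb: state k means "the last k symbols match a prefix of abb".
DFA_MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
DFA_START = 0
DFA_FINAL = {3}


def simulate_dfa(text, move=DFA_MOVE, start=DFA_START, final=DFA_FINAL):
    """Follow exactly one transition per input symbol, then test membership in F."""
    s = start
    for c in text:          # "while c != eof"
        s = move[(s, c)]    # "s <- move(s, c)"
    return s in final       # "if s is in F then yes else no"
```

Because each (state, symbol) pair has exactly one successor, the loop does a constant amount of work per input character.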

Example - DFA
[Transition diagram: DFA with states 0 to 3; start → 0; edges labeled a and b]
What language is accepted?
Recall the original NFA:
[Transition diagram: start → 0; 0 loops on a and b; 0 → 1 on a; 1 → 2 on b; 2 → 3 on b]

Conversion: NFA → DFA Algorithm
• Algorithm constructs a transition table for the DFA from the NFA
• Each state in the DFA corresponds to a SET of states of the NFA
• Why does this occur?
    • ε-moves
    • non-determinism
  Both require us to characterize multiple situations that occur for accepting the same string. (Recall: the same input can have multiple paths in the NFA.)
• Key Issue: Reconciling AMBIGUITY!

Converting NFA to DFA – 1st Look
[Transition diagram: NFA with states 0 to 8, ε-moves, and edges labeled a, b, c]
From state 0, where can we move without consuming any input?
This forms a new state: {0, 1, 2, 6, 8}
What transitions are defined for this new state?

The Resulting DFA
[Transition diagram: DFA whose states are sets of NFA states, including {0, 1, 2, 6, 8}, {1, 2, 5, 6, 7, 8} and {1, 2, 4, 5, 6, 8}, with edges labeled a, b, c]
Which states are FINAL states?
[Transition diagram: the same DFA with states renamed A, B, C, D]
How do we handle alphabet symbols not defined for A, B, C, D?

Algorithm Concepts
NFA N = ( S, Σ, s0, F, MOVE )
• ε-Closure(s), s ∈ S: set of states in S that are reachable from s via ε-moves of N that originate from s. (No input is consumed.)
• ε-Closure(T), T ⊆ S: NFA states reachable from all t ∈ T on ε-moves only.
• move(T, a), T ⊆ S, a ∈ Σ: set of states to which there is a transition on input a from some t ∈ T.
These 3 operations are used by the algorithms / techniques that carry out the conversion process.

Illustrating Conversion – An Example
Start with the NFA for (a | b)*abb:
[Transition diagram: NFA with states 0 to 10; start → 0; edges labeled a, b and ε; state 10 is accepting]
First we calculate ε-closure(0) (i.e., of state 0):
ε-closure(0) = {0, 1, 2, 4, 7} (all states reachable from 0 on ε-moves)
Let A = {0, 1, 2, 4, 7} be a state of the new DFA, D.

Conversion Example – continued (1)
2nd, we calculate a : ε-closure(move(A, a)) and b : ε-closure(move(A, b)).
a : ε-closure(move(A, a)) = ε-closure(move({0, 1, 2, 4, 7}, a))
    move adds {3, 8} (since move(2, a) = 3 and move(7, a) = 8)
    From this we have: ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8}
    (since 3 → 6 → 1 → 4, 6 → 7, and 1 → 2, all by ε-moves)
    Let B = {1, 2, 3, 4, 6, 7, 8} be a new state. Define Dtran[A, a] = B.
b : ε-closure(move(A, b)) = ε-closure(move({0, 1, 2, 4, 7}, b))
    move adds {5} (since move(4, b) = 5)
    From this we have: ε-closure({5}) = {1, 2, 4, 5, 6, 7}
    (since 5 → 6 → 1 → 4, 6 → 7, and 1 → 2, all by ε-moves)
    Let C = {1, 2, 4, 5, 6, 7} be a new state. Define Dtran[A, b] = C.

Conversion Example – continued (2)
3rd, we calculate for state B on {a, b}:
a : ε-closure(move(B, a)) = ε-closure(move({1, 2, 3, 4, 6, 7, 8}, a)) = {1, 2, 3, 4, 6, 7, 8} = B
    Define Dtran[B, a] = B.
b : ε-closure(move(B, b)) = ε-closure(move({1, 2, 3, 4, 6, 7, 8}, b)) = {1, 2, 4, 5, 6, 7, 9} = D
    Define Dtran[B, b] = D.
4th, we calculate for state C on {a, b}:
a : ε-closure(move(C, a)) = ε-closure(move({1, 2, 4, 5, 6, 7}, a)) = {1, 2, 3, 4, 6, 7, 8} = B
    Define Dtran[C, a] = B.
b : ε-closure(move(C, b)) = ε-closure(move({1, 2, 4, 5, 6, 7}, b)) = {1, 2, 4, 5, 6, 7} = C
    Define Dtran[C, b] = C.

Conversion Example – continued (3)
5th, we calculate for state D on {a, b}:
a : ε-closure(move(D, a)) = ε-closure(move({1, 2, 4, 5, 6, 7, 9}, a)) = {1, 2, 3, 4, 6, 7, 8} = B
    Define Dtran[D, a] = B.
b : ε-closure(move(D, b)) = ε-closure(move({1, 2, 4, 5, 6, 7, 9}, b)) = {1, 2, 4, 5, 6, 7, 10} = E
    Define Dtran[D, b] = E.
Finally, we calculate for state E on {a, b}:
a : ε-closure(move(E, a)) = ε-closure(move({1, 2, 4, 5, 6, 7, 10}, a)) = {1, 2, 3, 4, 6, 7, 8} = B
    Define Dtran[E, a] = B.
b : ε-closure(move(E, b)) = ε-closure(move({1, 2, 4, 5, 6, 7, 10}, b)) = {1, 2, 4, 5, 6, 7} = C
    Define Dtran[E, b] = C.

Conversion Example – continued (4)
This gives the transition table Dtran for the DFA:

  Dstates | Input a | Input b
  A       | B       | C
  B       | B       | D
  C       | B       | C
  D       | B       | E
  E       | B       | C

[Transition diagram: DFA with start state A and states B, C, D, E, with edges labeled a and b as in the table]

Algorithm For Subset Construction
Computing the ε-closure:
    push all states in T onto stack;
    initialize ε-closure(T) to T;
    while stack is not empty do begin
        pop t, the top element, off the stack;
        for each state u with edge from t to u labeled ε do
            if u is not in ε-closure(T) do begin
                add u to ε-closure(T);
                push u onto stack
            end
    end
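The stack-based ε-closure computation can be sketched in Python as follows. The ε-edge table EPS is an assumption: it is the usual textbook NFA for (a | b)*abb, chosen because it reproduces the closures computed in the conversion example.

```python
# Assumed ε-edges of the NFA for (a|b)*abb (states 0..10).
EPS = {
    0: {1, 7},
    1: {2, 4},
    3: {6},
    5: {6},
    6: {1, 7},
}


def eps_closure(T, eps=EPS):
    """Return all states reachable from the set T using ε-moves only."""
    closure = set(T)            # initialize ε-closure(T) to T
    stack = list(T)             # push all states in T onto the stack
    while stack:
        t = stack.pop()         # pop t, the top element
        for u in eps.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure
```

Each state is pushed at most once, so the running time is proportional to the number of states plus the number of ε-edges visited.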

Algorithm For Subset Construction – (2)
    initially, ε-closure(s0) is the only (unmarked) state in Dstates;
    while there is an unmarked state T in Dstates do begin
        mark T;
        for each input symbol a do begin
            U := ε-closure(move(T, a));
            if U is not in Dstates then
                add U as an unmarked state to Dstates;
            Dtran[T, a] := U
        end
    end
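Putting the pieces together, the subset construction can be sketched as below. The NFA tables EPS and SYM are an assumed encoding of the textbook NFA for (a | b)*abb; with them, the sketch reproduces the five DFA states A to E derived in the conversion example.

```python
# Assumed NFA for (a|b)*abb: ε-edges and labeled edges, start state 0, final state 10.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
SYM = {(2, "a"): {3}, (7, "a"): {8}, (4, "b"): {5}, (8, "b"): {9}, (9, "b"): {10}}


def eps_closure(T):
    closure, stack = set(T), list(T)
    while stack:
        t = stack.pop()
        for u in EPS.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def move(T, a):
    """States reachable on symbol a from some state in T."""
    return {u for t in T for u in SYM.get((t, a), ())}


def subset_construction(start, alphabet):
    d0 = frozenset(eps_closure({start}))
    dstates = [d0]        # discovery order; used below to name the states
    unmarked = [d0]
    dtran = {}
    while unmarked:
        T = unmarked.pop()                    # "mark T"
        for a in alphabet:
            U = frozenset(eps_closure(move(T, a)))
            if U not in dstates:              # an empty U would become a dead state
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


DSTATES, DTRAN = subset_construction(0, "ab")
NAME = {s: "ABCDE"[i] for i, s in enumerate(DSTATES)}
TABLE = {(NAME[T], a): NAME[U] for (T, a), U in DTRAN.items()}
```

Using frozenset lets each set of NFA states serve directly as a dictionary key, which is what makes "each DFA state is a set of NFA states" concrete.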

Regular Expression to NFA Construction
We now focus on transforming a Reg. Expr. to an NFA.
This construction allows us to take:
• Regular Expressions (which describe tokens)
• To an NFA (to characterize language)
• To a DFA (which can be “computerized”)
The construction process is component-wise: it builds the NFA from components of the regular expression in a special order with particular techniques.
NOTE: Construction is “syntax-directed” translation, i.e., the syntax of the regular expression is the determining factor for NFA construction and structure.

Motivation: Construct NFA For:
  ε
  a
  b
  ab
  ε | ab
  a*
  (ε | ab)*

Motivation: Construct NFA For:
  ε :  start → i –ε→ f
  a :  start → 0 –a→ 1
  b :  start → A –b→ B
  ab :  the NFAs for a and b joined in sequence (0 –a→ 1, A –b→ B)
  ε | ab :
  a* :
  (ε | ab)* :

Construction Algorithm: R.E. → NFA
Construction Process:
1st: Identify subexpressions of the regular expression:
    ε
    Σ symbols
    r|s
    rs
    r*
2nd: Characterize “pieces” of NFA for each subexpression

Piecing Together NFAs
1. For ε in the regular expression, construct NFA:
     start → i –ε→ f        accepts L(ε)
2. For a ∈ Σ in the regular expression, construct NFA:
     start → i –a→ f        accepts L(a)

Piecing Together NFAs – continued(1)
3.(a) If s, t are regular expressions and N(s), N(t) their NFAs, then s|t has NFA:
[Transition diagram: start → i; ε-moves from i into N(s) and N(t); ε-moves from N(s) and N(t) into f; accepts L(s) ∪ L(t)]
where i and f are new start / final states, and ε-moves are introduced from i to the old start states of N(s) and N(t) as well as from all of their final states to f.

Piecing Together NFAs – continued(2)
3.(b) If s, t are regular expressions and N(s), N(t) their NFAs, then st (concatenation) has NFA:
[Transition diagram: start → i, N(s) followed by N(t), ending in f, with N(s) and N(t) overlapping in one state; accepts L(s) L(t)]
Alternative:
[Transition diagram: start → i, then N(s), then N(t), then f, with i and f as new states]
where i is the start state of N(s) (or new under the alternative) and f is the final state of N(t) (or new). Overlap maps final states of N(s) to the start state of N(t).

Piecing Together NFAs – continued(3)
3.(c) If s is a regular expression and N(s) its NFA, then s* (Kleene star) has NFA:
[Transition diagram: start → i; ε-moves from i to N(s) and from i to f; ε-moves from N(s) to f and from the end of N(s) back to its start]
where:
• i is a new start state and f is a new final state
• ε-move i to f (to accept the null string)
• ε-moves i to old start, old final(s) to f
• ε-move old final to old start (WHY?)
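The construction rules for symbols, s|t, st and s* can be sketched as small functions that build NFA fragments. This is an illustrative sketch, not the slides' own code: the names Frag, symbol, union, concat, star and matches are invented here, ε is encoded as None, and concat joins the two fragments with a single ε-edge rather than overlapping states, which is a simplification assumed for brevity.

```python
import itertools

_ids = itertools.count()   # source of unique state names


class Frag:
    """NFA fragment with exactly one start and one accepting state."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges


def symbol(a):
    """Rule 1/2: start -a-> final (pass None for ε)."""
    i, f = next(_ids), next(_ids)
    return Frag(i, f, [(i, a, f)])


def union(s, t):
    """Rule 3(a): new i and f, ε-moves into both fragments and out of both."""
    i, f = next(_ids), next(_ids)
    extra = [(i, None, s.start), (i, None, t.start),
             (s.final, None, f), (t.final, None, f)]
    return Frag(i, f, s.edges + t.edges + extra)


def concat(s, t):
    """Rule 3(b), simplified: ε-move from the final of s to the start of t."""
    return Frag(s.start, t.final, s.edges + t.edges + [(s.final, None, t.start)])


def star(s):
    """Rule 3(c): i -ε-> f, i -ε-> old start, old final -ε-> f and -ε-> old start."""
    i, f = next(_ids), next(_ids)
    extra = [(i, None, f), (i, None, s.start),
             (s.final, None, f), (s.final, None, s.start)]
    return Frag(i, f, s.edges + extra)


def matches(frag, text):
    """Simulate the fragment on text by tracking the set of reachable states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            t = stack.pop()
            for p, a, q in frag.edges:
                if p == t and a is None and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = closure({frag.start})
    for c in text:
        current = closure({q for p, a, q in frag.edges if p in current and a == c})
    return frag.final in current


# (a|b)*abb built bottom-up, mirroring the syntax-directed construction.
REGEX = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                             symbol("a")), symbol("b")), symbol("b"))
```

The ε-move from old final back to old start in star is what allows the inner expression to repeat, which is one way to answer the "WHY?" on the slide.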

Properties of Construction
Let r be a regular expression, with NFA N(r). Then:
1. N(r) has #of states ≤ 2*(#symbols + #operators) of r
2. N(r) has exactly one start and one accepting state
3. Each state of N(r) has at most one outgoing edge on a ∈ Σ or at most two outgoing ε’s
4. BE CAREFUL to assign unique names to all states!

Detailed Example
See example 3.16 in the textbook for (a | b)*abb.
2nd Example: (ab*c) | (a(b|c*))
[Parse tree for this regular expression: subexpressions r0 through r13, with r13 = r5 | r12 at the root]
What is the NFA? Let’s construct it!

Detailed Example – Construction(1)
[NFA fragments, one per subexpression:]
  r0 : b
  r1 : r0*
  r2 : c
  r3 : a
  r4 : r1 r2
  r5 : r3 r4

Detailed Example – Construction(2)
[NFA fragments for r6 through r12, which cover the c* part (r6, r8) and:]
  r7 : b
  r9 : r7 | r8
  r10 : r9
  r11 : a
  r12 : r11 r10

Detailed Example – Final Step
r13 : r5 | r12
[Transition diagram: the complete NFA with states 1 to 17 and edges labeled a, b, c and ε]

Direct Simulation of an NFA
DFA simulation:
    s ← s0
    c ← nextchar;
    while c ≠ eof do
        s ← move(s, c);
        c ← nextchar;
    end;
    if s is in F then return “yes” else return “no”
NFA simulation:
    S ← ε-closure({s0})
    c ← nextchar;
    while c ≠ eof do
        S ← ε-closure(move(S, c));
        c ← nextchar;
    end;
    if S ∩ F ≠ ∅ then return “yes” else return “no”
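The NFA simulation differs from the DFA one only in tracking a set of states S instead of a single state s. A minimal sketch follows, with the NFA passed in as dictionaries; the tables shown are an assumed encoding of the textbook NFA for (a | b)*abb.

```python
def eps_closure(states, eps):
    """All states reachable from `states` by ε-moves only."""
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in eps.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def simulate_nfa(text, start, final, eps, sym):
    """Track the set of all states the NFA could be in after each symbol."""
    S = eps_closure({start}, eps)                        # S <- ε-closure({s0})
    for c in text:
        step = {u for t in S for u in sym.get((t, c), ())}
        S = eps_closure(step, eps)                       # S <- ε-closure(move(S, c))
    return bool(S & final)                               # S ∩ F ≠ ∅


# Assumed NFA for (a|b)*abb: states 0..10, start 0, accepting state 10.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
SYM = {(2, "a"): {3}, (7, "a"): {8}, (4, "b"): {5}, (8, "b"): {9}, (9, "b"): {10}}
```

Since S follows every possible choice at once, no backtracking is needed, at the cost of doing work proportional to the number of NFA states for each input symbol.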

Final Notes: R.E. to NFA Construction
• So, an NFA may be simulated by algorithm, when the NFA is constructed using the previous techniques
• Algorithm run time is proportional to |N| * |x|, where |N| is the number of states and |x| is the length of the input
• Alternatively, we can construct a DFA from the NFA and use the resulting Dtran to recognize input:

         space required    time to simulate
  NFA    O(|r|)            O(|r| * |x|)
  DFA    O(2^|r|)          O(|x|)

  where |r| is the length of the regular expression.

Pulling Together Concepts
• Designing Lexical Analyzer Generator
    Reg. Expr. → NFA construction
    NFA → DFA conversion
    DFA simulation for lexical analyzer
• Recall Lex Structure
    Pattern   Action
    …         …
    e.g. (a | b)*abb, (abc)*ab, etc.
  - Each pattern recognizes lexemes (Recognizer!)
  - Each pattern described by regular expression

Lex Specification → Lexical Analyzer
• Let P1, P2, …, Pn be Lex patterns (regular expressions for valid tokens in the prog. lang.)
• Construct N(P1), N(P2), …, N(Pn)
• Note: the accepting state of N(Pi) will be marked by Pi
• Construct NFA:
  [Transition diagram: a new start state with ε-moves to N(P1), N(P2), …, N(Pn)]
• Lex applies the conversion algorithm to construct a DFA that is equivalent!

Pictorially
(a) Lex Compiler: Lex Specification → Lex Compiler → Transition Table
(b) Schematic lexical analyzer: input buffer (holding the lexeme) → FA Simulator, driven by the Transition Table

Example
3 patterns:
  P1 : a
  P2 : abb
  P3 : a*b+
NFAs:
  P1 : start → 1 –a→ 2
  P2 : start → 3 –a→ 4 –b→ 5 –b→ 6
  P3 : start → 7 –b→ 8, with 7 looping on a and 8 looping on b

Example – continued (2)
Combined NFA:
[Transition diagram: new start state 0 with ε-moves to states 1, 3 and 7; 1 –a→ 2 (P1); 3 –a→ 4 –b→ 5 –b→ 6 (P2); 7 –b→ 8, with 7 looping on a and 8 looping on b (P3)]
Examples:
  Input a a b a:
    states:           {0, 1, 3, 7} → {2, 4, 7} → {7} → {8} → death
    pattern matched:  -              P1          -     P3    -
  Input a b b:
    states:           {0, 1, 3, 7} → {2, 4, 7} → {5, 8} → {6, 8}
    pattern matched:  -              P1          P3       P2, P3
    (break tie in favor of P2)
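The two traces above suggest how a Lex-style recognizer picks a token: run the combined NFA as far as possible, remember the last position at which some pattern matched, and break ties in favor of the pattern listed first. The sketch below illustrates that idea; the function name longest_match and the table encoding are assumptions made here, not Lex's actual implementation.

```python
# Combined NFA for P1 = a, P2 = abb, P3 = a*b+ (state numbers as on the slide).
EPS = {0: {1, 3, 7}}
SYM = {
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACCEPT = {2: 1, 6: 2, 8: 3}   # accepting NFA state -> pattern number


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in EPS.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def longest_match(text, pos=0):
    """Return (pattern number, lexeme) for the longest match at pos, or None."""
    S = eps_closure({0})
    best = None
    i = pos
    while i < len(text) and S:
        S = eps_closure({u for t in S for u in SYM.get((t, text[i]), ())})
        i += 1
        hits = [ACCEPT[s] for s in S if s in ACCEPT]
        if hits:
            best = (min(hits), text[pos:i])   # earlier-listed pattern wins a tie
    return best
```

On aaba the last accepting set is {8}, so the lexeme is aab under P3; on abb the final set {6, 8} matches both P2 and P3, and the tie goes to P2.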

Example – continued (3)
Alternatively, construct a DFA (keep track of the correspondence between patterns and new accepting states):

  STATE         | Input a   | Input b | Pattern
  {0, 1, 3, 7}  | {2, 4, 7} | {8}     | none
  {2, 4, 7}     | {7}       | {5, 8}  | P1
  {8}           | -         | {8}     | P3
  {7}           | {7}       | {8}     | none
  {5, 8}        | -         | {6, 8}  | P3
  {6, 8}        | -         | {8}     | P2 (break tie in favor of P2)

Minimizing the Number of States of DFA
1. Construct an initial partition Π of S with two groups: accepting / non-accepting.
2. (Construct Πnew) For each group G of Π do begin
     1. Partition G into subgroups such that two states s, t of G are in the same subgroup iff for all symbols a, states s and t have transitions on a to states of the same group of Π.
     2. Replace G in Πnew by the set of all these subgroups.
   end
3. Compare Πnew and Π. If equal, Πfinal := Π and proceed to 4; else set Π := Πnew and go to 2.
4. Aggregate the states belonging to each group of Πfinal.
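The partition-refinement steps above can be sketched as follows. As an assumed test case, the sketch is applied to the five-state DFA (A to E, with E accepting) obtained earlier for (a | b)*abb rather than to the diagram on the next slide; the function name minimize is chosen here for illustration.

```python
def minimize(states, alphabet, dtran, accepting):
    """Return the final partition: a list of sets of equivalent DFA states."""
    # Step 1: initial partition into accepting / non-accepting groups.
    partition = [g for g in (set(accepting), set(states) - set(accepting)) if g]
    while True:
        def group_of(s):
            return next(i for i, g in enumerate(partition) if s in g)

        # Step 2: split each group by where its states go on every symbol.
        new_partition = []
        for g in partition:
            buckets = {}
            for s in sorted(g):
                key = tuple(group_of(dtran[(s, a)]) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())

        # Step 3: refinement only ever splits groups, so equal size means equal.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition


DTRAN = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
GROUPS = minimize("ABCDE", "ab", DTRAN, {"E"})
```

For this DFA the refinement stops with A and C in the same group, so step 4 merges them and the minimized DFA has four states.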

Example
[Transition diagram: DFA with states A, B, C, D, F and edges labeled a and b]
Minimized DFA:
[Transition diagram: two states, {A, C, D} and {B, F}, with edges labeled a and b]

Other Issues - § 3.9 – Not Discussed
• More advanced algorithm construction: regular expression to DFA directly

Using LEX
Lex Program Structure:
    declarations
    %%
    translation rules
    %%
    auxiliary procedures
Name the file, e.g., test.lex
Then “lex test.lex” produces the file “lex.yy.c” (a C program)

Declarations, Rules, Auxiliary

Declarations (C declarations, then LEX definitions):
    %{
    /* definitions of all constants
       LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ... */
    %}
    ...
    letter    [A-Za-z]
    digit     [0-9]
    id        {letter}({letter}|{digit})*
    ...
    %%
Rules:
    if        { return(IF); }
    then      { return(THEN); }
    {id}      { yylval = install_id(); return(ID); }
    ...
    %%
Auxiliary:
    install_id() {
        /* procedure to install the lexeme to the ST */
    }

Example of a Lex Program
    %{
    /* counters updated by the rules below */
    int num_lines = 0, num_chars = 0;
    %}
    %%
    \n      { ++num_lines; ++num_chars; }
    .       { ++num_chars; }
    %%
    int main(int argc, char **argv)
    {
        ++argv, --argc;  /* skip over program name */
        if (argc > 0)
            yyin = fopen(argv[0], "r");
        else
            yyin = stdin;
        yylex();
        printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
        return 0;
    }

Another Example
    %{
    #include <stdio.h>
    %}
    WS      [ \t\n]*
    %%
    [0123456789]+           printf("NUMBER\n");
    [a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");
    {WS}                    /* do nothing */
    .                       printf("UNKNOWN\n");
    %%
    int main(int argc, char **argv)
    {
        ++argv, --argc;  /* skip over program name */
        if (argc > 0)
            yyin = fopen(argv[0], "r");
        else
            yyin = stdin;
        yylex();
        return 0;
    }

Concluding Remarks
Focused on the Lexical Analysis Process, including:
- Regular Expressions
- Finite Automata
- Conversion
- Lex
- Interplay among all these various aspects of lexical analysis
Looking Ahead: the next step in the compilation process is Parsing:
- Top-down vs. Bottom-up
- Relationship to Language Theory