Automating Scanner Construction
The Cycle of Constructions: RE → NFA → DFA → minimal DFA → RE
RE → NFA (Thompson's construction)
• Build an NFA for each term
• Combine them with ε-moves
NFA → DFA (subset construction)
• Build the simulation
DFA → Minimal DFA (today)
• Hopcroft's algorithm
DFA → RE
• All pairs, all paths problem
• Union together paths from s0 to a final state
from Cooper & Torczon
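A minimal Python sketch of the first step, Thompson's construction, assuming a hypothetical tuple-based syntax tree (("sym", c), ("cat", l, r), ("alt", l, r), ("star", e)) in place of a real RE parser. Each case builds a fragment with one start and one final state and glues fragments together with ε-moves; the simulation tracks the ε-closure of the current set of states. State numbers are assigned in construction order, so they will not match the q0 … q9 labels used in the figures.

```python
EPS = None  # label used for ε-moves


class NFA:
    """Edges are (source, label, target) triples; label EPS marks an ε-move."""

    def __init__(self):
        self.n = 0
        self.edges = []

    def new_state(self):
        self.n += 1
        return self.n - 1


def build(nfa, node):
    """Thompson's construction: return (start, final) of the fragment for node."""
    kind = node[0]
    if kind == "sym":
        s, f = nfa.new_state(), nfa.new_state()
        nfa.edges.append((s, node[1], f))
    elif kind == "cat":
        s, f1 = build(nfa, node[1])
        s2, f = build(nfa, node[2])
        nfa.edges.append((f1, EPS, s2))          # glue the two fragments
    elif kind == "alt":
        s, f = nfa.new_state(), nfa.new_state()
        for sub in node[1:]:
            si, fi = build(nfa, sub)
            nfa.edges += [(s, EPS, si), (fi, EPS, f)]
    elif kind == "star":
        s, f = nfa.new_state(), nfa.new_state()
        si, fi = build(nfa, node[1])
        # enter, skip, loop back, and leave
        nfa.edges += [(s, EPS, si), (s, EPS, f), (fi, EPS, si), (fi, EPS, f)]
    else:
        raise ValueError(f"unknown node {kind!r}")
    return s, f


def accepts(nfa, start, final, text):
    """Simulate the NFA on text by tracking the ε-closure of the current states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in nfa.edges:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in nfa.edges
                           if src in current and label == ch})
    return final in current


# a (b | c)*
regex = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "b"), ("sym", "c"))))
nfa = NFA()
start, final = build(nfa, regex)
print(nfa.n, accepts(nfa, start, final, "abcb"))  # 10 True
```

The ten states match the count of q0 … q9 in the a (b | c)* example, since each Thompson rule adds a fixed number of states per operator.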
DFA Minimization
The Big Picture
• Discover sets of equivalent states
• Represent each such set with just one state
Two states are equivalent if and only if:
• The same strings (path labels) lead from each of them to a final state
• ∀ α ∈ Σ, transitions on α lead to equivalent states (DFA)
• α-transitions to distinct sets ⇒ states must be in distinct sets
A partition P of the DFA's state set S
• Each s ∈ S is in exactly one set pi ∈ P
• The algorithm iteratively partitions the DFA's states
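The definition of equivalence can be checked by brute force on a small DFA. This sketch assumes the transition table of the subset-construction DFA for (a|b)*abb (states s0 … s4, final state s4); the table is our assumption, not copied from the slides. It compares two states on every string shorter than |Q|, which is enough because two distinguishable states of a |Q|-state DFA are always separated by a string of length at most |Q| − 2. Exhaustive search is exponential, so this only tests the definition; it is not a minimization method.

```python
from itertools import product

# Assumed transition table: the subset-construction DFA for (a|b)*abb
DELTA = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s1", "b": "s2"},
    "s3": {"a": "s1", "b": "s4"},
    "s4": {"a": "s1", "b": "s2"},
}
FINAL = {"s4"}


def run(state, word):
    """Follow the transition for each symbol of word, starting at state."""
    for ch in word:
        state = DELTA[state][ch]
    return state


def equivalent(p, q, alphabet="ab"):
    """True iff no string shorter than |Q| leads exactly one of p, q to a final state."""
    for length in range(len(DELTA)):
        for word in product(alphabet, repeat=length):
            if (run(p, word) in FINAL) != (run(q, word) in FINAL):
                return False
    return True


print(equivalent("s0", "s2"), equivalent("s0", "s1"))  # True False
```

Here "bb" separates s0 from s1 (s1 reaches s4, s0 stays in s2), while s0 and s2 have identical rows and can never be told apart.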
DFA Minimization
Details of the algorithm
• Group states into maximal size sets, optimistically
• Iteratively subdivide those sets, as needed
• States that remain grouped together are equivalent
Initial partition, P0, has two sets: F and Q − F (for D = (Q, Σ, δ, q0, F))
Splitting a set
• Assume qa, qb, and qc ∈ s, and δ(qa, a) = qx, δ(qb, a) = qy, and δ(qc, a) = qz
• If qx, qy, and qz are not all in the same set, then s must be split
• One state in the final DFA cannot have two transitions on a
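One splitting step, sketched in Python: group the states of s by which set of the current partition their a-transition lands in. The function name split and the dictionary layout of delta are our own choices, and the example DFA is an assumed subset-construction DFA for (a|b)*abb.

```python
from collections import defaultdict

# Assumed subset-construction DFA for (a|b)*abb (final state s4)
DELTA = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s1", "b": "s2"},
    "s3": {"a": "s1", "b": "s4"},
    "s4": {"a": "s1", "b": "s2"},
}


def split(s, sym, partition, delta):
    """Partition the set s by sym: states stay together only if their
    sym-transitions land in the same set of the current partition."""
    where = {q: i for i, block in enumerate(partition) for q in block}
    groups = defaultdict(set)
    for q in s:
        groups[where[delta[q][sym]]].add(q)
    return list(groups.values())


P0 = [{"s4"}, {"s0", "s1", "s2", "s3"}]  # {F, Q - F}
print(sorted(sorted(g) for g in split(P0[1], "b", P0, DELTA)))  # [['s0', 's1', 's2'], ['s3']]
print(len(split(P0[1], "a", P0, DELTA)))  # 1
```

On b, s3 reaches the final state s4 while s0, s1, and s2 stay in Q − F, so s3 is split off; on a, every state goes to s1, so a causes no split.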
DFA Minimization
The algorithm
  P ← { F, Q − F }
  while (P is still changing)
    T ← { }
    for each set s ∈ P
      partition s into s1, s2, …, sk, keeping two states together only if, for every α ∈ Σ, their α-transitions lead to the same set of P
      T ← T ∪ { s1, s2, …, sk }
    if T ≠ P then P ← T
This is a fixed-point algorithm!
Why does this work?
• Partition P ⊆ 2^Q (each member of P is a subset of Q)
• Start off with 2 subsets of Q: F and Q − F
• While loop takes Pi → Pi+1 by splitting 1 or more sets
• Pi+1 is at least one step closer to the partition with |Q| sets
• Maximum of |Q| splits
Note that
• Sets in the partition are never combined, only split
• Initial partition ensures that final states are intact
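A runnable Python sketch of the fixed-point loop, under our own representation assumptions: delta[q] is a dict from symbol to state, and a missing entry means an implicit error state. Each pass splits every set by the tuple of sets its transitions reach on all symbols at once, and the loop stops when a pass splits nothing. This is the simple iterative refinement (often credited to Moore), not Hopcroft's faster worklist algorithm.

```python
from collections import defaultdict


def minimize(states, alphabet, delta, final):
    """Refine {F, Q - F} until no set splits; return the list of sets."""
    partition = [b for b in (set(final), set(states) - set(final)) if b]
    while True:
        where = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = defaultdict(set)
            for q in block:
                # Signature: index of the set each symbol leads to (-1 = error state)
                key = tuple(where.get(delta[q].get(a), -1) for a in alphabet)
                groups[key].add(q)
            refined.extend(groups.values())
        # Sets only ever split, so an unchanged count means an unchanged partition
        if len(refined) == len(partition):
            return refined
        partition = refined


# Assumed subset-construction DFA for (a|b)*abb (final state s4)
ABB = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s1", "b": "s2"},
    "s3": {"a": "s1", "b": "s4"},
    "s4": {"a": "s1", "b": "s2"},
}
blocks = minimize(ABB, "ab", ABB, {"s4"})
print(sorted(sorted(b) for b in blocks))  # [['s0', 's2'], ['s1'], ['s3'], ['s4']]
```

Comparing the length of the partition rather than the sets themselves is safe only because refinement never merges sets, which is the same observation the slide makes.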
DFA Minimization
Enough theory, does this stuff work?
Recall our example: (a | b)* abb
[Figure: the DFA from the subset construction, states s0 through s4 with final state s4, beside the minimized DFA, in which s0 and s2 merge into a single state]
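Once the equivalent sets are known, the minimal DFA is the quotient: one state per set, with each set's transitions taken from any of its members. This sketch assumes the same (a|b)*abb table as a standard subset-construction result and hard-codes the partition the figure reports (only s0 and s2 merge); the function name quotient is hypothetical. It also checks that every member of a set agrees on where each symbol leads.

```python
# Assumed subset-construction DFA for (a|b)*abb (start s0, final s4)
DELTA = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s1", "b": "s2"},
    "s3": {"a": "s1", "b": "s4"},
    "s4": {"a": "s1", "b": "s2"},
}
FINAL = {"s4"}
PARTITION = [{"s0", "s2"}, {"s1"}, {"s3"}, {"s4"}]  # only s0 and s2 merge


def quotient(partition, delta, start, final):
    """Collapse each set of equivalent states into one state named after its members."""
    name = {q: ",".join(sorted(block)) for block in partition for q in block}
    new_delta = {}
    for block in partition:
        # Every member must agree on where each symbol leads, after renaming
        rows = {tuple(sorted((a, name[t]) for a, t in delta[q].items())) for q in block}
        if len(rows) != 1:
            raise ValueError(f"{sorted(block)} are not equivalent states")
        new_delta[name[next(iter(block))]] = dict(rows.pop())
    return new_delta, name[start], {name[q] for q in final}


min_delta, min_start, min_final = quotient(PARTITION, DELTA, "s0", FINAL)
for state, row in min_delta.items():
    print(state, row)
```

The result has four states: the merged start state "s0,s2" loops to itself on b, which is exactly why s0 and s2 were indistinguishable.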
DFA Minimization
What about a (b | c)* ?
[Figure: Thompson's NFA for a (b | c)*, states q0 through q9]
First, the subset construction:
[Figure: the resulting DFA: s0 goes to s1 on a; each of s1, s2, and s3 goes to s2 on b and to s3 on c; final states s1, s2, s3]
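A sketch of the subset construction in Python. The NFA edges below are our reconstruction of the figure using Thompson's rules for a (b | c)* (q9 final); the figure itself is not reproduced here, so treat the table as an assumption. A worklist builds each DFA state as the ε-closure of the NFA states reachable on one symbol, and an empty result stands for the implicit error state.

```python
EPS = ""  # label for ε-moves

# Assumed Thompson NFA for a (b | c)*, reconstructed from the figure; q9 is final
NFA = {
    0: [("a", 1)],
    1: [(EPS, 2)],
    2: [(EPS, 3), (EPS, 9)],
    3: [(EPS, 4), (EPS, 6)],
    4: [("b", 5)],
    5: [(EPS, 8)],
    6: [("c", 7)],
    7: [(EPS, 8)],
    8: [(EPS, 3), (EPS, 9)],
    9: [],
}
NFA_FINAL = {9}
ALPHABET = "abc"


def eps_closure(states):
    """All NFA states reachable from states using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for label, t in NFA[q]:
            if label == EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def move(states, sym):
    """NFA states reachable from states on one sym-transition."""
    return {t for q in states for label, t in NFA[q] if label == sym}


def subset_construction(start):
    d0 = eps_closure({start})
    dstates, worklist, dtran = [d0], [d0], {}
    while worklist:
        s = worklist.pop()
        for sym in ALPHABET:
            t = eps_closure(move(s, sym))
            if not t:
                continue  # no move on sym: implicit error state
            if t not in dstates:
                dstates.append(t)
                worklist.append(t)
            dtran[(s, sym)] = t
    return dstates, dtran


dstates, dtran = subset_construction(0)
print(len(dstates), sorted(dstates[1]))  # 4 [1, 2, 3, 4, 6, 9]
```

Every DFA state after s0 contains q9, which is why s1, s2, and s3 are all final.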
DFA Minimization
Then, apply the minimization algorithm to that DFA
[Figure: the DFA from the subset construction, repeated]
To produce the minimal DFA
[Figure: s0 goes to s1 on a; s1 loops to itself on b | c; s1 is the final state]
In lecture 6, I said that a human would design a simpler automaton than Thompson's construction did. Subset construction followed by minimization produces exactly that DFA!
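A quick sanity check in Python that the four-state DFA and the two-state minimal DFA accept the same strings. Both tables are our reading of the figures (a missing entry means reject). The check only covers strings up to a fixed length, so it is a bounded spot check, not a proof of equivalence.

```python
from itertools import product

# Subset-construction DFA for a(b|c)*, as read from the figure (assumed)
BIG = {
    "s0": {"a": "s1"},
    "s1": {"b": "s2", "c": "s3"},
    "s2": {"b": "s2", "c": "s3"},
    "s3": {"b": "s2", "c": "s3"},
}
BIG_FINAL = {"s1", "s2", "s3"}

# Minimal DFA: s0 --a--> s1, s1 --b|c--> s1
SMALL = {"s0": {"a": "s1"}, "s1": {"b": "s1", "c": "s1"}}
SMALL_FINAL = {"s1"}


def accepts(delta, final, word, start="s0"):
    """Run the DFA; a missing transition means the word is rejected."""
    state = start
    for ch in word:
        state = delta[state].get(ch)
        if state is None:
            return False
    return state in final


def agree_up_to(max_len, alphabet="abc"):
    """True if both DFAs give the same verdict on every string of length <= max_len."""
    return all(
        accepts(BIG, BIG_FINAL, w) == accepts(SMALL, SMALL_FINAL, w)
        for n in range(max_len + 1)
        for w in product(alphabet, repeat=n)
    )


print(agree_up_to(6))  # True
```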
Limits of Regular Languages
Advantages of Regular Expressions
• Simple & powerful notation for specifying patterns
• Automatic construction of fast recognizers
• Many kinds of syntax can be specified with REs
Example: an expression grammar
  Term → [a-zA-Z] ([a-zA-Z] | [0-9])*
  Op → + | - | ∗ | /
  Expr → (Term Op)* Term
Of course, this would generate a DFA …
If REs are so useful …
Why not use them for everything?
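The expression grammar translates directly into a Python regular expression. This sketch assumes tokens are written with no blanks between them; the names TERM, OP, EXPR, and is_expr are our own. Note that a parenthesized expression is rejected, since the grammar has no way to describe nesting.

```python
import re

# Character classes taken from the slide's grammar
TERM = r"[a-zA-Z][a-zA-Z0-9]*"
OP = r"[-+*/]"  # '-' comes first so it is a literal, not a range
EXPR = re.compile(rf"(?:{TERM}{OP})*{TERM}")


def is_expr(text):
    """True if text matches (Term Op)* Term, with no blanks or parentheses."""
    return EXPR.fullmatch(text) is not None


for s in ["a+b1*c", "x", "a+", "(a+b)"]:
    print(s, is_expr(s))
```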
Limits of Regular Languages
Not all languages are regular
[Figure: nested sets, RLs ⊂ CFLs ⊂ CSLs]
You cannot construct DFAs to recognize these languages
• L = { p^k q^k } (parenthesis languages)
• L = { w c w^r | w ∈ Σ* } (w^r is w reversed)
Neither of these is a regular language, so no RE describes either one
But, this is a little subtle. You can construct DFAs for
• Alternating 0's and 1's: (ε | 1)(01)*(ε | 0)
• Sets of pairs of 0's and 1's: (01 | 10)*
REs can count bounded sets and bounded differences
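The "alternating 0's and 1's" RE can be checked against its plain definition (no two adjacent characters are equal) by trying every binary string up to some length. This sketch writes (ε | 1)(01)*(ε | 0) as 1?(?:01)*0? in Python's regex syntax; the helper names are hypothetical. A DFA can do this because it only has to remember the last character, a bounded amount of information, whereas p^k q^k would need an unbounded counter.

```python
import re
from itertools import product

ALTERNATING = re.compile(r"1?(?:01)*0?")  # (ε | 1)(01)*(ε | 0)
PAIRS = re.compile(r"(?:01|10)*")         # (01 | 10)*


def is_alternating(s):
    """Direct definition: no two adjacent characters are equal."""
    return all(x != y for x, y in zip(s, s[1:]))


def check(max_len=10):
    """Return a string where the RE and the definition disagree, or None."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            s = "".join(t)
            if (ALTERNATING.fullmatch(s) is not None) != is_alternating(s):
                return s
    return None


print(check())  # None
```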
What can be so hard?
Poor language design can complicate scanning
• Reserved words are important
  if then = else; else = then (PL/I)
• Significant blanks (Fortran & Algol 68)
  do 10 i = 1, 25
  do 10 i = 1. 25
  (the first begins a DO loop; the second assigns 1.25 to the variable DO10I)
• String constants with special characters (C, others)
  newline, tab, quote, comment delimiters, …
• Finite closures
  > Limited identifier length
  > Adds states to count length
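To see why a length limit "adds states", here is a hand-coded identifier DFA in Python whose state is the number of characters read so far. The six-character limit is an assumption (FORTRAN 77 names are at most six characters), and the names step and is_identifier are hypothetical. The recognizer needs max_len + 1 live states plus an error state, so each extra character of allowed length costs one more state.

```python
import string

MAX_LEN = 6  # assumption: six-character names, as in FORTRAN 77
LETTERS = set(string.ascii_letters)
ALNUM = LETTERS | set(string.digits)


def step(state, ch, max_len=MAX_LEN):
    """One DFA transition. State k (0 <= k <= max_len) means k characters of a
    legal identifier have been read; None is the error state."""
    if state is None:
        return None
    if state == 0:
        return 1 if ch in LETTERS else None
    if state < max_len and ch in ALNUM:
        return state + 1
    return None


def is_identifier(word, max_len=MAX_LEN):
    state = 0
    for ch in word:
        state = step(state, ch, max_len)
    return state is not None and state >= 1


print(is_identifier("ABC123"), is_identifier("ABCDEFG"))  # True False
```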
What can be so hard? (Fortran 66/77)
How does a compiler do this?
• First pass finds & inserts blanks
• Can add extra words or tags to create a scannable language
• Second pass is normal scanner
Example due to Dr. F. K. Zadeck
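A toy version of the first pass for just the DO example, assuming one statement per string. It squeezes out blanks (which Fortran ignores), then decides DO loop versus assignment by looking for a comma outside parentheses after the =; a loop header is re-emitted with blanks between its tokens so that an ordinary scanner can take over in the second pass. The names classify and has_top_level_comma are hypothetical, and a real Fortran front end would also have to handle Hollerith and character constants, which this sketch ignores.

```python
import re

DO_HEAD = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=(.+)")


def has_top_level_comma(text):
    """True if text contains a comma that is not inside parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False


def classify(statement):
    squeezed = statement.replace(" ", "").upper()  # blanks carry no meaning
    m = DO_HEAD.fullmatch(squeezed)
    if m and has_top_level_comma(m.group(3)):
        label, var, rest = m.groups()
        return f"DO {label} {var} = {rest}"  # loop header, blanks re-inserted
    return squeezed  # ordinary statement, e.g. the assignment DO10I=1.25


print(classify("do 10 i = 1, 25"))  # DO 10 I = 1,25
print(classify("do 10 i = 1. 25"))  # DO10I=1.25
```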