Programming Languages (CS 550)
Lecture 4 Summary: Scanner and Parser Generators
Jeremy R. Johnson

Theme
- We have now seen how to describe syntax using regular expressions and grammars, and how to create scanners and parsers by hand or with automated tools. In this lecture we provide more details on parsing and scanning and indicate how these tools work.
- Reading: Chapter 2 of the text by Scott.

Parser and Scanner Generators
- Tools exist (e.g. yacc/bison [1] for C/C++, PLY for Python, CUP for Java) to automatically construct a parser from a restricted class of context-free grammars (LALR(1) grammars for yacc/bison and the derivatives CUP and PLY).
- These tools use table-driven bottom-up parsing techniques (commonly shift/reduce parsing).
- Similar tools (e.g. lex/flex for C/C++, JFlex for Java), based on the theory of finite automata, automatically construct scanners from regular expressions.
[1] bison is the GNU version of yacc.

Outline
- Scanners and DFA
- Regular Expressions and NDFA
- Equivalence of DFA and NDFA
- Regular Languages and the limitations of regular expressions
- Recursive Descent Parsing
- LL(1) Grammars and Top-down (Predictive) Parsing
- LR(1) Grammars and Bottom-up Parsing

Regular Expressions
- Alphabet = Σ
- A language over Σ is a subset of the strings in Σ*
- Regular expressions describe certain types of languages
  - ∅ is a regular expression
  - ε, denoting {ε}, is a regular expression
  - For each a in Σ, a, denoting {a}, is a regular expression
  - If r and s are regular expressions denoting languages R and S respectively, then (r | s), (rs), and (r*) are regular expressions, denoting R ∪ S, RS, and R*
- E.g. 00, (0|1)*00(0|1)*, 00*11*22*, (1|10)*

List Tokens
- LPAREN = '('
- RPAREN = ')'
- COMMA = ','
- NUMBER = DIGIT DIGIT* (one or more digits)
- DIGIT = 0|1|2|3|4|5|6|7|8|9
- Unix shorthand: [0-9], DIGIT+
- Whitespace: (' ' | '\n' | '\t')*

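List Scanner

    #include <stdio.h>

    /* Single-character tokens use their character code as the token type. */
    enum { NONE = 0, NUMBER = 256 };
    typedef struct { int type; int val; } TOKEN;

    TOKEN GetToken(void)
    {
        TOKEN tok = { NONE, 0 };
        int c = getchar();

        /* skip whitespace */
        while (c == ' ' || c == '\n' || c == '\t')
            c = getchar();
        if (c == EOF)
            return tok;
        if (c == '(' || c == ',' || c == ')') {
            tok.type = c;
            return tok;
        }
        if (c >= '0' && c <= '9') {
            while (c >= '0' && c <= '9') {
                tok.val = tok.val * 10 + (c - '0');
                c = getchar();
            }
            ungetc(c, stdin);   /* push back the lookahead character */
            tok.type = NUMBER;
            return tok;
        }
        return tok;             /* unrecognized character */
    }
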
Flex List Tokens

    %{
    #include <stdlib.h>
    #include "list.tab.h"
    extern int yylval;
    %}
    %%
    [ \t\n]   ;
    "("       return yytext[0];
    ")"       return yytext[0];
    ","       return yytext[0];
    [0-9]+    { yylval = atoi(yytext); return NUMBER; }
    %%

Deterministic Finite Automata
- Input comes from alphabet A
- Finite set of states S, start state s0, accepting states F
- Transition T from state to state depending on the next input
  - M = (A, S, s0, F, T)
- The language accepted by a finite automaton is the set of input strings that end up in accepting states

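Example 1
- Create a finite state automaton that accepts strings of a's and b's with an even number of a's.
- States: S0 (start, accepting), S1
- Transitions: S0 -b→ S0, S0 -a→ S1, S1 -b→ S1, S1 -a→ S0
- Trace:
    input:    a b b b a b a a b b b
    state:  0 1 1 1 1 0 0 1 0 0 0 0
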
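DFA Implementation
- Program to implement the DFA of Example 1 (ENDM is the end-of-input marker, assumed to be defined elsewhere)

    #include <stdbool.h>
    #include <stdio.h>

    bool EA(void)
    {
        int x;
    S0: x = getchar();
        if (x == 'b') goto S0;
        if (x == 'a') goto S1;
        if (x == ENDM) return true;    /* even number of a's: accept */
        return false;                  /* any other character: reject */
    S1: x = getchar();
        if (x == 'b') goto S1;
        if (x == 'a') goto S0;
        return false;                  /* ENDM here means an odd number of a's */
    }
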
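The same machine can also be written in table-driven form, which is closer to what scanner generators emit. The sketch below is illustrative only: the names `TRANSITIONS`, `accepts`, and `trace` are assumptions, not part of the lecture code.

```python
# Transition table for the even-number-of-a's DFA: (state, symbol) -> next state.
TRANSITIONS = {
    ("S0", "a"): "S1",
    ("S0", "b"): "S0",
    ("S1", "a"): "S0",
    ("S1", "b"): "S1",
}
START = "S0"
ACCEPTING = {"S0"}


def accepts(text):
    """Return True if the DFA ends in an accepting state after reading text."""
    state = START
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # symbol outside the alphabet {a, b}: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING


def trace(text):
    """Return the list of states visited, starting with the start state."""
    states = [START]
    for symbol in text:
        states.append(TRANSITIONS[(states[-1], symbol)])
    return states
```

Calling `trace("abbbabaabbb")` reproduces the state sequence shown in Example 1. Changing the language only requires changing the table, not the driver loop.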
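List DFA
[Figure: DFA for the list tokens with start state S0 and states S1 to S4; edges are labeled d (digit), whitespace (' ', \t, \n), parentheses, and comma.]
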
Calculator Tokens
- ASSIGN = ':='
- PLUS = '+', MINUS = '-', TIMES = '*', DIV = '/'
- LPAREN = '(', RPAREN = ')'
- NUMBER = DIGIT DIGIT* | DIGIT* (. DIGIT | DIGIT .) DIGIT*
- ID = LETTER (LETTER | DIGIT)*
- DIGIT = 0|1|…|9, LETTER = a|…|z|A|…|Z
- COMMENT = /* (non-* | * non-/)* */ | // (non-newline)* newline
- WHITESPACE = (' ' | '\n' | '\t')*

Calculator DFA
[Figure: DFA for the calculator tokens. Copyright © 2009 Elsevier]

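Table-Driven Scanner
Transition table for the calculator scanner. Inputs not listed for a state have no transition; the token column names the token recognized when the scan stops in that state.

    State  Token        Transitions
    1      -            ' ', \t → 17; \n → 17; / → 2; * → 10; ( → 6; ) → 7; + → 8; - → 9; : → 11; . → 13; digit → 14; letter → 16
    2      div          / → 3; * → 4
    3      -            \n → 18; any other input → 3
    4      -            * → 5; any other input → 4
    5      -            / → 18; * → 5; any other input → 4
    6      lparen       none
    7      rparen       none
    8      plus         none
    9      minus        none
    10     times        none
    11     -            = → 12
    12     assign       none
    13     -            digit → 15
    14     number       . → 15; digit → 14
    15     number       digit → 15
    16     identifier   digit → 16; letter → 16
    17     white-space  ' ', \t → 17; \n → 17
    18     comment      none
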
Non-Deterministic Finite Automata
- Same as DFA, M = (A, S, s0, F, T), except
  - Can have multiple transitions from the same state on the same input
  - Can have epsilon transitions
  - Accept the input string if there is a path to an accepting state
- The languages accepted by NDFA are the same as those accepted by DFA

Example 2a
- DFA accepting (a|b)*abb
[Figure: state diagram of the DFA, with start state S0 and transitions labeled a and b.]

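Example 2b
- NDFA accepting (a|b)*abb
  - Without ε-transitions: start state S0; S0 -a,b→ S0, S0 -a→ S1, S1 -b→ S2, S2 -b→ S3; S3 accepting
  - With an ε-transition: start state S0; S0 -a,b→ S0, S0 -ε→ S1, S1 -a→ S2, S2 -b→ S3, S3 -b→ S4; S4 accepting
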
Simulating an NDFA
- Compute S = the set of states the NDFA could be in after reading each symbol of the input.
- S_i = set of possible states after reading i input symbols
  1. Initialize S_0 = EpsilonClosure({0})
  2. For i = 1, …, len(str):
     1. T_i = ∪_{s ∈ S_(i-1)} T[s, str[i]]
     2. S_i = EpsilonClosure(T_i)
- Example (the ε-NDFA of Example 2b on input babb):
    input:  b       a          b          b
    S_i:    {0, 1}  {0, 1, 2}  {0, 1, 3}  {0, 1, 4}

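The set-of-states simulation can be sketched in a few lines. This is a minimal illustration, assuming the ε-NDFA for (a|b)*abb with states 0 to 4; the dictionary encoding and the names `NFA`, `epsilon_closure`, and `simulate` are choices made here, not part of the lecture.

```python
# NDFA for (a|b)*abb with one epsilon move; "" stands for epsilon.
NFA = {
    (0, "a"): {0},
    (0, "b"): {0},
    (0, ""): {1},
    (1, "a"): {2},
    (2, "b"): {3},
    (3, "b"): {4},
}
ACCEPTING = {4}


def epsilon_closure(states):
    """All states reachable from `states` using only epsilon moves."""
    closure = set(states)
    work = list(states)
    while work:
        s = work.pop()
        for t in NFA.get((s, ""), set()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def simulate(text):
    """Return the list S_0, S_1, ..., S_n of possible-state sets."""
    current = epsilon_closure({0})
    history = [current]
    for symbol in text:
        moved = set()
        for s in current:
            moved |= NFA.get((s, symbol), set())
        current = epsilon_closure(moved)
        history.append(current)
    return history


def accepts(text):
    """Accept if some state in the final set is accepting."""
    return bool(simulate(text)[-1] & ACCEPTING)
```

For the input `babb`, `simulate` returns the initial set {0, 1} followed by {0, 1}, {0, 1, 2}, {0, 1, 3}, {0, 1, 4}, so the string is accepted.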
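NDFA from Regular Expressions
- Base case, c: a single transition labeled c from the start state to the accepting state
- Union, R|S: ε-transitions from a new start state into R and into S, and ε-transitions from each of them into a new accepting state
- Concatenation, RS: the machine for R followed by the machine for S, joined by an ε-transition
- Closure, R*: ε-transitions around R that allow it to be skipped or repeated
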
Example 3
- Construct an NDFA that accepts the language generated by the regular expression (a|bc)
- Start state S0, accepting state S6
- Transitions: S0 -ε→ S1, S1 -a→ S2, S2 -ε→ S6; S0 -ε→ S3, S3 -b→ S4, S4 -c→ S5, S5 -ε→ S6

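The four construction rules compose mechanically. The following sketch builds NDFA fragments for base, union, concatenation, and closure, then assembles (a|bc). It is one possible encoding, assumed here for illustration: the `Fragment` class, the edge-list representation, and the state numbering are not from the lecture, and the resulting machine may have more ε-transitions than a hand-drawn one.

```python
import itertools

_ids = itertools.count()  # fresh state numbers


class Fragment:
    """An NDFA fragment with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges  # list of (source, label, target); label "" is epsilon


def base(c):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, c, f)])


def union(r, s):
    start, accept = next(_ids), next(_ids)
    edges = r.edges + s.edges + [
        (start, "", r.start), (start, "", s.start),
        (r.accept, "", accept), (s.accept, "", accept),
    ]
    return Fragment(start, accept, edges)


def concat(r, s):
    return Fragment(r.start, s.accept, r.edges + s.edges + [(r.accept, "", s.start)])


def star(r):
    start, accept = next(_ids), next(_ids)
    edges = r.edges + [
        (start, "", r.start), (r.accept, "", accept),
        (start, "", accept),      # zero repetitions
        (r.accept, "", r.start),  # repeat
    ]
    return Fragment(start, accept, edges)


def closure(frag, states):
    """Epsilon closure of a set of states within one fragment."""
    result, work = set(states), list(states)
    while work:
        s = work.pop()
        for src, label, dst in frag.edges:
            if src == s and label == "" and dst not in result:
                result.add(dst)
                work.append(dst)
    return result


def accepts(frag, text):
    current = closure(frag, {frag.start})
    for ch in text:
        moved = {dst for src, label, dst in frag.edges if src in current and label == ch}
        current = closure(frag, moved)
    return frag.accept in current


machine = union(base("a"), concat(base("b"), base("c")))  # (a|bc)
```

With this encoding, `accepts(machine, "a")` and `accepts(machine, "bc")` hold, while `"b"` and `"abc"` are rejected.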
Regular Expression Compiler

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include "machine.h"
    char input[100];
    %}
    %union {
        MachinePtr ndfa;
        char symbol;
    }
    %token <symbol> LETTER
    %type <ndfa> regexp
    %type <ndfa> cat
    %type <ndfa> kleene
    %%
    statement:
          regexp {
              /* read strings forever and run each through the NDFA */
              do {
                  printf("Enter string\n");
                  if (scanf("%99s", input) != EOF)
                      Simulate($1, input);
                  else
                      exit(1);
              } while (1);
          }
        ;
    regexp:
          regexp '|' cat   { $$ = MachineOr($1, $3); }
        | cat              { $$ = $1; }
        ;
    cat:
          cat kleene       { $$ = MachineConcat($1, $2); }
        | kleene           { $$ = $1; }
        ;
    kleene:
          '(' regexp ')'   { $$ = $2; }
        | kleene '*'       { $$ = MachineStar($1); }
        | LETTER           { $$ = BaseMachine($1); }
        ;
    %%

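NDFA to DFA
- Find an equivalent DFA for Example 3
  - States in the DFA are sets of states from the NDFA [keep track of all possible transitions]
- Start state {0, 1, 3}; accepting states {2, 6} and {5, 6}
- Transitions: {0, 1, 3} -a→ {2, 6}, {0, 1, 3} -b→ {4}, {4} -c→ {5, 6}
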
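The conversion can be sketched as a worklist over sets of NDFA states. The encoding below assumes the Example 3 machine is numbered 0 to 6 with ε-moves 0→1, 0→3, 2→6, and 5→6; the names `NFA` and `subset_construction` are illustrative.

```python
# NDFA for (a|bc); "" marks an epsilon transition.
NFA = {
    (0, ""): {1, 3},
    (1, "a"): {2},
    (2, ""): {6},
    (3, "b"): {4},
    (4, "c"): {5},
    (5, ""): {6},
}
NFA_START, NFA_ACCEPTING, ALPHABET = 0, {6}, "abc"


def epsilon_closure(states):
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in NFA.get((s, ""), set()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return frozenset(closure)


def subset_construction():
    """Return (start, transitions, accepting) of the equivalent DFA."""
    start = epsilon_closure({NFA_START})
    transitions, seen, work = {}, {start}, [start]
    while work:
        current = work.pop()
        for symbol in ALPHABET:
            moved = set()
            for s in current:
                moved |= NFA.get((s, symbol), set())
            if not moved:
                continue  # no transition: the DFA rejects here
            target = epsilon_closure(moved)
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    accepting = {state for state in seen if state & NFA_ACCEPTING}
    return start, transitions, accepting
```

Running `subset_construction()` yields the start state {0, 1, 3} with an a-transition to {2, 6}, a b-transition to {4}, and a c-transition from {4} to {5, 6}.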
Exercise 1
1. Construct an NDFA that recognizes d*(.d | d.)d* (see pages 55-57 of text)
2. Convert the NDFA from (1) to a DFA (see pages 56-58 of text)

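Solution 1.1
[Figure: NDFA for d*(.d | d.)d* with states 1 to 14. Labeled transitions: 2 -d→ 3, 5 -.→ 6, 6 -d→ 7, 8 -d→ 9, 9 -.→ 10, 12 -d→ 13; the remaining edges are ε-transitions.]
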
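Solution 1.2
- DFA states (sets of NDFA states):
  - A[1, 2, 4, 5, 8] (start)
  - B[2, 3, 4, 5, 8, 9]
  - C[6]
  - D[6, 10, 11, 12, 14]
  - E[7, 11, 12, 14]
  - F[7, 11, 12, 13, 14]
  - G[11, 12, 13, 14]
- Transitions: A -d→ B, A -.→ C, B -d→ B, B -.→ D, C -d→ E, D -d→ F, E -d→ G, F -d→ G, G -d→ G
- Accepting states: D, E, F, G (the sets containing NDFA state 14)
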
Minimizing States in DFA
- There exists a unique minimal-state DFA for any language described by a regular expression
- Combine equivalent states
  - p ≡ q if for each input string x, T(p, x) is an accepting state iff T(q, x) is an accepting state
  - Initialize two sets of states: accepting and nonaccepting
  - Partition any set whose states transition into more than one set on the same input

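Exercise 2
1. Find an equivalent DFA to the one in Exercise 1 that minimizes the number of states (see page 59 of text)
- Start with two sets: ABC (nonaccepting) and DEFG (accepting)
  - ABC -d, .→ ABC, ABC -d, .→ DEFG, DEFG -d→ DEFG
- Ambiguity: T(A, d) = T(B, d) ≠ T(C, d), so split ABC into AB and C
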
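Solution 2
- States: A (start), B, C, DEFG (accepting)
- Transitions: A -d→ B, A -.→ C, B -d→ B, B -.→ DEFG, C -d→ DEFG, DEFG -d→ DEFG
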
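The partition-refinement idea can be sketched directly. The transition table below is an assumption: it is the seven-state DFA A to G for d*(.d | d.)d* as read off the state sets of Solution 1.2, with D, E, F, G accepting. The function name `minimize` and the block-splitting strategy (split one block per pass) are illustrative choices.

```python
# DFA for d*(.d | d.)d* with states A..G (assumed transition table).
DFA = {
    ("A", "d"): "B", ("A", "."): "C",
    ("B", "d"): "B", ("B", "."): "D",
    ("C", "d"): "E",
    ("D", "d"): "F",
    ("E", "d"): "G",
    ("F", "d"): "G",
    ("G", "d"): "G",
}
STATES = "ABCDEFG"
ACCEPTING = set("DEFG")
ALPHABET = "d."


def minimize():
    """Split blocks until all states in a block agree on every symbol."""
    partition = [set(STATES) - ACCEPTING, set(ACCEPTING)]
    changed = True
    while changed:
        changed = False
        for block in partition:
            for symbol in ALPHABET:
                # Group states by the index of the block their transition
                # lands in (None when there is no transition).
                groups = {}
                for state in block:
                    target = DFA.get((state, symbol))
                    index = next(
                        (i for i, b in enumerate(partition) if target in b), None
                    )
                    groups.setdefault(index, set()).add(state)
                if len(groups) > 1:
                    partition.remove(block)
                    partition.extend(groups.values())
                    changed = True
                    break
            if changed:
                break  # restart the scan with the refined partition
    return sorted("".join(sorted(b)) for b in partition)
```

Under these assumptions `minimize()` returns the blocks A, B, C, and DEFG: the first split separates C from AB on input d, and the second separates A from B on input `.`.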
Equivalence of Regular Expressions and DFA
- The languages accepted by finite automata are the same as those generated by regular expressions
  - Given any regular expression R, there exists a finite state automaton M such that L(M) = L(R)
    - Proof is given by the previous construction
  - Given any finite state automaton M, there exists a regular expression R such that L(R) = L(M)
    - The basic idea is to combine the transitions along all paths that lead to an accepting state. The combinations of characters along those paths are described using regular expressions.

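Example 4
- Create a regular expression for the language that consists of strings of a's and b's with an even number of a's.
- DFA: S0 (start, accepting), S1; S0 -b→ S0, S0 -a→ S1, S1 -b→ S1, S1 -a→ S0
- Regular expression: (b*ab*a)*b*
  - Each pass through the starred group consumes exactly two a's; the trailing b* allows b's after the last a
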
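A candidate expression for this language can be checked by brute force against a direct count of a's. The sketch below uses Python's `re` module; the helper names `matches` and `check` and the length bound are assumptions made for illustration.

```python
import itertools
import re


def matches(pattern, text):
    """True if the whole of text matches the regular expression pattern."""
    return re.fullmatch(pattern, text) is not None


def check(pattern, max_len=8):
    """Return a string on which pattern disagrees with 'even number of a's',
    or None if no such string of length <= max_len exists."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            text = "".join(chars)
            if matches(pattern, text) != (text.count("a") % 2 == 0):
                return text  # counterexample
    return None
```

For example, `check(r"(b*ab*a)*b*")` finds no counterexample up to length 8, while a variant without the trailing `b*`, such as `b*|(b*ab*a)*`, fails on strings like `aab`, where b's follow the last a. A passing check is evidence, not a proof, since only short strings are tested.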
Grammars and Regular Expressions
- Given a regular expression R, there exists a grammar with syntactic category <S> such that L(R) = L(<S>).
- There are grammars for which there does NOT exist a regular expression R with L(<S>) = L(R)
  - <S> → a<S>b | ε
  - L(<S>) = {a^n b^n, n = 0, 1, 2, …}

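Example 5
- Create a grammar that generates the language that consists of strings of a's and b's with an even number of a's.
- DFA: S0 (start, accepting), S1; S0 -b→ S0, S0 -a→ S1, S1 -b→ S1, S1 -a→ S0
- Grammar (one syntactic category per state, one production per transition):
    <S0> → b<S0>
    <S0> → a<S1>
    <S0> → ε
    <S1> → b<S1>
    <S1> → a<S0>
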
Proof that a^n b^n is not Recognized by a Finite State Automaton
- To show that there is no finite state automaton that recognizes the language L = {a^n b^n, n = 0, 1, 2, …}, we assume that there is a finite state automaton M that recognizes L and show that this leads to a contradiction.
- Since M is a finite state automaton, it has a finite number of states. Let the number of states be m.
- Since M recognizes the language L, all strings of the form a^k b^k must end up in accepting states. Choose such a string with k = n, where n is greater than m.
- Since n > m, there must be a state s that is visited twice while the prefix a^n is read [only m distinct states can be visited, so by the time (m+1) a's have been read we must have entered a state that was already visited].
- Suppose that state s is reached after reading the strings a^j and a^k (j ≠ k). Since the same state is reached for both strings, the finite state machine cannot distinguish strings that begin with a^j from strings that begin with a^k.
- Therefore, the finite state automaton must either accept or reject both of the strings a^j b^j and a^k b^j. However, a^j b^j should be accepted, while a^k b^j should not be accepted. This is the contradiction.

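The pigeonhole step of the proof can be made concrete: for any machine with m states, following a-transitions from the start state must revisit a state within m steps. The sketch below finds such a pair j < k; the function name `repeated_state` and the dictionary `delta_a` (the a-transition of each state) are assumptions for illustration.

```python
def repeated_state(delta_a, start, m):
    """Find j < k <= m such that a^j and a^k lead from start to the same state.

    delta_a maps each of the m states to its successor on input 'a'.
    Returns (j, k, state).
    """
    seen = {}  # state -> number of a's read when it was first reached
    state = start
    for count in range(m + 1):
        if state in seen:
            return seen[state], count, state
        seen[state] = count
        state = delta_a[state]
    raise AssertionError("unreachable: m + 1 visits to m states must repeat")
```

For a three-state machine with `delta_a = {0: 1, 1: 2, 2: 1}`, the call `repeated_state(delta_a, 0, 3)` returns `(1, 3, 1)`: the strings a and aaa reach the same state, so the machine treats ab and aaab identically even though only ab is in the language.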
List Grammar
- <list> → ( <sequence> ) | ( )
- <sequence> → <listelement> , <sequence> | <listelement>
- <listelement> → <list> | NUMBER

Recursive Descent Parser

    list() {
        match('(');
        if token ≠ ')' then
            seq();
        endif;
        match(')');
    }

    seq() {
        elt();
        if token = ',' then
            match(',');
            seq();
        endif;
    }

    elt() {
        if token = '(' then
            list();
        else
            match(NUMBER);
        endif;
    }

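A runnable version of these three procedures might look like the following sketch. The tokenizer, the `Parser` class, and the choice to return nested Python lists are assumptions added for illustration; only the structure of `list`, `seq`, and `elt` follows the pseudocode.

```python
def tokenize(text):
    """Split text into ('NUMBER', value) and (char, None) tokens plus EOF."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", int(text[i:j])))
            i = j
        elif ch in "(),":
            tokens.append((ch, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    tokens.append(("EOF", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos][0]

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def list(self):
        # <list> -> ( <sequence> ) | ( )
        self.match("(")
        items = [] if self.token == ")" else self.seq()
        self.match(")")
        return items

    def seq(self):
        # <sequence> -> <listelement> , <sequence> | <listelement>
        items = [self.elt()]
        if self.token == ",":
            self.match(",")
            items += self.seq()
        return items

    def elt(self):
        # <listelement> -> <list> | NUMBER
        if self.token == "(":
            return self.list()
        return self.match("NUMBER")


def parse(text):
    parser = Parser(text)
    result = parser.list()
    parser.match("EOF")
    return result
```

For example, `parse("(1, (2, 3), ())")` returns `[1, [2, 3], []]`, and a malformed input such as `"(1,)"` raises `SyntaxError`.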
Yacc (bison) Example

    %{
    #include <stdio.h>
    %}
    %token NUMBER   /* needed to communicate with scanner */
    %%
    list:
          '(' sequence ')'           { printf("L -> ( seq )\n"); }
        | '(' ')'                    { printf("L -> ()\n"); }
        ;
    sequence:
          listelement ',' sequence   { printf("seq -> LE, seq\n"); }
        | listelement                { printf("seq -> LE\n"); }
        ;
    listelement:
          NUMBER                     { printf("LE -> %d\n", $1); }
        | list                       { printf("LE -> L\n"); }
        ;
    %%
    /* since no code here, default main constructed that simply calls parser. */

Top-down vs. Bottom-up Parsing
[Figure omitted. Copyright © 2009 Elsevier]

Bottom-up Parsing: LR Grammar
[Figure omitted. Copyright © 2009 Elsevier]

LR Calculator Grammar (Figure 2.24)
    program   → stmt_list $$
    stmt_list → stmt_list stmt | stmt
    stmt      → id := expr | read id | write expr
    expr      → term | expr add_op term
    term      → factor | term mult_op factor
    factor    → ( expr ) | id | number
    add_op    → + | -
    mult_op   → * | /

LL Calculator Grammar
Here is an LL(1) grammar (Fig 2.15):
     1. program   → stmt_list $$
     2. stmt_list → stmt stmt_list
     3. stmt_list → ε
     4. stmt      → id := expr
     5. stmt      → read id
     6. stmt      → write expr
     7. expr      → term term_tail
     8. term_tail → add_op term term_tail
     9. term_tail → ε
    10. term      → factor fact_tail
    11. fact_tail → mult_op factor fact_tail
    12. fact_tail → ε
    13. factor    → ( expr )
    14. factor    → id
    15. factor    → number
    16. add_op    → +
    17. add_op    → -
    18. mult_op   → *
    19. mult_op   → /

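Predictive Parser
- Predict which rule to match: choose A → α
  - when the next token can start α, or
  - when α →* ε and the next token can follow A
- PREDICT(A → α) = FIRST(α) ∪ (FOLLOW(A) if EPS(α))
  - Example: PREDICT(program → stmt_list $$)
  - FIRST(stmt_list) = {id, read, write}
  - FOLLOW(stmt_list) = {$$}
  - Since stmt_list can derive ε, PREDICT(program → stmt_list $$) = {id, read, write, $$}
- The intersection of the PREDICT sets for productions with the same lhs must be empty
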
Recursive Descent Parser

    procedure program
        case input_token of
            id, read, write, $$: stmt_list; match($$)
            otherwise error

    procedure stmt_list
        case input_token of
            id, read, write: stmt; stmt_list
            $$: skip
            otherwise error

    procedure stmt
        case input_token of
            id: match(id); match(:=); expr
            read: match(read); match(id)
            write: match(write); expr
            otherwise error

    procedure expr
        case input_token of
            id, number, (: term; term_tail
            otherwise error

    procedure term_tail
        case input_token of
            +, -: add_op; term; term_tail
            ), id, read, write, $$: skip
            otherwise error

    procedure term
        case input_token of
            id, number, (: factor; fact_tail
            otherwise error

Exercise 3
Trace through the recursive descent parser and build the parse tree for the following program:

    read A
    read B
    sum := A + B
    write sum / 2

Solution 3
[Figure: parse tree for the program of Exercise 3. Copyright © 2009 Elsevier]

Table-Driven LL Parser
[Figure omitted. Copyright © 2009 Elsevier]

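Table-Driven LL Parser

    Parse Stack              Input Stream        Comment
    program                  read A read B …     initial stack contents
    stmt_list $$             read A read B …     predict program → stmt_list $$
    stmt stmt_list $$        read A read B …     predict stmt_list → stmt stmt_list
    read id stmt_list $$     read A read B …     predict stmt → read id
    id stmt_list $$          A read B …          match(read)
    stmt_list $$             read B …            match(id)
    …                        …                   …
    stmt_list $$             $$                  predict term_tail → ε
    $$                       $$                  predict stmt_list → ε
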
Computing First, Follow, Predict
- Algorithm First/Follow/Predict:
  - FIRST(α) = {c : α →* c β}
  - FOLLOW(A) = {c : S →+ α A c β}
  - EPS(α) = if α →* ε then true else false
  - PREDICT(A → α) = FIRST(α) ∪ (if EPS(α) then FOLLOW(A) else ∅)

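These definitions can be computed by iterating to a fixed point. The sketch below does so for the 19-production LL(1) calculator grammar (Scott, Fig 2.15); the list encoding of the grammar and the names `first_of` and `compute_sets` are assumptions made for illustration.

```python
GRAMMAR = [
    ("program", ["stmt_list", "$$"]),
    ("stmt_list", ["stmt", "stmt_list"]),
    ("stmt_list", []),
    ("stmt", ["id", ":=", "expr"]),
    ("stmt", ["read", "id"]),
    ("stmt", ["write", "expr"]),
    ("expr", ["term", "term_tail"]),
    ("term_tail", ["add_op", "term", "term_tail"]),
    ("term_tail", []),
    ("term", ["factor", "fact_tail"]),
    ("fact_tail", ["mult_op", "factor", "fact_tail"]),
    ("fact_tail", []),
    ("factor", ["(", "expr", ")"]),
    ("factor", ["id"]),
    ("factor", ["number"]),
    ("add_op", ["+"]),
    ("add_op", ["-"]),
    ("mult_op", ["*"]),
    ("mult_op", ["/"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols, first, eps):
    """FIRST and EPS of a symbol string, given the per-nonterminal tables."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            result.add(sym)
            return result, False
        result |= first[sym]
        if not eps[sym]:
            return result, False
    return result, True


def compute_sets():
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    eps = {n: False for n in NONTERMINALS}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, rhs in GRAMMAR:
            f, e = first_of(rhs, first, eps)
            if not f <= first[lhs] or (e and not eps[lhs]):
                first[lhs] |= f
                eps[lhs] = eps[lhs] or e
                changed = True
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    f, e = first_of(rhs[i + 1:], first, eps)
                    new = f | (follow[lhs] if e else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    predict = []
    for lhs, rhs in GRAMMAR:
        f, e = first_of(rhs, first, eps)
        predict.append(f | (follow[lhs] if e else set()))
    return first, follow, predict
```

With this encoding, `predict[0]` (for program → stmt_list $$) comes out as {id, read, write, $$}, and `predict[8]` (for term_tail → ε) as {), id, read, write, $$}. A grammar is LL(1) when the PREDICT sets of productions sharing a left-hand side are pairwise disjoint.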
Predict Set for LL Parser
[Figure omitted. Copyright © 2009 Elsevier]

LR Parsing
- Bottom-up (rightmost derivation, discovered in reverse)
  - Maintain a forest of partially completed subtrees of the parse tree
  - Join trees together when recognizing the symbols in the rhs of a production
  - Keep the roots of the partially completed trees on a stack
    - Shift when a new token is read
    - Reduce when the top symbols match a rhs
  - Table driven

Top-down vs. Bottom-up Parsing
[Figure omitted. Copyright © 2009 Elsevier]

LR Parsing Example

    Stack                                  Remaining Input
    (empty)                                A, B, C;
    id(A)                                  , B, C;
    id(A) ,                                B, C;
    id(A) , id(B)                          , C;
    id(A) , id(B) ,                        C;
    id(A) , id(B) , id(C)                  ;
    id(A) , id(B) , id(C) ;
    id(A) , id(B) , id(C) id_list_tail
    id(A) , id(B) id_list_tail
    id(A) id_list_tail
    id_list

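The shift and reduce moves can be imitated with a very small hand-coded parser. This sketch assumes the identifier-list grammar id_list → id id_list_tail, id_list_tail → , id id_list_tail | ; (the grammar is not stated on the slide), and it finds handles by comparing the top of the stack against each right-hand side in a fixed order. A real LR parser would consult a generated table instead.

```python
# Assumed grammar (not shown on the slide):
#   id_list      -> id id_list_tail
#   id_list_tail -> , id id_list_tail
#   id_list_tail -> ;
RULES = [
    ("id_list_tail", [",", "id", "id_list_tail"]),
    ("id_list_tail", [";"]),
    ("id_list", ["id", "id_list_tail"]),
]


def parse(tokens):
    """Shift-reduce parse; returns the list of stack snapshots."""
    stack, remaining = [], list(tokens)
    snapshots = [list(stack)]
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]     # reduce: replace rhs by lhs
                break
        else:
            if not remaining:
                break                          # nothing to reduce or shift
            stack.append(remaining.pop(0))    # shift the next token
        snapshots.append(list(stack))
    if stack != ["id_list"]:
        raise SyntaxError(f"could not reduce to id_list: {stack}")
    return snapshots
```

For the token sequence `id , id , id ;` the parser performs six shifts followed by four reductions, ending with `id_list` alone on the stack. The fixed rule order only works because this grammar is tiny; it is not a general handle-finding method.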
LR Calculator Grammar (Figure 2.24, Page 73):
     1. program   → stmt_list $$
     2. stmt_list → stmt_list stmt
     3. stmt_list → stmt
     4. stmt      → id := expr
     5. stmt      → read id
     6. stmt      → write expr
     7. expr      → term
     8. expr      → expr add_op term
     9. term      → factor
    10. term      → term mult_op factor
    11. factor    → ( expr )
    12. factor    → id
    13. factor    → number
    14. add_op    → +
    15. add_op    → -
    16. mult_op   → *
    17. mult_op   → /

LR Parser State
- Keep track of the set of productions we might be in, along with where in those productions we might be (marked •)
- Initial state for the calculator grammar:
    program   → • stmt_list $$        // basis
    stmt_list → • stmt_list stmt      // yield
    stmt_list → • stmt
    stmt      → • id := expr
    stmt      → • read id
    stmt      → • write expr

LR Parser States
[Figure omitted. Copyright © 2009 Elsevier]

Characteristic Finite State Machine
[Figure omitted. Copyright © 2009 Elsevier]

LR Parser Table
[Figure omitted. Copyright © 2009 Elsevier]

LR Parsing Example
[Figure omitted. Copyright © 2009 Elsevier]