Compiler I: Syntax Analysis
Building a Modern Computer From First Principles
www.nand2tetris.org
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis

Course map

(Figure: the course's abstraction layers, top to bottom, each separated by an abstract interface.)
Software hierarchy:
- Human Thought / Abstract design (Chapters 9, 12)
- High-Level Language & Operating System; Compiler (Chapters 10-11)
- Virtual Machine; VM Translator (Chapters 7-8)
- Assembly Language; Assembler (Chapter 6)
Hardware hierarchy:
- Machine Language; Computer Architecture (Chapters 4-5)
- Hardware Platform; Gate Logic (Chapters 1-3)
- Chips & Logic Gates; Electrical Engineering
- Physics

Motivation: Why study compilers?

One of the earliest compilers, and the first for a widely used high-level language, was the FORTRAN compiler developed by an IBM team led by John Backus (Turing Award, 1977) and released in 1957. It took about 18 man-years.

Because compilers ...
- Are an essential part of applied computer science
- Are very relevant to computational linguistics
- Are implemented using classical programming techniques
- Employ important software engineering principles
- Train you in developing software for transforming one structure to another (programs, files, transactions, ...)
- Train you to think in terms of "description languages"
- Parsing files with some complex syntax is very common in many applications.

The big picture

Modern compilers are two-tiered:
- Front-end: from a high-level language to some intermediate language
- Back-end: from the intermediate language to binary code.

(Figure: several source languages (some language, the Jack language, some other language) are each translated by their own compiler into a common intermediate code. VM implementations then map the intermediate code onto CISC platforms, RISC platforms, the Hack platform, or a VM emulator written in a high-level language, and in principle onto any other digital platform equipped with its own VM implementation. Compiler lectures: Projects 10, 11. VM lectures: Projects 7-8. HW lectures: Projects 1-6.)

Compiler architecture (front end)

(Figure: pipeline from the source program through the scanner (tokenizer) and the parser to code generation and the target program.)

- Syntax analysis: understanding the structure of the source code
  - Tokenizing: creating a stream of "atoms"
  - Parsing: matching the atom stream with the language grammar
  XML output = one way to demonstrate that the syntax analyzer works
- Code generation: reconstructing the semantics using the syntax of the target code.

Tokenizing / Lexical analysis / Scanning

- Remove white space
- Construct a token list (language atoms)
- Things to worry about:
  - Language-specific rules: e.g. how to treat "++"
  - Language-specific classifications: keyword, symbol, identifier, integerConstant, stringConstant, ...
- While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source language grammar).

C function to split a string into tokens

    char* strtok(char* str, const char* delimiters);

- str: string to be broken into tokens
- delimiters: string containing the delimiter characters

Jack Tokenizer

Source code:

    if (x < 153) {let city = "Paris";}

Tokenizer's output:

    <tokens>
      <keyword> if </keyword>
      <symbol> ( </symbol>
      <identifier> x </identifier>
      <symbol> &lt; </symbol>
      <integerConstant> 153 </integerConstant>
      <symbol> ) </symbol>
      <symbol> { </symbol>
      <keyword> let </keyword>
      <identifier> city </identifier>
      <symbol> = </symbol>
      <stringConstant> Paris </stringConstant>
      <symbol> ; </symbol>
      <symbol> } </symbol>
    </tokens>

(The "<" symbol is written as &lt; so that the output remains well-formed XML.)

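The tokenizing step above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference tokenizer: the names `TOKEN_RE`, `tokenize` and `to_xml` are made up for this example, comments in the source are not handled, and the keyword and symbol sets are assumed to be those of the Jack language.

```python
import re
from xml.sax.saxutils import escape

# Assumed Jack keyword set; anything else that looks like a word is an identifier.
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static", "var",
    "int", "char", "boolean", "void", "true", "false", "null", "this",
    "let", "do", "if", "else", "while", "return",
}

# One named group per lexical class; "skip" swallows white space.
TOKEN_RE = re.compile(r'''
    (?P<integerConstant>\d+)
  | (?P<stringConstant>"[^"\n]*")
  | (?P<word>[A-Za-z_]\w*)
  | (?P<symbol>[{}()\[\].,;+\-*/&|<>=~])
  | (?P<skip>\s+)
''', re.VERBOSE)


def tokenize(source):
    """Return a list of (classification, text) pairs for a Jack-like source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        elif kind == "stringConstant":
            text = text[1:-1]  # drop the surrounding double quotes
        tokens.append((kind, text))
    return tokens


def to_xml(tokens):
    """Render the token list in the <tokens> ... </tokens> format shown on the slide."""
    lines = ["<tokens>"]
    lines += [f"<{kind}> {escape(text)} </{kind}>" for kind, text in tokens]
    lines.append("</tokens>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_xml(tokenize('if (x < 153) {let city = "Paris";}')))
```

Recording the classification together with the token text is what lets the parser later ask "is the next token a keyword?" without re-inspecting the characters.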
Parsing

- The tokenizer discussed thus far is part of a larger program called the parser
- Each language is characterized by a grammar. The parser is implemented to recognize this grammar in given texts
- The parsing process:
  - A text is given and tokenized
  - The parser determines whether or not the text can be generated from the grammar
  - In the process, the parser performs a complete structural analysis of the text
- The text can be an expression in a:
  - Natural language (English, ...)
  - Programming language (Jack, ...).

Parsing examples

English: He ate an apple on the desk.
Jack: (5+3)*2 - sqrt(9*4)

(Figure: the parse tree of each example; in the English tree, the verb "ate" links the subject "he", the object "an apple" and the phrase "on the desk".)

Regular expressions

- a|b*      {ε, "a", "b", "bb", "bbb", ...}
- (a|b)*    {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", ...}
- ab*(c|ε)  {"a", "ac", "ab", "abc", "abbc", ...}

(ε denotes the empty string.)

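The three patterns can be checked with Python's `re` module. This is only a sketch: `re.fullmatch` is used so that the whole string must match, and ε is written as an empty alternative, `(c|)`. The helper name `matches` is made up for this example.

```python
import re


def matches(pattern, s):
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None


# Members of each language, taken from the slide ("" plays the role of ε).
examples = {
    r"a|b*": ["", "a", "b", "bb", "bbb"],
    r"(a|b)*": ["", "a", "b", "aa", "ab", "ba", "bb", "aaa"],
    r"ab*(c|)": ["a", "ac", "ab", "abc", "abbc"],
}

for pattern, strings in examples.items():
    for s in strings:
        assert matches(pattern, s), (pattern, s)

# A few non-members: a|b* is "a single a" OR "any number of b", never a mix.
assert not matches(r"a|b*", "ab")
assert not matches(r"a|b*", "aa")
assert not matches(r"ab*(c|)", "abcb")
```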
Lex

- A computer program that generates lexical analyzers (scanners or lexers)
- Commonly used with the yacc parser generator.
- Structure of a Lex file:

    Definition section
    %%
    Rules section
    %%
    C code section

Example of a Lex file

    /*** Definition section ***/
    %{
    /* C code to be copied verbatim */
    #include <stdio.h>
    %}

    /* This tells flex to read only one input file */
    %option noyywrap

    /*** Rules section ***/
    %%

    [0-9]+  {
                /* yytext is a string containing the matched text. */
                printf("Saw an integer: %s\n", yytext);
            }

    .|\n    {   /* Ignore all other characters. */   }

Example of a Lex file (continued)

    %%
    /*** C Code section ***/

    int main(void)
    {
        /* Call the lexer, then quit. */
        yylex();
        return 0;
    }

Example of a Lex file (building and running)

    > flex test.lex          (a file lex.yy.c with 1,763 lines is generated)
    > gcc lex.yy.c           (an executable file a.out is generated)
    > ./a.out < test.txt
    Saw an integer: 123
    Saw an integer: 2
    Saw an integer: 6

Contents of test.txt:

    abc123z.!&*2gj6

Another Lex example

    %{
    int num_lines = 0, num_chars = 0;
    %}
    %option noyywrap
    %%
    \n      ++num_lines; ++num_chars;
    .       ++num_chars;   /* count every other character as well */
    %%
    int main(void)
    {
        yylex();
        printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
        return 0;
    }

A more complex Lex example

    %{
    /* need this for the calls to atoi() and atof() below */
    #include <stdlib.h>
    %}
    %option noyywrap

    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*

    %%

    {DIGIT}+    {
                    printf("An integer: %s (%d)\n", yytext, atoi(yytext));
                }

    {DIGIT}+"."{DIGIT}*    {
                    printf("A float: %s (%g)\n", yytext, atof(yytext));
                }

A more complex Lex example (continued)

    if|then|begin|end|procedure|function    {
                    printf("A keyword: %s\n", yytext);
                }

    {ID}                    printf("An identifier: %s\n", yytext);

    "+"|"-"|"="|"("|")"     printf("Symbol: %s\n", yytext);

    [ \t\n]+                /* eat up whitespace */

    .                       printf("Unrecognized char: %s\n", yytext);

    %%

    int main(int argc, char **argv)
    {
        /* Read from the file named on the command line, else from stdin. */
        if (argc > 1)
            yyin = fopen(argv[1], "r");
        else
            yyin = stdin;
        yylex();
        return 0;
    }

A more complex Lex example (input and output)

pascal.txt:

    if (a+b) then
      foo=3.1416
    else
      foo=12

Output:

    A keyword: if
    Symbol: (
    An identifier: a
    Symbol: +
    An identifier: b
    Symbol: )
    A keyword: then
    An identifier: foo
    Symbol: =
    A float: 3.1416 (3.1416)
    An identifier: else
    An identifier: foo
    Symbol: =
    An integer: 12 (12)

Note that "else" is reported as an identifier, because it does not appear in the keyword rule.

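For comparison, the same classification can be approximated without a generator. The sketch below uses Python's `re` module instead of flex; it is not what flex produces. One difference is labeled in the comments: flex chooses the longest match, while Python's alternation takes the first alternative that matches, so the order of the alternatives matters here. The names `SCANNER` and `scan` are hypothetical.

```python
import re

KEYWORDS = {"if", "then", "begin", "end", "procedure", "function"}

# The float pattern must come before the integer pattern: Python tries the
# alternatives left to right, whereas flex would pick the longest match.
SCANNER = re.compile(r"""
    (?P<float>\d+\.\d*)
  | (?P<int>\d+)
  | (?P<word>[a-z][a-z0-9]*)
  | (?P<symbol>[-+=()])
  | (?P<space>[ \t\n]+)
  | (?P<other>.)
""", re.VERBOSE | re.DOTALL)


def scan(text):
    """Return (classification, lexeme) pairs, skipping white space."""
    result = []
    for m in SCANNER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            # Match the whole word first, then classify it; this avoids
            # treating the "if" in a longer name such as "iffy" as a keyword.
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        result.append((kind, lexeme))
    return result


if __name__ == "__main__":
    for kind, lexeme in scan("if (a+b) then foo=3.1416 else foo=12"):
        print(f"{kind}: {lexeme}")
```

As in the flex run, `else` comes out as an identifier, since it is not in the keyword set.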
Context-free grammar

- Terminals: 0, 1, #
- Non-terminals: A, B
- Start symbol: A
- Rules:
  - A → 0A1
  - A → B
  - B → #
- Simple (terminal) forms / complex (non-terminal) forms
- Grammar = set of rules on how to construct complex forms from simpler forms
- Highly recursive.

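To see the recursion at work, the rules can be applied mechanically. The following Python sketch uses the hypothetical helpers `derive` and `in_language`. It first derives a string by applying A → 0A1 a chosen number of times, and then checks membership with a function whose cases mirror the rules.

```python
def derive(n):
    """Apply A -> 0A1 n times, then A -> B, then B -> #, recording every step."""
    current = "A"
    steps = [current]
    for _ in range(n):
        current = current.replace("A", "0A1")
        steps.append(current)
    current = current.replace("A", "B")
    steps.append(current)
    current = current.replace("B", "#")
    steps.append(current)
    return steps


def in_language(s):
    """Recursive membership test that follows the grammar rules."""
    if s == "#":                                       # A -> B, B -> #
        return True
    if len(s) >= 3 and s[0] == "0" and s[-1] == "1":   # A -> 0 A 1
        return in_language(s[1:-1])
    return False


if __name__ == "__main__":
    print(" => ".join(derive(2)))
```

The grammar generates exactly the strings with some number of 0s, a single #, and the same number of 1s.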
Examples of context-free grammar

- S → () | (S) | SS
  (balanced parentheses)
- S → a | aS | bS
  (strings ending with 'a')
- S → x | y | S+S | S-S | S*S | S/S | (S)
  (arithmetic expressions, e.g. (x+y)*x-x*y/(x+x))

Examples of context-free grammar

- non-terminals: S, E, Elist
- terminals: ID, NUM, PRINT, +, :=, (, ), ;, and the comma
- rules:

    S     → S ; S
    S     → ID := E
    S     → PRINT ( Elist )
    E     → ID
    E     → NUM
    E     → E + E
    E     → ( S , Elist )
    Elist → E
    Elist → Elist , E

Try to derive: ID := NUM ; PRINT ( NUM )

slide credit: David Walker

Examples of context-free grammar: derivations

(Same non-terminals, terminals and rules as on the previous slide.)

Left-most derivation:

    S
    S ; S
    ID := E ; S
    ID := NUM ; S
    ID := NUM ; PRINT ( Elist )
    ID := NUM ; PRINT ( E )
    ID := NUM ; PRINT ( NUM )

Right-most derivation:

    S
    S ; S
    S ; PRINT ( Elist )
    S ; PRINT ( E )
    S ; PRINT ( NUM )
    ID := E ; PRINT ( NUM )
    ID := NUM ; PRINT ( NUM )

slide credit: David Walker

Parse tree

- Two derivations, but one tree: the left-most and the right-most derivations of ID := NUM ; PRINT ( NUM ) apply the same rules in a different order, so both yield the same parse tree.

    S
    ├── S
    │   ├── ID
    │   ├── :=
    │   └── E
    │       └── NUM
    ├── ;
    └── S
        ├── PRINT
        ├── (
        ├── Elist
        │   └── E
        │       └── NUM
        └── )

slide credit: David Walker

Ambiguous Grammars

- A grammar is ambiguous if the same sequence of tokens can give rise to two or more parse trees
- non-terminals: E
- terminals: ID, NUM, PLUS, MUL
- rules:

    E → ID
    E → NUM
    E → E + E
    E → E * E

characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MUL NUM(6)

slide credit: David Walker

Ambiguous Grammars (continued)

characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MUL NUM(6)
rules: E → ID | NUM | E + E | E * E

(Figure: two different parse trees for the same token sequence.)

    Tree 1: root E → E + E, with NUM(4) on the left and (NUM(5) * NUM(6)) on the right
    Tree 2: root E → E * E, with (NUM(4) + NUM(5)) on the left and NUM(6) on the right

slide credit: David Walker

Ambiguous Grammars (continued)

- Problem: compilers use parse trees to interpret the meaning of parsed expressions
  - Different parse trees have different meanings
  - E.g. (4 + 5) * 6 is not 4 + (5 * 6)
  - Languages with ambiguous grammars are DISASTROUS: the meaning of programs isn't well-defined! You can't tell what your program might do!
- Solution: rewrite the grammar to eliminate ambiguity
  - Fold precedence rules into the grammar to disambiguate
  - Fold associativity rules into the grammar to disambiguate
  - Other tricks as well

slide credit: David Walker

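The difference in meaning can be made concrete with a small Python sketch. The two parse trees of 4 + 5 * 6 are written by hand as nested tuples, a representation chosen only for this example. Both trees have the same leaves in the same order, yet they evaluate to different numbers.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a parse tree given as an int leaf or an (op, left, right) tuple."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


def leaves(tree):
    """Read the token sequence back off the tree, left to right."""
    if isinstance(tree, int):
        return [tree]
    op, left, right = tree
    return leaves(left) + [op] + leaves(right)


tree_a = ("*", ("+", 4, 5), 6)   # (4 + 5) * 6
tree_b = ("+", 4, ("*", 5, 6))   # 4 + (5 * 6)

if __name__ == "__main__":
    print(leaves(tree_a) == leaves(tree_b))    # same tokens: True
    print(evaluate(tree_a), evaluate(tree_b))  # 54 34
```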
Recursive descent parser

- Recursive Descent Parsing
  - aka: predictive parsing; top-down parsing
  - simple, efficient
  - can be coded by hand in ML quickly
  - parses many, but not all CFGs
    - parses LL(1) grammars: Left-to-right parse; Leftmost derivation; 1 symbol lookahead
  - key ideas:
    - one recursive function for each non-terminal
    - each production becomes one clause in the function

slide credit: David Walker

Recursive descent parser: example grammar

- Non-terminals: S, E, L
- Terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, =, ;
- Rules:

    1. S → IF E THEN S ELSE S
    2.   | BEGIN S L
    3.   | PRINT E
    4. L → END
    5.   | ; S L
    6. E → NUM = NUM

slide credit: David Walker

Recursive descent parser: the function for S

(Grammar as above: rules 1-3 define S.)

    S() {
        switch (next()) {
            case IF:
                eat(IF); E(); eat(THEN); S(); eat(ELSE); S();
                break;
            case BEGIN:
                eat(BEGIN); S(); L();
                break;
            case PRINT:
                eat(PRINT); E();
                break;
            default:
                error();   /* no rule for S starts with this token */
        }
    }

slide credit: David Walker

Recursive descent parser: the function for L

(Grammar as above: rules 4-5 define L. Terminals are now named: EQ for =, SEMI for ;.)

    L() {
        switch (next()) {
            case END:
                eat(END);
                break;
            case SEMI:
                eat(SEMI); S(); L();
                break;
            default:
                error();
        }
    }

slide credit: David Walker

Recursive descent parser: the function for E

(Grammar as above: rule 6 defines E.)

    E() {
        eat(NUM); eat(EQ); eat(NUM);
    }

slide credit: David Walker

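Putting S(), L() and E() together gives a complete recognizer. The Python sketch below is one possible rendering of the pseudocode, under a few assumptions: the input is a list of token-type names (IF, NUM, EQ, SEMI, ...), an EOF marker is appended to mark the end of input, and the `Parser` class and `accepts` helper are hypothetical names introduced for this example.

```python
class Parser:
    """Recognizer for: S -> IF E THEN S ELSE S | BEGIN S L | PRINT E
                       L -> END | SEMI S L
                       E -> NUM EQ NUM"""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # EOF marker is an assumption
        self.pos = 0

    def next(self):
        """Look at the current token without consuming it."""
        return self.tokens[self.pos]

    def eat(self, expected):
        """Consume the current token, which must be of the expected type."""
        if self.next() != expected:
            raise SyntaxError(f"expected {expected}, found {self.next()}")
        self.pos += 1

    def S(self):
        tok = self.next()
        if tok == "IF":
            self.eat("IF"); self.E(); self.eat("THEN"); self.S()
            self.eat("ELSE"); self.S()
        elif tok == "BEGIN":
            self.eat("BEGIN"); self.S(); self.L()
        elif tok == "PRINT":
            self.eat("PRINT"); self.E()
        else:
            raise SyntaxError(f"no rule for S starts with {tok}")

    def L(self):
        tok = self.next()
        if tok == "END":
            self.eat("END")
        elif tok == "SEMI":
            self.eat("SEMI"); self.S(); self.L()
        else:
            raise SyntaxError(f"no rule for L starts with {tok}")

    def E(self):
        self.eat("NUM"); self.eat("EQ"); self.eat("NUM")


def accepts(tokens):
    """True if the whole token sequence can be derived from S."""
    parser = Parser(tokens)
    try:
        parser.S()
        parser.eat("EOF")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(accepts("BEGIN PRINT NUM EQ NUM SEMI PRINT NUM EQ NUM END".split()))
```

Each method looks only at the current token to choose a production, which is exactly the one-symbol lookahead that makes the grammar LL(1).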
Recursive descent parser: a second grammar

- Non-terminals: S, A, E, L
- Terminals: EOF, ID, NUM, ASSIGN (:=), PRINT, LPAREN ((), RPAREN ())
- Rules:

    1. S → A EOF
    2. A → ID := E
    3.   | PRINT ( L )
    4. E → ID
    5.   | NUM
    6. L → E
    7.   | L , E

slide credit: David Walker

Recursive descent parser: the function for S

(Second grammar as above: rule 1 defines S.)

    S() {
        A(); eat(EOF);
    }

slide credit: David Walker

Recursive descent parser: the function for A

(Second grammar as above: rules 2-3 define A.)

    A() {
        switch (next()) {
            case ID:
                eat(ID); eat(ASSIGN); E();
                break;
            case PRINT:
                eat(PRINT); eat(LPAREN); L(); eat(RPAREN);
                break;
        }
    }

slide credit: David Walker

Recursive descent parser: the function for E

(Second grammar as above: rules 4-5 define E.)

    E() {
        switch (next()) {
            case ID:
                eat(ID);
                break;
            case NUM:
                eat(NUM);
                break;
        }
    }

slide credit: David Walker

Recursive descent parser: a problem with L

(Second grammar as above: rules 6-7 define L as L → E | L , E.)

    L() {
        switch (next()) {
            case ID:  ???
            case NUM: ???
        }
    }

Problem: E could be ID, and L could be E, which could be ID. Both alternatives of L can therefore begin with the same token, so looking at the next token does not tell the parser whether to apply L → E or L → L , E.

Recursive descent parser: rewriting L

(Second grammar as above: rules 6-7 define L as L → E | L , E.)

Problem: E could be ID, and L could be E, which could be ID, so one token of lookahead cannot choose between the two alternatives of L.

Solution: rewrite the rules for L with a new non-terminal M, removing the left recursion:

    L → E M
    M → , E M
      | ε          (the empty string)

slide credit: David Walker

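With the rewritten rules, every function can again decide what to do from the next token alone. The Python sketch below recognizes the second grammar using L → E M and M → , E M | ε. The token name COMMA is an assumption (the slide's terminal list does not name the comma), and `accepts` is a hypothetical helper name.

```python
def accepts(tokens):
    """True if the token-type sequence is derivable from S -> A EOF."""
    stream = list(tokens) + ["EOF"]
    pos = 0

    def peek():
        return stream[pos]

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, found {peek()}")
        pos += 1

    def A():
        if peek() == "ID":                 # A -> ID := E
            eat("ID"); eat("ASSIGN"); E()
        elif peek() == "PRINT":            # A -> PRINT ( L )
            eat("PRINT"); eat("LPAREN"); L(); eat("RPAREN")
        else:
            raise SyntaxError(f"no rule for A starts with {peek()}")

    def E():
        if peek() == "ID":                 # E -> ID
            eat("ID")
        elif peek() == "NUM":              # E -> NUM
            eat("NUM")
        else:
            raise SyntaxError(f"no rule for E starts with {peek()}")

    def L():                               # L -> E M
        E(); M()

    def M():
        if peek() == "COMMA":              # M -> , E M
            eat("COMMA"); E(); M()
        # otherwise M -> epsilon: consume nothing

    try:
        A()
        eat("EOF")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(accepts("PRINT LPAREN ID COMMA NUM COMMA ID RPAREN".split()))
```

M takes the ε alternative whenever the next token is not a comma, so the list ends as soon as the closing parenthesis is reached.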
A typical grammar of a typical C-like language

Code samples:

    while (expression) {
        if (expression)
            statement;
        while (expression) {
            statement;
            if (expression)
                statement;
        }
        while (expression) {
            statement;
        }
    }

    if (expression) {
        statement;
        while (expression)
            statement;
    }

    if (expression)
        statement;

A typical grammar of a typical C-like language

    program:            statement;

    statement:          whileStatement
                      | ifStatement
                      | // other statement possibilities ...
                      | '{' statementSequence '}'

    whileStatement:     'while' '(' expression ')' statement

    ifStatement:        simpleIf
                      | ifElse

    simpleIf:           'if' '(' expression ')' statement

    ifElse:             'if' '(' expression ')' statement
                        'else' statement

    statementSequence:  ''    // null, i.e. the empty sequence
                      | statement ';' statementSequence

    expression:         // definition of an expression comes here

Parse tree

    program:            statement;

    statement:          whileStatement
                      | ifStatement
                      | // other statement possibilities ...
                      | '{' statementSequence '}'

    whileStatement:     'while' '(' expression ')' statement
    ...

(Figure: the parse tree of a code sample under this grammar.)

Recursive descent parsing

Code sample:

    while (expression) {
        statement;
        while (expression) {
            while (expression)
                statement;
        }
    }

Parser implementation: a set of parsing methods, one for each rule:
- parseStatement()
- parseWhileStatement()
- parseIfStatement()
- parseStatementSequence()
- parseExpression()

Notes:
- LL(0) grammars: the first token determines in which rule we are. (In standard terminology, deciding on the basis of one token of lookahead is called LL(1).)
- In other grammars you have to look ahead 1 or more tokens
- Jack is almost LL(0)
- Highly recursive.

The Jack grammar

Notation:
- 'x': x appears verbatim
- x: x is a language construct
- x?: x appears 0 or 1 times
- x*: x appears 0 or more times
- x|y: either x or y appears
- (x, y): x appears, then y.

(Figure: the Jack grammar rules, shown over several slides in this notation.)

Jack syntax analyzer in action

Source code:

    class Bar {
        method Fraction foo(int y) {
            var int temp; // a variable
            let temp = (xxx+12)*-63;
            ...
    ...

Syntax analyzer output (excerpt):

    <varDec>
      <keyword> var </keyword>
      <keyword> int </keyword>
      <identifier> temp </identifier>
      <symbol> ; </symbol>
    </varDec>
    <statements>
      <letStatement>
        <keyword> let </keyword>
        <identifier> temp </identifier>
        <symbol> = </symbol>
        <expression>
          <term>
            <symbol> ( </symbol>
            <expression>
              <term>
                <identifier> xxx </identifier>
              </term>
              <symbol> + </symbol>
              <term>
                <integerConstant> 12 </integerConstant>
              </term>
            </expression>
            ...

Syntax analyzer:
- With the grammar, we can write a syntax analyzer program (parser)
- The syntax analyzer takes a source text file and attempts to match it against the language grammar
- If successful, it can generate a parse tree in some structured format, e.g. XML.

Jack syntax analyzer in action (continued)

(Same source code and XML output excerpt as on the previous slide.)

Syntax analyzer output rules:
- If xxx is non-terminal, output:

    <xxx>
      recursive code for the body of xxx
    </xxx>

- If xxx is terminal (keyword, symbol, constant, or identifier), output:

    <xxx>
      xxx value
    </xxx>

Recursive descent parser (simplified expression)

    EXP  → TERM (OP TERM)*
    TERM → integer | variable
    OP   → + | - | * | /

From parsing to code generation

    EXP  → TERM (OP TERM)*
    TERM → integer | variable
    OP   → + | - | * | /

    EXP():
        TERM();
        while (next() == OP) {
            OP();
            TERM();
        }

From parsing to code generation (continued)

    EXP():
        TERM();
        while (next() == OP) {
            OP();
            TERM();
        }

    TERM():
        switch (next())
            case INT: eat(INT);
            case VAR: eat(VAR);

From parsing to code generation (continued)

    EXP():
        TERM();
        while (next() == OP) {
            OP();
            TERM();
        }

    TERM():
        switch (next())
            case INT: eat(INT);
            case VAR: eat(VAR);

    OP():
        switch (next())
            case +: eat(ADD);
            case -: eat(SUB);
            case *: eat(MUL);
            case /: eat(DIV);

From parsing to code generation: emitting XML

    EXP  → TERM (OP TERM)*
    TERM → integer | variable
    OP   → + | - | * | /

    EXP():
        print('<exp>');
        TERM();
        while (next() == OP) {
            OP();
            TERM();
        }
        print('</exp>');

    TERM():
        print('<term>');
        switch (next())
            case INT: print('<int> next() </int>'); eat(INT);
            case VAR: print('<id> next() </id>');   eat(VAR);
        print('</term>');

    OP():
        print('<op>');
        switch (next())
            case +: eat(ADD); print('<sym> + </sym>');
            case -: eat(SUB); print('<sym> - </sym>');
            case *: eat(MUL); print('<sym> * </sym>');
            case /: eat(DIV); print('<sym> / </sym>');
        print('</op>');

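The pseudocode above can be turned into a small runnable program. The Python sketch below follows the same three functions and collects the XML lines in a list instead of printing them. The tokenizer is a deliberately crude assumption (a single `re.findall` that silently ignores characters it does not know), and the names `parse_exp` and `is_valid` are made up for this example.

```python
import re

OPS = {"+", "-", "*", "/"}


def parse_exp(text):
    """Parse EXP -> TERM (OP TERM)* and return the XML output as a list of lines."""
    # Crude tokenizer (an assumption): integers, names, and the four operators.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/]", text)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        nonlocal pos
        tok = peek()
        if tok is None or tok in OPS:
            raise SyntaxError(f"expected integer or variable, found {tok!r}")
        out.append("<term>")
        tag = "int" if tok.isdigit() else "id"
        out.append(f"<{tag}> {tok} </{tag}>")
        pos += 1
        out.append("</term>")

    def op():
        nonlocal pos
        out.append("<op>")
        out.append(f"<sym> {peek()} </sym>")
        pos += 1
        out.append("</op>")

    out.append("<exp>")
    term()
    while peek() in OPS:      # (OP TERM)*: repeat while an operator follows
        op()
        term()
    out.append("</exp>")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} after expression")
    return out


def is_valid(text):
    """True if text parses as an EXP."""
    try:
        parse_exp(text)
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("\n".join(parse_exp("x + 12 * y")))
```

Note that the operators are simply emitted in source order: the simplified grammar, like Jack, has no operator priority, so no precedence decisions are made here.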
Summary and next step

- Syntax analysis: understanding syntax
- Code generation: constructing semantics

The code generation challenge:
- Extend the syntax analyzer into a full-blown compiler that, instead of passive XML code, generates executable VM code
- Two challenges: (a) handling data, and (b) handling commands.

Perspective

- The parse tree can be constructed on the fly
- The Jack language is intentionally simple:
  - Statement prefixes: let, do, ...
  - No operator priority
  - No error checking
  - Basic data types, etc.
- The Jack compiler: designed to illustrate the key ideas that underlie modern compilers, leaving advanced features to more advanced courses
- Richer languages require more powerful compilers

Perspective (continued)

- Syntax analyzers can be built using:
  - The Lex tool for tokenizing (flex)
  - The Yacc tool for parsing (bison)
  - Doing everything from scratch (our approach ...)
- Industrial-strength compilers (e.g. LLVM):
  - Have good error diagnostics
  - Generate tight and efficient code
  - Support parallel (multi-core) processors.