Abstract Syntax
Mooly Sagiv
Schreiber 317, 03-640-7606
Wed 10:00-12:00
http://www.cs.tau.ac.il/~msagiv/courses/wcc02.html
Outline
• The general idea
• Bison
• Motivating example: interpreter for arithmetic expressions
• The need for abstract syntax
• Abstract syntax for straight-line code
• Abstract syntax for Tiger (recitation)
Semantic Analysis during Recursive Descent Parsing
• The scanner returns “semantic values” for some tokens
• The function of every non-terminal returns the “corresponding subtree value”
• When A ::= B C D is applied, the function for A can use the values returned by B, C, and D
  – The function can also pass parameters, e.g., to D(), reflecting left contexts
E  ::= num E’
E’ ::= empty-string
E’ ::= + num E’

int E() {
    switch (tok) {
    case num: temp = tok.val; eat(num); return EP(temp);
    default:  error(…);
    }
}

int EP(int left) {
    switch (tok) {
    case $: return left;
    case +: eat(+); temp = tok.val; eat(num); return EP(left + temp);
    default: error(…);
    }
}
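The pseudocode above can be turned into a compilable sketch. The version below is a minimal illustration and rests on assumptions that are not on the slide: the token kinds, the hand-written scanner `advance`, the globals `src`, `tok`, and `tok_val`, and the wrapper `eval_sum` are all hypothetical names. The parameter `left` of `EP` plays the role of the left context described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token kinds and scanner state; none of this is on the slide. */
enum token { NUM, PLUS, END, BAD };

static const char *src;   /* remaining input */
static enum token tok;    /* current token */
static int tok_val;       /* semantic value, valid when tok == NUM */

/* Scanner: reads the next token and, for NUM, its semantic value. */
static void advance(void) {
    while (*src == ' ') src++;
    if (*src == '\0') {
        tok = END;
    } else if (*src == '+') {
        tok = PLUS;
        src++;
    } else if (isdigit((unsigned char)*src)) {
        tok = NUM;
        tok_val = 0;
        while (isdigit((unsigned char)*src))
            tok_val = tok_val * 10 + (*src++ - '0');
    } else {
        tok = BAD;
    }
}

/* Consumes the expected token or stops with a syntax error. */
static void eat(enum token expected) {
    if (tok != expected) {
        fprintf(stderr, "syntax error\n");
        exit(1);
    }
    advance();
}

/* E' ::= empty-string | + num E'   ('left' carries the left context) */
static int EP(int left) {
    if (tok == PLUS) {
        eat(PLUS);
        int right = tok_val;   /* only used if the next eat() succeeds */
        eat(NUM);
        return EP(left + right);
    }
    if (tok == END)
        return left;
    fprintf(stderr, "syntax error\n");
    exit(1);
}

/* E ::= num E' */
static int E(void) {
    int first = tok_val;       /* only used if the next eat() succeeds */
    eat(NUM);
    return EP(first);
}

/* Convenience wrapper: parses and evaluates a whole input string. */
int eval_sum(const char *text) {
    assert(text != NULL);
    src = text;
    advance();
    return E();
}
```

For example, `eval_sum("7+11+17")` returns 35. Passing the running sum down through `left` is what makes the result left-associative even though the grammar itself is right-recursive.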
Semantic Analysis during Bottom-Up Parsing
• The scanner returns “semantic values” for some tokens
• Use the parser stack to store the “corresponding subtree values”
• When A ::= B C D is reduced, the function for A can use the values returned by B, C, and D
• No action in the middle of the rule
Example
E ::= E + num
E ::= num
Parser stack before the reduction (top first): num 5, +, E 7
After reducing by E ::= E + num: E 12
Bison Specification
Declarations
%%
Productions
%%
C-Routines
Interpreter (in Bison)
%{
declarations of yylex() and yyerror()
%}
%union {
    int num;
    string id;
}
%token <id> ID
%token <num> NUM
%type <num> e f t
%start e
%%
e : e '+' t     { $$ = $1 + $3; }
  | e '-' t     { $$ = $1 - $3; }
  | t           { $$ = $1; }
  ;
t : t '*' f     { $$ = $1 * $3; }
  | t '/' f     { $$ = $1 / $3; }
  | f           { $$ = $1; }
  ;
f : NUM         { $$ = $1; }
  | ID          { $$ = lookup($1); }
  | '-' e       { $$ = - $2; }
  | '(' e ')'   { $$ = $2; }
  ;
Interpreter (compact spec.)
%{
declarations of yylex() and yyerror()
%}
%union {
    int num;
    string id;
}
%token <id> ID
%token <num> NUM
%type <num> e
%start e
%left PLUS MINUS
%left MUL DIV
%right UMINUS
%%
e : e PLUS e                { $$ = $1 + $3; }
  | e MINUS e               { $$ = $1 - $3; }
  | e MUL e                 { $$ = $1 * $3; }
  | e DIV e                 { $$ = $1 / $3; }
  | NUM
  | ID                      { $$ = lookup($1); }
  | MINUS e %prec UMINUS    { $$ = - $2; }
  | '(' e ')'               { $$ = $2; }
  ;
Parsing 7+11+17$ (stack bottom on the left)

stack               input       action
$                   7+11+17$    shift
$ num 7             +11+17$     reduce e ::= num
$ e 7               +11+17$     shift
$ e 7 +             11+17$      shift
$ e 7 + num 11      +17$        reduce e ::= num
$ e 7 + e 11        +17$        reduce e ::= e+e
$ e 18              +17$        shift
$ e 18 +            17$         shift
$ e 18 + num 17     $           reduce e ::= num
$ e 18 + e 17       $           reduce e ::= e+e
$ e 35              $           accept
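The trace can be replayed with a small hand-written shift-reduce loop that keeps semantic values on the parser stack. This is only a sketch under stated assumptions: Bison generates a table-driven parser, whereas the hypothetical function `parse_sum` below hard-codes the two productions e ::= num and e ::= e + e and always reduces before shifting, which makes + left-associative as in the trace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stack symbols; the parser stack stores a value with each one. */
enum sym { S_NUM, S_PLUS, S_E };
struct entry { enum sym sym; int val; };

#define STACK_MAX 64

/* Returns 1 and stores the result in *out on accept, 0 on a syntax error. */
int parse_sum(const char *in, int *out) {
    struct entry stack[STACK_MAX];
    int top = 0;   /* number of entries currently on the stack */

    assert(in != NULL && out != NULL);
    for (;;) {
        if (top >= 1 && stack[top - 1].sym == S_NUM) {
            /* reduce e ::= num; the value is carried over unchanged */
            stack[top - 1].sym = S_E;
        } else if (top >= 3 && stack[top - 1].sym == S_E &&
                   stack[top - 2].sym == S_PLUS &&
                   stack[top - 3].sym == S_E) {
            /* reduce e ::= e + e; corresponds to $$ = $1 + $3 */
            stack[top - 3].val += stack[top - 1].val;
            top -= 2;
        } else if (*in == '\0') {
            /* end of input: accept only if a single e remains */
            if (top == 1 && stack[0].sym == S_E) {
                *out = stack[0].val;
                return 1;
            }
            return 0;
        } else if (top == STACK_MAX) {
            return 0;   /* stack overflow, treated as an error */
        } else if (*in == '+') {
            /* shift '+' */
            stack[top].sym = S_PLUS;
            stack[top].val = 0;
            top++;
            in++;
        } else if (isdigit((unsigned char)*in)) {
            /* shift num together with its semantic value */
            int v = 0;
            while (isdigit((unsigned char)*in))
                v = v * 10 + (*in++ - '0');
            stack[top].sym = S_NUM;
            stack[top].val = v;
            top++;
        } else {
            return 0;   /* unexpected character */
        }
    }
}
```

Calling `parse_sum("7+11+17", &v)` goes through the same shift and reduce steps as the table and leaves 35 in `v`.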
So why can’t we write all the compiler code in Bison?
%{
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* string is assumed to be a typedef for char * (not shown on the slide) */
typedef struct table *Table_;
struct table { string id; int value; Table_ tail; };
Table_ Table(string id, int value, struct table *tail);

Table_ table = NULL;

int lookup(Table_ table, string id) {
    assert(table != NULL);   /* the identifier must have been assigned */
    if (strcmp(id, table->id) == 0) return table->value;
    else return lookup(table->tail, id);
}

void update(Table_ *tabptr, string id, int value) {
    *tabptr = Table(id, value, *tabptr);
}
%}
%union { int num; string id; }
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <num> exp
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm
     ;
stm  : stm SEMICOLUMN stm
     | ID ASSIGN exp                { update(&table, $1, $3); }
     | PRINT LPAREN exps RPAREN     { printf("\n"); }
     ;
exps : exp                          { printf("%d", $1); }
     | exps COMMA exp               { printf("%d", $3); }
     ;
exp  : INT                          { $$ = $1; }
     | ID                           { $$ = lookup(table, $1); }
     | exp PLUS exp                 { $$ = $1 + $3; }
     | exp MINUS exp                { $$ = $1 - $3; }
     | exp TIMES exp                { $$ = $1 * $3; }
     | exp DIV exp                  { $$ = $1 / $3; }
     | stm COMMA exp                { $$ = $3; }
     | '(' exp ')'                  { $$ = $2; }
     ;
Historical Perspective
• Originally parsers were written without tools
• yacc, bison, ... make tools acceptable
• But it is still difficult to write compilers in parser actions (top-down and bottom-up)
  – Natural grammars are ambiguous
  – No modularity principle
  – Many useful programming language features prevent code generation while parsing
    • Use before declaration
    • gotos
Abstract Syntax
• Intermediate program representation
• Defines a tree that preserves the program hierarchy
• Generated by the parser
• Declared using an (ambiguous) context-free grammar (relatively flat)
  – Not meant for parsing
• Keywords and punctuation symbols are not stored (not relevant once the tree exists)
• Big programs can also be handled (possibly via virtual memory)
Issues
• Concrete vs. abstract syntax tree
• Need to store concrete source position
• Abstract syntax can be defined by:
  – Ambiguous context-free grammar
  – C recursive data type
  – “Constructor” functions
• Debugging routines linearize the tree
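The points above can be illustrated with a small C sketch: a recursive data type for expressions, one “constructor” function per node kind, a stored source position, and a debugging routine that linearizes the tree. All names here (`Exp`, `NumExp`, `OpExp`, `linearize`, the `pos` field) are assumptions chosen for illustration, not the course's actual header.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Recursive data type: an expression is a number or an operator node. */
typedef struct exp_ *Exp;
struct exp_ {
    enum { NUM_EXP, OP_EXP } kind;
    int pos;   /* concrete source position, kept for error messages */
    union {
        int num;
        struct { Exp left; char oper; Exp right; } op;
    } u;
};

static Exp new_exp(int pos) {
    Exp e = malloc(sizeof *e);
    assert(e != NULL);
    e->pos = pos;
    return e;
}

/* Constructor functions, one per node kind. */
Exp NumExp(int pos, int num) {
    Exp e = new_exp(pos);
    e->kind = NUM_EXP;
    e->u.num = num;
    return e;
}

Exp OpExp(int pos, Exp left, char oper, Exp right) {
    Exp e = new_exp(pos);
    e->kind = OP_EXP;
    e->u.op.left = left;
    e->u.op.oper = oper;
    e->u.op.right = right;
    return e;
}

/* Appends text to the NUL-terminated string in buf, truncating if full. */
static void append(char *buf, size_t size, const char *text) {
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s", text);
}

/* Debugging routine: appends a fully parenthesized form of e to buf.
   buf must hold a NUL-terminated string (for example "") on entry. */
void linearize(Exp e, char *buf, size_t size) {
    char tmp[32];
    if (e->kind == NUM_EXP) {
        snprintf(tmp, sizeof tmp, "%d", e->u.num);
        append(buf, size, tmp);
        return;
    }
    append(buf, size, "(");
    linearize(e->u.op.left, buf, size);
    snprintf(tmp, sizeof tmp, " %c ", e->u.op.oper);
    append(buf, size, tmp);
    linearize(e->u.op.right, buf, size);
    append(buf, size, ")");
}
```

Building `OpExp(1, OpExp(1, NumExp(1, 7), '+', NumExp(3, 11)), '+', NumExp(6, 17))` and linearizing it into an empty buffer yields `((7 + 11) + 17)`. The parentheses of the concrete syntax are not stored in the tree; the nesting of the nodes carries the grouping.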
Abstract Syntax for Straight-line Program

Stm     ::= Stm Stm            (CompoundStm)
Stm     ::= id Exp             (AssignStm)
Stm     ::= ExpList            (PrintStm)
Exp     ::= id                 (IdExp)
Exp     ::= num                (NumExp)
Exp     ::= Exp Binop Exp      (OpExp)
Exp     ::= Stm Exp            (EseqExp)
ExpList ::= Exp ExpList        (PairExpList)
ExpList ::= Exp                (LastExpList)
Binop   ::= +                  (Plus)
Binop   ::= -                  (Minus)
Binop   ::= *                  (Times)
Binop   ::= /                  (Div)
%{
#include "absyn.h"
%}
/* The union member names stm, exp, and expList and the %type lines for
   prog, stm, exp, and exps are assumed; the slide leaves them out. */
%union {
    int num;
    string id;
    A_stm stm;
    A_exp exp;
    A_expList expList;
}
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <stm> prog stm
%type <exp> exp
%type <expList> exps
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm                          { $$ = $1; }
     ;
stm  : stm SEMICOLUMN stm           { $$ = A_CompoundStm($1, $3); }
     | ID ASSIGN exp                { $$ = A_AssignStm($1, $3); }
     | PRINT LPAREN exps RPAREN     { $$ = A_PrintStm($3); }
     ;
exps : exp                          { $$ = A_ExpList($1, NULL); }
     | exps COMMA exp               { $$ = A_ExpList($1, $3); }
     ;
exp  : INT                          { $$ = A_NumExp($1); }
     | ID                           { $$ = A_IdExp($1); }
     | exp PLUS exp                 { $$ = A_OpExp($1, A_Plus, $3); }
     | exp MINUS exp                { $$ = A_OpExp($1, A_Minus, $3); }
     | exp TIMES exp                { $$ = A_OpExp($1, A_Times, $3); }
     | exp DIV exp                  { $$ = A_OpExp($1, A_Div, $3); }
     | stm COMMA exp                { $$ = A_EseqExp($1, $3); }
     | '(' exp ')'                  { $$ = $2; }
     ;
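Once the parser only builds a tree, interpretation becomes a separate pass that walks it. The sketch below shows that separation for assignments and compound statements. It is not the course's absyn.h: the node layout, the constructors `IdExp`, `NumExp`, `OpExp`, `AssignStm`, and `CompoundStm`, and the functions `eval_exp` and `exec_stm` are hypothetical, and the choice that an unassigned identifier reads as 0 is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct exp_ *Exp;
typedef struct stm_ *Stm;
typedef struct table_ *Table;

/* Flat node layouts (no unions) to keep the sketch short. */
struct exp_ {
    enum { ID_EXP, NUM_EXP, OP_EXP } kind;
    const char *id; int num; char oper; Exp left, right;
};
struct stm_ {
    enum { COMPOUND_STM, ASSIGN_STM } kind;
    Stm first, second;          /* COMPOUND_STM */
    const char *id; Exp exp;    /* ASSIGN_STM */
};
struct table_ { const char *id; int value; Table tail; };

static void *zalloc(size_t n) {
    void *p = calloc(1, n);
    assert(p != NULL);
    return p;
}

/* Constructors, as a parser action would call them. */
Exp IdExp(const char *id) { Exp e = zalloc(sizeof *e); e->kind = ID_EXP; e->id = id; return e; }
Exp NumExp(int num) { Exp e = zalloc(sizeof *e); e->kind = NUM_EXP; e->num = num; return e; }
Exp OpExp(Exp left, char oper, Exp right) {
    Exp e = zalloc(sizeof *e);
    e->kind = OP_EXP; e->left = left; e->oper = oper; e->right = right;
    return e;
}
Stm AssignStm(const char *id, Exp exp) {
    Stm s = zalloc(sizeof *s);
    s->kind = ASSIGN_STM; s->id = id; s->exp = exp;
    return s;
}
Stm CompoundStm(Stm first, Stm second) {
    Stm s = zalloc(sizeof *s);
    s->kind = COMPOUND_STM; s->first = first; s->second = second;
    return s;
}

/* Most recent binding wins; assumption: unassigned identifiers read as 0. */
int lookup(Table t, const char *id) {
    for (; t != NULL; t = t->tail)
        if (strcmp(t->id, id) == 0) return t->value;
    return 0;
}

/* Separate pass 1: evaluate an expression tree in environment t. */
int eval_exp(Exp e, Table t) {
    switch (e->kind) {
    case ID_EXP:  return lookup(t, e->id);
    case NUM_EXP: return e->num;
    case OP_EXP: {
        int l = eval_exp(e->left, t), r = eval_exp(e->right, t);
        switch (e->oper) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '/': assert(r != 0); return l / r;
        }
    }
    }
    assert(0 && "unknown node");
    return 0;
}

/* Separate pass 2: execute a statement and return the extended environment. */
Table exec_stm(Stm s, Table t) {
    if (s->kind == COMPOUND_STM)
        return exec_stm(s->second, exec_stm(s->first, t));
    Table n = zalloc(sizeof *n);
    n->id = s->id;
    n->value = eval_exp(s->exp, t);
    n->tail = t;
    return n;
}
```

Executing the tree for `a := 5 + 3; b := a * 2` from an empty table gives a table in which `a` is 8 and `b` is 16. Because the tree exists before any of it is interpreted, a later phase could just as well make several passes over it, which parser actions cannot do.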
Summary
• Flex and Bison simplify the task of writing compiler/interpreter front-ends
• Abstract syntax provides a clear interface with other compiler phases
  – Supports general programming languages
• But the design of an abstract syntax for a given programming language may take some time