Abstract Syntax
Mooly Sagiv
Schrierber 317
03-640-7606
Wed 10:00-12:00
http://www.math.tau.ac.il/~msagiv/courses/wcc.html
Outline
• The general idea
• Bison
• Motivating example: interpreter for arithmetic expressions
• The need for abstract syntax
• Abstract syntax for straight-line code
• Abstract syntax for Tiger (exercise session)
Semantic Analysis during Recursive Descent Parsing
• The scanner returns "semantic values" for some tokens
• The function of every non-terminal returns the "corresponding subtree value"
• When A ::= B C D is applied, the function for A can use the values returned by B, C, and D
  – The function can also pass parameters, e.g., to D(), reflecting left contexts
Grammar:
  E  ::= num E'
  E' ::= empty-string
  E' ::= + num E'

int E() {
  switch (tok) {
    case num: temp = tok.val; eat(num);
              return EP(temp);
    default:  error(…);
  }
}

/* left is the value of the part of the sum parsed so far */
int EP(int left) {
  switch (tok) {
    case $: return left;
    case +: eat(+); temp = tok.val; eat(num);
            return EP(left + temp);
    default: error(…);
  }
}
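The pseudocode above leaves the scanner and the token representation open. A minimal runnable sketch in C follows, under stated assumptions: the input is a string of decimal numbers separated by '+', and the names src, tok, tokval, advance, and evaluate are hypothetical helpers introduced here, not part of the slides.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Token kinds for the grammar E ::= num E', E' ::= + num E' | empty. */
enum kind { NUM, PLUS, END, BAD };

static const char *src;   /* remaining input (assumed scanner state) */
static enum kind tok;     /* current token */
static int tokval;        /* semantic value of the current NUM token */

/* Scanner: reads the next token and, for NUM, its semantic value. */
static void advance(void) {
    while (*src == ' ') src++;
    if (*src == '\0') {
        tok = END;
    } else if (*src == '+') {
        tok = PLUS;
        src++;
    } else if (isdigit((unsigned char)*src)) {
        tokval = 0;
        while (isdigit((unsigned char)*src))
            tokval = tokval * 10 + (*src++ - '0');
        tok = NUM;
    } else {
        tok = BAD;
    }
}

static void error(const char *msg) {
    fprintf(stderr, "syntax error: %s\n", msg);
    exit(1);
}

static void eat(enum kind expected) {
    if (tok != expected) error("unexpected token");
    advance();
}

/* E' ::= + num E' | empty.  'left' is the value of everything parsed so far. */
static int EP(int left) {
    switch (tok) {
    case END:
        return left;
    case PLUS: {
        eat(PLUS);
        int temp = tokval;   /* read the value before eat() advances past it */
        eat(NUM);
        return EP(left + temp);
    }
    default:
        error("expected '+' or end of input");
        return 0;            /* not reached */
    }
}

/* E ::= num E' */
static int E(void) {
    if (tok != NUM) error("expected a number");
    int temp = tokval;
    eat(NUM);
    return EP(temp);
}

/* Parse and evaluate a whole string. */
int evaluate(const char *text) {
    src = text;
    advance();
    return E();
}
```

With this sketch, evaluate("7+11+17") returns 35. The parameter left is what lets a top-down parser compute a left-associative sum even though E' is right-recursive.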
Semantic Analysis during Bottom-Up Parsing
• The scanner returns "semantic values" for some tokens
• Use the parser stack to store the "corresponding subtree values"
• When A ::= B C D is reduced, the action for A can use the values stored for B, C, and D
• No action in the middle of the rule
Example
E ::= E + num
E ::= num
[Figure: parse tree for the input 5 + 7. The inner E carries the value 5 and the root E carries the value 12.]
Bison Specification
Declarations
%%
Productions
%%
C routines
Interpreter (in Bison)
%{
/* declarations of yylex() and yyerror() */
%}
%union {
  int num;
  string id;
}
%token <id> ID
%token <num> NUM
%type <num> e f t
%start e
%%
e : e '+' t    { $$ = $1 + $3; }
  | e '-' t    { $$ = $1 - $3; }
  | t          { $$ = $1; }
  ;
t : t '*' f    { $$ = $1 * $3; }
  | t '/' f    { $$ = $1 / $3; }
  | f          { $$ = $1; }
  ;
f : NUM        { $$ = $1; }
  | ID         { $$ = lookup($1); }
  | '-' e      { $$ = -$2; }
  | '(' e ')'  { $$ = $2; }
  ;
Interpreter (compact spec.)
%{
/* declarations of yylex() and yyerror() */
%}
%union {
  int num;
  string id;
}
%token <id> ID
%token <num> NUM
%type <num> e
%start e
%left PLUS MINUS
%left MUL DIV
%right UMINUS
%%
e : e PLUS e               { $$ = $1 + $3; }
  | e MINUS e              { $$ = $1 - $3; }
  | e MUL e                { $$ = $1 * $3; }
  | e DIV e                { $$ = $1 / $3; }
  | NUM
  | ID                     { $$ = lookup($1); }
  | MINUS e %prec UMINUS   { $$ = -$2; }
  | '(' e ')'              { $$ = $2; }
  ;
Example run on the input 7+11+17 (the stack grows to the right; semantic values are shown in parentheses)

stack                  input       action
$                      7+11+17$    shift
$ num(7)               +11+17$     reduce e ::= num
$ e(7)                 +11+17$     shift
$ e(7) +               11+17$      shift
$ e(7) + num(11)       +17$        reduce e ::= num
$ e(7) + e(11)         +17$        reduce e ::= e+e
$ e(18)                +17$        shift
$ e(18) +              17$         shift
$ e(18) + num(17)      $           reduce e ::= num
$ e(18) + e(17)        $           reduce e ::= e+e
$ e(35)                $           accept
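The run on 7+11+17 can be reproduced with a hand-written loop that keeps semantic values next to the grammar symbols on the stack. This is only an illustrative sketch for the single grammar e ::= e + e | num, reducing as early as possible (which is what %left asks for); it is not how Bison's table-driven parser is implemented, and the names shift_reduce_eval, struct entry, and MAXSTACK are hypothetical.

```c
#include <ctype.h>

/* Symbols that can sit on the parser stack. */
enum sym { S_NUM, S_PLUS, S_E };

struct entry { enum sym sym; int val; };   /* val is unused for S_PLUS */

#define MAXSTACK 64

/* Evaluate sums such as "7+11+17" (no spaces) with a shift-reduce loop.
   Returns 0 and stores the value in *result on accept, -1 on a syntax error. */
int shift_reduce_eval(const char *in, int *result) {
    struct entry st[MAXSTACK];
    int top = 0;   /* number of entries currently on the stack */

    for (;;) {
        /* Reduce e ::= num: the token's value becomes the value of e. */
        if (top >= 1 && st[top - 1].sym == S_NUM) {
            st[top - 1].sym = S_E;
            continue;
        }
        /* Reduce e ::= e + e: pop three entries, push the sum ($$ = $1 + $3). */
        if (top >= 3 && st[top - 1].sym == S_E &&
            st[top - 2].sym == S_PLUS && st[top - 3].sym == S_E) {
            st[top - 3].val += st[top - 1].val;
            top -= 2;
            continue;
        }
        /* Accept when the input is exhausted and a single e remains. */
        if (*in == '\0') {
            if (top == 1 && st[0].sym == S_E) {
                *result = st[0].val;
                return 0;
            }
            return -1;
        }
        if (top == MAXSTACK) return -1;
        /* Shift the next token together with its semantic value. */
        if (isdigit((unsigned char)*in)) {
            int v = 0;
            while (isdigit((unsigned char)*in))
                v = v * 10 + (*in++ - '0');
            st[top].sym = S_NUM;
            st[top].val = v;
            top++;
        } else if (*in == '+') {
            st[top].sym = S_PLUS;
            st[top].val = 0;
            top++;
            in++;
        } else {
            return -1;
        }
    }
}
```

Calling shift_reduce_eval("7+11+17", &r) returns 0 and leaves 35 in r, passing through the intermediate value 18 exactly as in the run above.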
So why can’t we write all the compiler code in Bison?
%{
#include <assert.h>
#include <stdio.h>
#include <string.h>
/* string is assumed to be a typedef for char * */
typedef struct table *Table_;
struct table { string id; int value; Table_ tail; };
Table_ Table(string id, int value, struct table *tail);
Table_ table = NULL;

/* return the value of the most recent binding of id */
int lookup(Table_ table, string id) {
  assert(table != NULL);
  if (strcmp(id, table->id) == 0) return table->value;
  else return lookup(table->tail, id);
}

/* add a new binding in front of the list */
void update(Table_ *tabptr, string id, int value) {
  *tabptr = Table(id, value, *tabptr);
}
%}
%union { int num; string id; }
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <num> exp
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm
     ;
stm  : stm SEMICOLUMN stm
     | ID ASSIGN exp               { update(&table, $1, $3); }
     | PRINT LPAREN exps RPAREN    { printf("\n"); }
     ;
exps : exp                         { printf("%d", $1); }
     | exps COMMA exp              { printf("%d", $3); }
     ;
exp  : INT                         { $$ = $1; }
     | ID                          { $$ = lookup(table, $1); }
     | exp PLUS exp                { $$ = $1 + $3; }
     | exp MINUS exp               { $$ = $1 - $3; }
     | exp TIMES exp               { $$ = $1 * $3; }
     | exp DIV exp                 { $$ = $1 / $3; }
     | stm COMMA exp               { $$ = $3; }
     | '(' exp ')'                 { $$ = $2; }
     ;
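The slide declares the constructor Table() but does not define it. A minimal standalone sketch of the table in C follows, assuming identifiers are plain C strings (const char * instead of the string typedef) and that a failed allocation may simply abort via assert.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct table *Table_;
struct table { const char *id; int value; Table_ tail; };

/* Constructor: prepend a binding id -> value to an existing list. */
Table_ Table(const char *id, int value, Table_ tail) {
    Table_ t = malloc(sizeof *t);
    assert(t != NULL);
    t->id = id;
    t->value = value;
    t->tail = tail;
    return t;
}

/* Return the value of the most recent binding of id.
   Looking up an unbound identifier trips the assertion. */
int lookup(Table_ table, const char *id) {
    assert(table != NULL);
    if (strcmp(id, table->id) == 0) return table->value;
    return lookup(table->tail, id);
}

/* Record a new binding; older bindings of the same id stay in the
   list but are shadowed because lookup searches from the front. */
void update(Table_ *tabptr, const char *id, int value) {
    *tabptr = Table(id, value, *tabptr);
}
```

After update(&t, "a", 5), update(&t, "b", 7), and update(&t, "a", 9) on an initially NULL table t, lookup(t, "a") returns 9 and lookup(t, "b") returns 7.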
Historical Perspective
• Originally parsers were written without tools
• yacc, bison, ... made tools acceptable
• But it is still difficult to write compilers in parser actions (top-down and bottom-up)
  – Natural grammars are ambiguous
  – No modularity principle
  – Many useful programming language features prevent code generation while parsing
    • Use before declaration
    • gotos
Abstract Syntax
• Intermediate program representation
• Defines a tree that preserves the program hierarchy
• Generated by the parser
• Declared using an (ambiguous) context-free grammar (relatively flat)
  – Not meant for parsing
• Keywords and punctuation symbols are not stored (not relevant once the tree exists)
• Big programs can also be handled (possibly via virtual memory)
Abstract Syntax for Straight-line Program

Stm     ::= Stm Stm           (CompoundStm)
Stm     ::= id Exp            (AssignStm)
Stm     ::= ExpList           (PrintStm)
Exp     ::= id                (IdExp)
Exp     ::= num               (NumExp)
Exp     ::= Exp Binop Exp     (OpExp)
Exp     ::= Stm Exp           (EseqExp)
ExpList ::= Exp ExpList       (PairExpList)
ExpList ::= Exp               (LastExpList)
Binop   ::= +                 (Plus)
Binop   ::= -                 (Minus)
Binop   ::= *                 (Times)
Binop   ::= /                 (Div)
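One way to represent this abstract syntax in C is a tagged union per syntactic category with one constructor function per production. The sketch below is an assumption about what a header such as absyn.h might contain, not its actual contents: the type and field names are illustrative, identifiers are plain C strings, and expression lists use a single A_ExpList(head, tail) constructor with a NULL tail marking the last element instead of separate Pair and Last constructors.

```c
#include <stdlib.h>

typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_expList_ *A_expList;
typedef enum { A_Plus, A_Minus, A_Times, A_Div } A_binop;

/* 'kind' records which production built the node; 'u' holds its children. */
struct A_stm_ {
    enum { A_compoundStm, A_assignStm, A_printStm } kind;
    union {
        struct { A_stm stm1, stm2; } compound;
        struct { const char *id; A_exp exp; } assign;
        struct { A_expList exps; } print;
    } u;
};
struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        const char *id;
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;
    } u;
};
struct A_expList_ { A_exp head; A_expList tail; };  /* tail == NULL: last */

static void *checked_malloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) abort();
    return p;
}

A_stm A_CompoundStm(A_stm stm1, A_stm stm2) {
    A_stm s = checked_malloc(sizeof *s);
    s->kind = A_compoundStm; s->u.compound.stm1 = stm1; s->u.compound.stm2 = stm2;
    return s;
}
A_stm A_AssignStm(const char *id, A_exp exp) {
    A_stm s = checked_malloc(sizeof *s);
    s->kind = A_assignStm; s->u.assign.id = id; s->u.assign.exp = exp;
    return s;
}
A_stm A_PrintStm(A_expList exps) {
    A_stm s = checked_malloc(sizeof *s);
    s->kind = A_printStm; s->u.print.exps = exps;
    return s;
}
A_exp A_IdExp(const char *id) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_idExp; e->u.id = id;
    return e;
}
A_exp A_NumExp(int num) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_numExp; e->u.num = num;
    return e;
}
A_exp A_OpExp(A_exp left, A_binop oper, A_exp right) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_opExp; e->u.op.left = left; e->u.op.oper = oper; e->u.op.right = right;
    return e;
}
A_exp A_EseqExp(A_stm stm, A_exp exp) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_eseqExp; e->u.eseq.stm = stm; e->u.eseq.exp = exp;
    return e;
}
A_expList A_ExpList(A_exp head, A_expList tail) {
    A_expList l = checked_malloc(sizeof *l);
    l->head = head; l->tail = tail;
    return l;
}
```

For instance, the statement a := 5 + 3 corresponds to A_AssignStm("a", A_OpExp(A_NumExp(5), A_Plus, A_NumExp(3))). No node stores the := token or any parentheses; only the structure survives.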
%{
#include "absyn.h"
%}
%union {
  int num;
  string id;
  A_stm stm;
  A_exp exp;
  A_expList expList;
}
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <stm> prog stm
%type <exp> exp
%type <expList> exps
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm                         { $$ = $1; }
     ;
stm  : stm SEMICOLUMN stm          { $$ = A_CompoundStm($1, $3); }
     | ID ASSIGN exp               { $$ = A_AssignStm($1, $3); }
     | PRINT LPAREN exps RPAREN    { $$ = A_PrintStm($3); }
     ;
exps : exp                         { $$ = A_ExpList($1, NULL); }
     | exp COMMA exps              { $$ = A_ExpList($1, $3); }
     ;
exp  : INT                         { $$ = A_NumExp($1); }
     | ID                          { $$ = A_IdExp($1); }
     | exp PLUS exp                { $$ = A_OpExp($1, A_Plus, $3); }
     | exp MINUS exp               { $$ = A_OpExp($1, A_Minus, $3); }
     | exp TIMES exp               { $$ = A_OpExp($1, A_Times, $3); }
     | exp DIV exp                 { $$ = A_OpExp($1, A_Div, $3); }
     | stm COMMA exp               { $$ = A_EseqExp($1, $3); }
     | '(' exp ')'                 { $$ = $2; }
     ;
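Once the parser actions only build a tree, every later phase becomes an ordinary function over that tree. The sketch below illustrates this with two independent passes over the same expression tree. It deliberately uses a reduced, hypothetical node type (Exp with only number and operator nodes, built by Num and Op) so that it stays self-contained; it is not the A_exp type used in the specification.

```c
#include <stdlib.h>

typedef enum { PLUS, MINUS, TIMES, DIV } Binop;   /* hypothetical names */

typedef struct Exp Exp;
struct Exp {
    enum { NUM_EXP, OP_EXP } kind;
    int num;              /* valid when kind == NUM_EXP */
    Binop op;             /* valid when kind == OP_EXP */
    Exp *left, *right;    /* valid when kind == OP_EXP */
};

/* Constructors, standing in for what the parser actions would call. */
Exp *Num(int n) {
    Exp *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = NUM_EXP; e->num = n; e->left = e->right = NULL;
    return e;
}

Exp *Op(Exp *left, Binop op, Exp *right) {
    Exp *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = OP_EXP; e->op = op; e->left = left; e->right = right;
    return e;
}

/* Pass 1: interpret the tree. The caller must not divide by zero. */
int eval(const Exp *e) {
    if (e->kind == NUM_EXP) return e->num;
    int l = eval(e->left);
    int r = eval(e->right);
    switch (e->op) {
    case PLUS:  return l + r;
    case MINUS: return l - r;
    case TIMES: return l * r;
    case DIV:   return l / r;
    }
    return 0;   /* not reached for a well-formed tree */
}

/* Pass 2: an unrelated analysis over the same tree, counting operators. */
int count_ops(const Exp *e) {
    if (e->kind == NUM_EXP) return 0;
    return 1 + count_ops(e->left) + count_ops(e->right);
}
```

For the tree Op(Num(7), PLUS, Op(Num(11), TIMES, Num(2))), eval returns 29 and count_ops returns 2. Neither pass knows anything about tokens, precedence declarations, or the parser.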
Summary
• Flex and Bison simplify the task of writing compiler/interpreter front-ends
• Abstract syntax provides a clear interface with other compiler phases
  – Supports general programming languages
• But the design of an abstract syntax for a given programming language may take some time