Abstract Syntax
Leonidas Fegaras
CSE 5317/4305, L5: Abstract Syntax

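Abstract Syntax Tree (AST)

• A parser typically generates an Abstract Syntax Tree (AST):

    [Figure: the parser asks the scanner for the next token ("get token"); the scanner reads the source file one character at a time ("get next character") and returns a token; the parser produces the AST.]

• A parse tree is not an AST

    [Figure: the parse tree for id(x) + id(y) * id(z), built from the nonterminals E, T, and F, shown next to the much smaller AST for the same input: + at the root, with x as its left child and * (over y and z) as its right child.]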

Building Abstract Syntax Trees in Java

    abstract class Exp {
    }

    class IntegerExp extends Exp {
        public int value;
        public IntegerExp ( int n ) { value=n; }
    }

    class TrueExp extends Exp {
        public TrueExp () {}
    }

    class FalseExp extends Exp {
        public FalseExp () {}
    }

    class VariableExp extends Exp {
        public String value;
        public VariableExp ( String n ) { value=n; }
    }

Exp (cont.)

    class BinaryExp extends Exp {
        public String operator;
        public Exp left;
        public Exp right;
        public BinaryExp ( String o, Exp l, Exp r ) {
            operator=o; left=l; right=r;
        }
    }

    class UnaryExp extends Exp {
        public String operator;
        public Exp operand;
        public UnaryExp ( String o, Exp e ) { operator=o; operand=e; }
    }

Exp (cont.)

    import java.util.List;

    // A function call: the function name and its argument expressions
    class CallExp extends Exp {
        public String name;
        public List<Exp> arguments;
        public CallExp ( String nm, List<Exp> s ) { name=nm; arguments=s; }
    }

    // A record projection, such as x.A
    class ProjectionExp extends Exp {
        public Exp value;
        public String attribute;
        public ProjectionExp ( Exp v, String a ) { value=v; attribute=a; }
    }

Exp (cont.)

    import java.util.List;
    import java.util.Map;

    // One attribute/value pair of a record construction
    class RecordElement {
        public String attribute;
        public Exp value;
        public RecordElement ( String a, Exp v ) { attribute=a; value=v; }
    }

    class RecordExp extends Exp {
        public List<RecordElement> elements;
        public RecordExp ( List<RecordElement> el ) { elements=el; }
    }

… or better, map each attribute name directly to its expression:

    class RecordExp extends Exp {
        public Map<String, Exp> elements;
        public RecordExp ( Map<String, Exp> el ) { elements=el; }
    }

Examples

• The AST for the input (x-2)+3

    new BinaryExp("+",
                  new BinaryExp("-",
                                new VariableExp("x"),
                                new IntegerExp(2)),
                  new IntegerExp(3))

• The AST for the input f(x.A, true)

    new CallExp("f",
                Arrays.asList(new ProjectionExp(new VariableExp("x"), "A"),
                              new TrueExp()))
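To make the two example trees concrete, the following is a minimal sketch that rebuilds them and walks them recursively to print fully parenthesized source text. It repeats only the Exp subclasses the examples need; the AstPrinter class and its show method are hypothetical helpers introduced here for illustration and are not part of the slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

abstract class Exp { }

class IntegerExp extends Exp {
    public int value;
    public IntegerExp(int n) { value = n; }
}

class TrueExp extends Exp { }

class VariableExp extends Exp {
    public String value;
    public VariableExp(String n) { value = n; }
}

class BinaryExp extends Exp {
    public String operator;
    public Exp left;
    public Exp right;
    public BinaryExp(String o, Exp l, Exp r) { operator = o; left = l; right = r; }
}

class CallExp extends Exp {
    public String name;
    public List<Exp> arguments;
    public CallExp(String nm, List<Exp> s) { name = nm; arguments = s; }
}

class ProjectionExp extends Exp {
    public Exp value;
    public String attribute;
    public ProjectionExp(Exp v, String a) { value = v; attribute = a; }
}

// Hypothetical helper (not from the slides): renders an AST as text.
class AstPrinter {
    static String show(Exp e) {
        if (e instanceof IntegerExp) return Integer.toString(((IntegerExp) e).value);
        if (e instanceof TrueExp) return "true";
        if (e instanceof VariableExp) return ((VariableExp) e).value;
        if (e instanceof BinaryExp) {
            BinaryExp b = (BinaryExp) e;
            // Parenthesize every binary node so the tree shape stays visible
            return "(" + show(b.left) + b.operator + show(b.right) + ")";
        }
        if (e instanceof ProjectionExp) {
            ProjectionExp p = (ProjectionExp) e;
            return show(p.value) + "." + p.attribute;
        }
        if (e instanceof CallExp) {
            CallExp c = (CallExp) e;
            String args = c.arguments.stream()
                                     .map(AstPrinter::show)
                                     .collect(Collectors.joining(", "));
            return c.name + "(" + args + ")";
        }
        throw new IllegalArgumentException("unknown AST node: " + e);
    }

    public static void main(String[] args) {
        Exp e1 = new BinaryExp("+",
                               new BinaryExp("-", new VariableExp("x"), new IntegerExp(2)),
                               new IntegerExp(3));
        Exp e2 = new CallExp("f",
                             Arrays.asList(new ProjectionExp(new VariableExp("x"), "A"),
                                           new TrueExp()));
        System.out.println(show(e1));   // ((x-2)+3)
        System.out.println(show(e2));   // f(x.A, true)
    }
}
```

The parentheses of the input (x-2)+3 are not stored anywhere in the tree; the nesting of BinaryExp nodes alone carries the grouping, which is why the printer has to put them back.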

Building ASTs in Scala

Use case classes:

    sealed abstract class Exp
    case class TrueExp () extends Exp
    case class FalseExp () extends Exp
    case class IntegerExp ( value: Int ) extends Exp
    case class StringExp ( value: String ) extends Exp
    case class VariableExp ( name: String ) extends Exp
    case class BinaryExp ( operator: String, left: Exp, right: Exp ) extends Exp
    case class UnaryExp ( operator: String, operand: Exp ) extends Exp
    case class CallExp ( name: String, arguments: List[Exp] ) extends Exp
    case class ProjectionExp ( record: Exp, attribute: String ) extends Exp
    case class RecordExp ( arguments: List[(String, Exp)] ) extends Exp

For example, the AST for the input (x-2)+3 is

    BinaryExp("+", BinaryExp("-", VariableExp("x"), IntegerExp(2)), IntegerExp(3))

and the AST for the input f(x.A, true) is

    CallExp("f", List(ProjectionExp(VariableExp("x"), "A"), TrueExp()))

Adding Semantic Actions to a Parser

• Right-associative grammar:

    E ::= T + E
        | T - E
        | T
    T ::= num

• After left factoring:

    E  ::= T E'
    E' ::= + E
         | - E
         | /* empty */
    T  ::= num

• Recursive descent parser (the semantic actions compute the value of the expression):

    int E () {
        int left = T();
        if (current_token == '+') {
            read_next_token();
            return left + E();
        } else if (current_token == '-') {
            read_next_token();
            return left - E();
        } else return left;    // empty alternative of E'
    };

    int T () {
        if (current_token == 'num') {
            int n = num_value;
            read_next_token();
            return n;
        } else error();
    };
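The parser above can be sketched as runnable Java as follows. This is an illustration under stated assumptions, not the course's code: the tiny scanner, the class name RightAssocParser, and the eval entry point are hypothetical, and the sketch assumes that E' also has an empty alternative so that a lone number is a valid expression.

```java
class RightAssocParser {
    private final String input;
    private int pos = 0;
    private char currentToken;   // 'n' = number, '+', '-', or '$' = end of input
    private int numValue;        // value of the most recent number token

    RightAssocParser(String input) {
        this.input = input;
        readNextToken();
    }

    // Hypothetical scanner: skips blanks and recognizes numbers, '+', and '-'
    private void readNextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos >= input.length()) { currentToken = '$'; return; }
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            numValue = Integer.parseInt(input.substring(start, pos));
            currentToken = 'n';
        } else if (c == '+' || c == '-') {
            currentToken = c;
            pos++;
        } else {
            throw new IllegalStateException("unexpected character: " + c);
        }
    }

    // E ::= T E'   with   E' ::= + E | - E | empty
    int E() {
        int left = T();
        if (currentToken == '+') {
            readNextToken();
            return left + E();
        } else if (currentToken == '-') {
            readNextToken();
            return left - E();
        } else {
            return left;         // empty alternative (assumed, see lead-in)
        }
    }

    // T ::= num
    int T() {
        if (currentToken == 'n') {
            int n = numValue;
            readNextToken();
            return n;
        }
        throw new IllegalStateException("number expected");
    }

    static int eval(String s) {
        RightAssocParser p = new RightAssocParser(s);
        int v = p.E();
        if (p.currentToken != '$') throw new IllegalStateException("trailing input");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("1-2-3"));   // grouped as 1-(2-3), prints 2
    }
}
```

Because the recursive call to E() consumes the whole rest of the input before the subtraction happens, 1-2-3 is grouped as 1-(2-3). That is the expected behavior of a right-associative grammar, and it differs from the usual arithmetic convention.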

Adding Semantic Actions to a Parser

• Left-associative grammar:

    E ::= E + T
        | E - T
        | T
    T ::= num

• After left recursion elimination:

    E  ::= T E'
    E' ::= + T E'
         | - T E'
         | /* empty */
    T  ::= num

• Recursive descent parser (the value computed so far is passed down to E' as the parameter left):

    int E () {
        return Eprime(T());
    };

    int Eprime ( int left ) {
        if (current_token == '+') {
            read_next_token();
            return Eprime(left + T());
        } else if (current_token == '-') {
            read_next_token();
            return Eprime(left - T());
        } else return left;
    };

    int T () {
        if (current_token == 'num') {
            int n = num_value;
            read_next_token();
            return n;
        } else error();
    };
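The same accumulator technique works when the semantic actions build a tree instead of computing a number. The sketch below is one possible Java rendering; the scanner, the class name LeftAssocParser, and the small Num and Bin node classes are hypothetical stand-ins chosen to keep the example short (they are not the Exp classes from the earlier slides).

```java
class LeftAssocParser {
    // Hypothetical minimal AST, used only by this sketch
    static abstract class Node { }

    static class Num extends Node {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static class Bin extends Node {
        final char op;
        final Node left, right;
        Bin(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + op + right + ")"; }
    }

    private final String input;
    private int pos = 0;
    private char currentToken;   // 'n' = number, '+', '-', or '$' = end of input
    private int numValue;

    LeftAssocParser(String input) {
        this.input = input;
        readNextToken();
    }

    // Hypothetical scanner: skips blanks and recognizes numbers, '+', and '-'
    private void readNextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos >= input.length()) { currentToken = '$'; return; }
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            numValue = Integer.parseInt(input.substring(start, pos));
            currentToken = 'n';
        } else if (c == '+' || c == '-') {
            currentToken = c;
            pos++;
        } else {
            throw new IllegalStateException("unexpected character: " + c);
        }
    }

    // E ::= T E'
    Node E() { return Eprime(T()); }

    // E' ::= + T E' | - T E' | empty
    // 'left' is the tree built so far; each operator wraps it in a new node.
    Node Eprime(Node left) {
        if (currentToken == '+' || currentToken == '-') {
            char op = currentToken;
            readNextToken();
            return Eprime(new Bin(op, left, T()));
        }
        return left;
    }

    // T ::= num
    Node T() {
        if (currentToken == 'n') {
            Node n = new Num(numValue);
            readNextToken();
            return n;
        }
        throw new IllegalStateException("number expected");
    }

    static Node parse(String s) {
        LeftAssocParser p = new LeftAssocParser(s);
        Node tree = p.E();
        if (p.currentToken != '$') throw new IllegalStateException("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1-2-3"));   // prints ((1-2)-3)
    }
}
```

Passing the partial result down as a parameter is what restores left associativity after left recursion has been eliminated: each new operator node takes everything parsed so far as its left child.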

Table-Driven Predictive Parsers

• They use the parse stack to push and pop both actions and symbols, but they use a separate semantic stack to execute the actions

    push(S);
    read_next_token();
    repeat
        X = pop();
        if (X is a terminal or '$')
            if (X == current_token)
                read_next_token();
            else error();
        else if (X is an action)
            perform the action;
        else if (M[X, current_token] == "X ::= Y1 Y2 ... Yk") {
            push(Yk);
            ...
            push(Y1);
        }
        else error();
    until X == '$';

Example

• Need to embed actions { code; } in the grammar rules
• Suppose that pushV and popV are the functions to manipulate the semantic stack
• The following is the grammar of an interpreter that uses the semantic stack to perform additions and subtractions:

    E  ::= T E' $ { print(popV()); }
    E' ::= + T { pushV(popV() + popV()); } E'
         | - T { pushV(-popV() + popV()); } E'
         | /* empty */
    T  ::= num { pushV(num); }

• For example, for 1+5-2, we have the following sequence of actions:

    pushV(1);
    pushV(5);
    pushV(popV()+popV());
    pushV(2);
    pushV(-popV()+popV());
    print(popV());
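The interpreter grammar above can be sketched as a runnable table-driven parser. Everything beyond the grammar itself is an assumption made for illustration: the class name PredictiveInterpreter, the encoding of actions as strings starting with '#', the hand-written table method standing in for M[X, token], and the whitespace-based tokenizer. The loop also runs until the parse stack is empty rather than stopping at '$', so that the print action that follows '$' is executed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class PredictiveInterpreter {
    // M[X, token]: the right-hand side to push, or null for an error entry.
    // Actions are embedded in the right-hand sides as "#..." symbols.
    private static List<String> table(String x, String tok) {
        if (x.equals("E") && tok.equals("num")) return Arrays.asList("T", "E'", "$", "#print");
        if (x.equals("E'") && tok.equals("+")) return Arrays.asList("+", "T", "#add", "E'");
        if (x.equals("E'") && tok.equals("-")) return Arrays.asList("-", "T", "#sub", "E'");
        if (x.equals("E'") && tok.equals("$")) return Collections.emptyList();
        if (x.equals("T") && tok.equals("num")) return Arrays.asList("num", "#pushnum");
        return null;
    }

    private static boolean isTerminal(String x) {
        return x.equals("num") || x.equals("+") || x.equals("-") || x.equals("$");
    }

    // Token kind: every run of digits is the terminal "num"
    private static String kind(String tok) {
        return tok.matches("\\d+") ? "num" : tok;
    }

    // Parses and evaluates the input; returns the value the print action pops.
    static int run(String input) {
        List<String> tokens = new ArrayList<>(Arrays.asList(
            input.replace("+", " + ").replace("-", " - ").trim().split("\\s+")));
        tokens.add("$");
        int pos = 0;
        int lastNum = 0;                                    // value of the last matched num
        Integer printed = null;
        Deque<String> parseStack = new ArrayDeque<>();      // symbols and actions
        Deque<Integer> semanticStack = new ArrayDeque<>();  // values (pushV / popV)

        parseStack.push("E");
        while (!parseStack.isEmpty()) {
            String x = parseStack.pop();
            String tok = tokens.get(pos);
            if (x.equals("#pushnum")) {
                semanticStack.push(lastNum);
            } else if (x.equals("#add")) {
                semanticStack.push(semanticStack.pop() + semanticStack.pop());
            } else if (x.equals("#sub")) {
                int right = semanticStack.pop();            // the top is the right operand
                int left = semanticStack.pop();
                semanticStack.push(-right + left);
            } else if (x.equals("#print")) {
                printed = semanticStack.pop();
                System.out.println(printed);
            } else if (isTerminal(x)) {
                if (!x.equals(kind(tok))) throw new IllegalStateException("expected " + x);
                if (x.equals("num")) lastNum = Integer.parseInt(tok);
                if (!x.equals("$")) pos++;                  // nothing follows '$'
            } else {
                List<String> rhs = table(x, kind(tok));
                if (rhs == null) throw new IllegalStateException("unexpected token " + tok);
                for (int i = rhs.size() - 1; i >= 0; i--) parseStack.push(rhs.get(i));
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        run("1+5-2");   // prints 4
    }
}
```

Pushing a right-hand side in reverse order leaves its first symbol on top of the parse stack, so each action is popped at exactly the point where it appears in the rule.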

Bottom-Up Parsers

• A bottom-up parser can only perform an action after a reduction
• We can only have rules of the form

    X ::= Y1 ... Yn { action }

  where the action is always at the end of the rule; this action is evaluated after the rule X ::= Y1 ... Yn is reduced
• How? In addition to state numbers, the parser pushes values into the parse stack
• If we want to put an action in the middle of the right-hand side of a rule, we use a dummy non-terminal, called a marker. For example,

    X ::= a { action } b

  is equivalent to

    X ::= M b
    M ::= a { action }

CUP

• Both terminals and non-terminals are associated with typed values
    – these values are instances of the Object class (or of some subclass of the Object class)
    – the value associated with a terminal is in most cases an Object, except for an identifier, which is a String, for an integer, which is an Integer, etc.
    – the typical values associated with non-terminals in a compiler are ASTs, lists of ASTs, etc.
• You can retrieve the value of a symbol s at the right-hand side of a rule by using the notation s:x, where x is a variable name that hasn't appeared elsewhere in this rule
• The value of the non-terminal defined by a rule is called RESULT and should always be assigned a value in the action
    – e.g., if the non-terminal E is associated with an Integer object, then

        E ::= E:n PLUS E:m    {: RESULT = n+m; :}

Machinery

• The parse stack elements are of type struct( state: int, value: Object )
    – int is the state number
    – Object is the value
• When a reduction occurs, the RESULT value is calculated from the values in the stack and is pushed along with the GOTO state
• Example: after the reduction by

    E ::= E:n PLUS E:m    {: RESULT = n+m; :}

  the RESULT value is

    stack[top-2].value + stack[top].value

  which is the new value pushed in the stack along with the GOTO state
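The reduction step described above can be simulated in a few lines of Java. This is only a sketch of the idea, not CUP's actual implementation: the StackElement and ReductionDemo classes are hypothetical, the state numbers are made up, and the GOTO state is passed in by the caller instead of being looked up in a real parsing table.

```java
import java.util.ArrayList;
import java.util.List;

class ReductionDemo {
    // One parse-stack element: a state number plus the value of the symbol
    static class StackElement {
        final int state;
        final Object value;
        StackElement(int state, Object value) { this.state = state; this.value = value; }
    }

    // Reduce by  E ::= E:n PLUS E:m  {: RESULT = n+m; :}
    // gotoState stands in for the table lookup GOTO[exposed state, E].
    static void reducePlus(List<StackElement> stack, int gotoState) {
        int top = stack.size() - 1;
        Integer n = (Integer) stack.get(top - 2).value;   // value of the first E
        Integer m = (Integer) stack.get(top).value;       // value of the second E
        Object result = n + m;                            // the action's RESULT
        // Pop the three right-hand-side symbols: E, PLUS, E
        for (int i = 0; i < 3; i++) stack.remove(stack.size() - 1);
        // Push the left-hand side E with its value and the GOTO state
        stack.add(new StackElement(gotoState, result));
    }

    public static void main(String[] args) {
        List<StackElement> stack = new ArrayList<>();
        stack.add(new StackElement(0, null));   // initial state (made-up numbers)
        stack.add(new StackElement(3, 1));      // E with value 1
        stack.add(new StackElement(5, null));   // PLUS carries no useful value
        stack.add(new StackElement(7, 2));      // E with value 2
        reducePlus(stack, 3);
        StackElement e = stack.get(stack.size() - 1);
        System.out.println(e.state + " " + e.value);   // prints 3 3
    }
}
```

The labels n and m in the rule are just names for fixed offsets from the top of the stack, which is why the values must be cast from Object to the type declared for the symbol.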

ASTs in CUP (calc.cup)

• Need to associate each non-terminal symbol with an AST type
• Using Scala case classes in Java (!)

    non terminal Expr exp;
    non terminal List expl;

    exp ::= exp:e1 PLUS exp:e2      {: RESULT = new BinOpExp("+", e1, e2); :}
          | exp:e1 MINUS exp:e2     {: RESULT = new BinOpExp("-", e1, e2); :}
          | id:nm LP expl:el RP     {: RESULT = new CallExp(nm, el); :}
          | INT:n                   {: RESULT = new IntConst(n); :}
          ;

    expl ::= expl:el COMMA exp:e    {: RESULT = append(e, el); :}
           | exp:e                  {: RESULT = cons(e, nil); :}
           ;