Levels of Programming Languages High-level program class Triangle


































- Slides: 35

Levels of Programming Languages

- High-level program: class Triangle { ... float surface() { return b*h/2; } }
- Low-level program: LOAD r1, b; LOAD r2, h; MUL r1, r2; DIV r1, #2; RET
- Executable machine code: 000100100101 0010010011101100 10101101001...

Review 1

Compilers and other translators

Examples:
- Chinese => English
- Java => JVM byte codes
- Scheme => C
- C => Scheme
- x86 Assembly Language => x86 binary codes

Other non-traditional examples: disassembler, decompiler (e.g. JVM => Java)

Tombstone Diagrams

What are they?
- diagrams consisting of a set of “puzzle pieces” we can use to reason about language processors and programs
- different kinds of pieces
- combination rules (not all diagrams are “well formed”)

The kinds of pieces:
- Program P implemented in L (P above L)
- Machine implemented in hardware (M)
- Translator from S to T implemented in L (S -> T above L)
- Language interpreter for M implemented in L (M above L)

Syntax Specification

Syntax is specified using “Context Free Grammars” (CFGs):
- A finite set of terminal symbols
- A finite set of non-terminal symbols
- A start symbol
- A finite set of production rules

CFGs are usually written in “Backus-Naur Form” (BNF) notation. A production rule in BNF notation is written as:

    N ::= α

where N is a non-terminal and α is a sequence of terminals and non-terminals.

    N ::= α | β | ...

is an abbreviation for several rules with N as left-hand side.

Concrete and Abstract Syntax

The previous grammar specified the concrete syntax of Mini Triangle. The concrete syntax is important for the programmer, who needs to know exactly how to write syntactically well-formed programs.

The abstract syntax omits irrelevant syntactic details and only specifies the essential structure of programs.

Example: different concrete syntaxes for an assignment
- v := e
- (set! v e)
- e -> v
- v = e

Abstract Syntax Trees

Abstract Syntax Tree for: d := d+10*n

[Figure: AST diagram with the nodes AssignmentCmd, BinaryExpression, VNameExp, SimpleVName, IntegerExp, Ident (d, n), Int-Lit (10) and Op (+, *).]

Contextual Constraints

Syntax rules alone are not enough to specify the format of well-formed programs.

Example 1 (scope rules):

    let const m~2
    in m + x

Here x is undefined.

Example 2 (type rules):

    let const m~2;
        var n: Boolean
    in begin
        n := m<4;
        n := n+1
    end

Here n := n+1 is a type error.

Semantics

Specification of semantics is concerned with specifying the “meaning” of well-formed programs.

Terminology:
- Expressions are evaluated and yield values (and may or may not perform side effects).
- Commands are executed and perform side effects.
- Declarations are elaborated to produce bindings.

Side effects:
- change the values of variables
- perform input/output

Phases of a Compiler

A compiler’s phases are steps in transforming source code into object code. The different phases correspond roughly to the different parts of the language specification:
- Syntax analysis <-> Syntax
- Contextual analysis <-> Contextual constraints
- Code generation <-> Semantics

Compiler Passes

- A pass is a complete traversal of the source program, or a complete traversal of some internal representation of the source program.
- A pass can correspond to a “phase”, but it does not have to!
- Sometimes a single “pass” corresponds to several phases that are interleaved in time.
- Which passes a compiler makes over the source program, and how many, is an important design decision.

Syntax Analysis

Dataflow chart:
- Source Program -> Stream of Characters -> Scanner
- Scanner -> Stream of “Tokens” -> Parser (the Scanner also produces Error Reports)
- Parser -> Abstract Syntax Tree (the Parser also produces Error Reports)

Regular Expressions

REs are a notation for expressing a set of strings of terminal symbols.

Different kinds of RE:
- ε: the empty string
- t: generates only the string t
- X Y: generates any string xy such that x is generated by X and y is generated by Y
- X | Y: generates any string which is generated either by X or by Y
- X*: the concatenation of zero or more strings generated by X
- (X): for grouping
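The constructs listed above can be tried out in most regex libraries. The sketch below uses Python's standard `re` module; the helper name `generated_by` and the sample patterns are illustrative assumptions, not part of the slides.

```python
import re

def generated_by(pattern, text):
    """Return True if the whole of `text` is in the set of strings described by `pattern`."""
    return re.fullmatch(pattern, text) is not None

# One sample pattern per construct (hypothetical examples).
print(generated_by(r"t", "t"))            # a single terminal
print(generated_by(r"ab", "ab"))          # concatenation X Y
print(generated_by(r"a|b", "b"))          # alternation X | Y
print(generated_by(r"a*", ""))            # X* includes zero repetitions
print(generated_by(r"(ab)*c", "ababc"))   # grouping combined with repetition
print(generated_by(r"a|b", "ab"))         # "ab" is generated by neither alternative
```

`re.fullmatch` is used rather than `re.search` because an RE here describes a set of complete strings, not a substring to find.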

FA and the implementation of Scanners

- Regular expressions, NDFA-ε, NDFA and DFA are all equivalent formalisms in terms of what languages can be defined with them.
- Regular expressions are a convenient notation for describing the “tokens” of programming languages.
- Regular expressions can be converted into FAs (the algorithm for conversion into NDFA-ε is straightforward).
- DFAs can be easily implemented as computer programs.
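To illustrate the last point, here is a minimal table-driven DFA sketch. The two token kinds (identifiers and integer literals), the state names and the function `classify` are assumptions chosen for the example; a generated scanner would use a much larger table.

```python
# Hypothetical DFA: identifiers are letter (letter | digit)*, integer literals are digit digit*.
START, IN_IDENT, IN_INT = "start", "ident", "int"

# Accepting states and the token kind each one recognizes.
ACCEPTING = {IN_IDENT: "Identifier", IN_INT: "IntLiteral"}

# Transition table: (state, character class) -> next state. Missing entries mean "reject".
TRANSITIONS = {
    (START, "letter"): IN_IDENT,
    (START, "digit"): IN_INT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
    (IN_INT, "digit"): IN_INT,
}

def char_class(ch):
    """Map a character to the input alphabet of the DFA."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

def classify(text):
    """Run the DFA over `text`; return the token kind, or None if the DFA rejects."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return ACCEPTING.get(state)

print(classify("x1"), classify("42"), classify("4x"))
```

Because the automaton is deterministic, the loop needs only the current state and one table lookup per character.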

JFlex Lexical Analyzer Generator for Java

- Input: definition of tokens (regular expressions)
- Tool: JFlex
- Output: Java file with a Scanner class that recognizes tokens

Parsing

Parsing == Recognition + determining phrase structure (for example by generating an AST)

- Different types of parsing strategies
  - bottom up
  - top down
- Recursive descent parsing
  - What is it
  - How to implement one given an EBNF specification
- Bottom up parsing algorithms

Top-down parsing

[Figure: parse tree for the sentence "The cat sees a rat." built from the root downwards: Sentence, then Subject, Verb and Object, then Noun.]

Bottom up parsing

[Figure: parse tree for the sentence "The cat sees a rat." built from the words upwards: Noun and Verb, then Subject and Object, then Sentence.]

Development of Recursive Descent Parser

1. Express the grammar in EBNF.
2. Apply grammar transformations: left factorization and left recursion elimination.
3. Create a parser class with
   - a private variable currentToken
   - methods to call the scanner: accept and acceptIt
4. Implement private parsing methods:
   - add a private parseN method for each non-terminal N
   - add a public parse method that
     - gets the first token from the scanner
     - calls parseS (S is the start symbol of the grammar)
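The steps above can be sketched for a very small grammar. Everything specific here is an assumption made for illustration: the EBNF grammar `Expression ::= Primary (('+' | '*') Primary)*` and `Primary ::= INT | IDENT | '(' Expression ')'`, the stand-in `scan` function, the `<eot>` end marker, and the tuple shape of the resulting tree. The method names follow the slide (`accept`, `acceptIt`, `parseN`), written in Python style.

```python
import re

class ParseError(Exception):
    pass

def scan(text):
    """Stand-in scanner (assumption): splits text into tokens, silently skipping other characters."""
    return re.findall(r"\d+|[A-Za-z]\w*|[+*()]", text) + ["<eot>"]

class Parser:
    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self.current_token = None

    def accept_it(self):
        """Unconditionally move on to the next token."""
        self.current_token = next(self._tokens, "<eot>")

    def accept(self, expected):
        """Move on only if the current token is the expected one."""
        if self.current_token != expected:
            raise ParseError(f"expected {expected!r}, found {self.current_token!r}")
        self.accept_it()

    def parse(self):
        self.accept_it()                  # get the first token from the scanner
        tree = self.parse_expression()    # call the method for the start symbol
        self.accept("<eot>")
        return tree

    def parse_expression(self):
        # Expression ::= Primary (('+' | '*') Primary)*
        tree = self.parse_primary()
        while self.current_token in ("+", "*"):
            op = self.current_token
            self.accept_it()
            tree = (op, tree, self.parse_primary())
        return tree

    def parse_primary(self):
        # Primary ::= INT | IDENT | '(' Expression ')'
        token = self.current_token
        if token == "(":
            self.accept_it()
            tree = self.parse_expression()
            self.accept(")")
            return tree
        if token[0].isalnum():
            self.accept_it()
            return token
        raise ParseError(f"unexpected token {token!r}")

print(Parser(scan("d + 10 * n")).parse())
```

Note that this toy grammar has no operator precedence, so operators group strictly left to right; that is a property of the assumed grammar, not of recursive descent in general.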

LL1 Grammars

- The presented algorithm to convert EBNF into a parser does not work for all possible grammars.
- It only works for so-called “LL1” grammars.
- Basically, an LL1 grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token.
- What grammars are LL1? How can we recognize that a grammar is (or is not) LL1? => We can deduce the necessary conditions from the parser generation algorithm.

LR parsing

- The algorithm makes use of a stack.
- The first item on the stack is the initial state of a DFA.
- A state of the automaton is a set of LR0/LR1 items.
- The initial state is constructed from productions of the form S ::= • α [, $] (where S is the start symbol of the CFG).
- The stack contains, in alternating order:
  - a DFA state
  - a terminal symbol or part (subtree) of the parse tree being constructed
- The items on the stack are related by transitions of the DFA.
- There are two basic actions in the algorithm:
  - shift: get the next input token
  - reduce: build a new node (remove children from stack)
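A minimal sketch of the shift/reduce loop, assuming a tiny hypothetical grammar `S ::= ( S ) | x` whose LR0 automaton was worked out by hand (six states, numbered 0 to 5). The tables, the function name `lr_parse` and the tree representation are illustrative assumptions; real parsers get their tables from a generator.

```python
# Hand-built LR0 tables for the hypothetical grammar  S ::= ( S ) | x
SHIFT = {  # (state, terminal) -> next state
    (0, "("): 2, (0, "x"): 3,
    (2, "("): 2, (2, "x"): 3,
    (4, ")"): 5,
}
GOTO = {(0, "S"): 1, (2, "S"): 4}       # (state, non-terminal) -> next state
REDUCE = {3: ("S", 1), 5: ("S", 3)}     # state -> (left-hand side, length of right-hand side)
ACCEPT_STATE = 1

def lr_parse(tokens):
    """Parse a sequence of terminals; return the parse tree as nested (symbol, children) pairs."""
    tokens = list(tokens) + ["$"]       # "$" marks the end of the input
    pos = 0
    stack = [0]                         # alternates: state, symbol/subtree, state, ...
    while True:
        state = stack[-1]
        if state == ACCEPT_STATE:
            if tokens[pos] != "$":
                raise ValueError(f"unexpected token {tokens[pos]!r} after complete sentence")
            return stack[-2]
        if state in REDUCE:
            # reduce: remove the children from the stack and build a new node
            lhs, length = REDUCE[state]
            children = stack[-2 * length::2]
            del stack[-2 * length:]
            stack += [(lhs, children), GOTO[(stack[-1], lhs)]]
        else:
            # shift: move the next input token onto the stack
            next_state = SHIFT.get((state, tokens[pos]))
            if next_state is None:
                raise ValueError(f"unexpected token {tokens[pos]!r}")
            stack += [tokens[pos], next_state]
            pos += 1

print(lr_parse("(x)"))
```

Since this is LR0, a state containing a completed item reduces without looking at the next token; LR1 and LALR tables would additionally index the reduce action by the lookahead.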

Bottom Up Parsers: Overview of Algorithms

- LR0: The simplest algorithm; theoretically important but rather weak (not practical).
- SLR: An improved version of LR0; more practical but still rather weak.
- LR1: LR0 algorithm with an extra lookahead token.
  - very powerful algorithm, but not often used because of large memory requirements (very big parsing tables)
- LALR: “Watered down” version of LR1.
  - still very powerful, but has much smaller parsing tables
  - most commonly used algorithm today

JavaCUP: A LALR generator for Java

- Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class, which recognizes tokens
- Grammar (BNF-like specification) -> JavaCUP -> Java file: Parser class, which uses the Scanner to get tokens and parses the stream of tokens
- Together the Scanner and Parser classes form the syntactic analyzer.

Contextual Analysis -> Decorated AST

Annotations:
- result of identification
- :type, the result of type checking

[Figure: decorated AST of a Program consisting of a LetCommand with a SequentialDeclaration (VarDecl n: Integer, VarDecl c: Char) and two AssignCommands (c := '&' and n := n + 1). Expression and V-name nodes are annotated with :char or :int.]

Nested Block Structure

A language exhibits nested block structure if blocks may be nested one within another (typically with no upper bound on the level of nesting that is allowed).

There can be any number of scope levels (depending on the level of nesting of blocks).

Typical scope rules:
- No identifier may be declared more than once within the same block (at the same level).
- For any applied occurrence there must be a corresponding declaration, either within the same block or in a block in which it is nested.
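The two scope rules above map naturally onto a stack of per-block tables. The sketch below is one possible design, not the course's implementation; the class name `IdentificationTable` and its method names are assumptions.

```python
class IdentificationTable:
    """Hypothetical identification table: one dictionary per open block, innermost last."""

    def __init__(self):
        self._scopes = [{}]              # start with the outermost block

    def open_scope(self):
        self._scopes.append({})

    def close_scope(self):
        self._scopes.pop()

    def enter(self, ident, attribute):
        """Record a declaration; reject a second declaration in the same block."""
        current = self._scopes[-1]
        if ident in current:
            raise NameError(f"{ident} is declared more than once in the same block")
        current[ident] = attribute

    def retrieve(self, ident):
        """Find the declaration for an applied occurrence, searching innermost block first."""
        for scope in reversed(self._scopes):
            if ident in scope:
                return scope[ident]
        raise NameError(f"{ident} is not declared")

table = IdentificationTable()
table.enter("n", "outer n")
table.open_scope()
table.enter("n", "inner n")              # allowed: a different block
print(table.retrieve("n"))
table.close_scope()
print(table.retrieve("n"))
```

Searching from the innermost scope outwards is what makes an inner declaration hide an outer one with the same name.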

Type Checking

For most statically typed programming languages, type checking is a bottom up algorithm over the AST:
- Types of expression AST leaves are known immediately:
  - literals => obvious
  - variables => from the ID table
  - named constants => from the ID table
- Types of internal nodes are inferred from the types of the children and the type rule for that kind of expression.
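A minimal sketch of this bottom up algorithm, assuming expressions are encoded as tuples and the ID table is a plain dictionary from names to types. The tuple encoding, the `OPERATOR_RULES` table and the function name `check` are illustrative assumptions.

```python
# Hypothetical type rules: operator -> (left operand type, right operand type, result type)
OPERATOR_RULES = {
    "+": ("int", "int", "int"),
    "*": ("int", "int", "int"),
    "<": ("int", "int", "bool"),
}

def check(expr, id_table):
    """Return the type of `expr`, computing the types of the children first."""
    kind = expr[0]
    if kind == "int_lit":                        # leaf: the type of a literal is obvious
        return "int"
    if kind == "name":                           # leaf: look the identifier up in the ID table
        if expr[1] not in id_table:
            raise TypeError(f"{expr[1]} is not declared")
        return id_table[expr[1]]
    if kind == "binary":                         # internal node: apply the operator's type rule
        _, op, left, right = expr
        left_type = check(left, id_table)
        right_type = check(right, id_table)
        expected_left, expected_right, result = OPERATOR_RULES[op]
        if (left_type, right_type) != (expected_left, expected_right):
            raise TypeError(f"operator {op} cannot be applied to {left_type} and {right_type}")
        return result
    raise ValueError(f"unknown expression kind {kind!r}")

# const m ~ 2; var n: Boolean
ids = {"m": "int", "n": "bool"}
print(check(("binary", "<", ("name", "m"), ("int_lit", 4)), ids))   # m < 4
```

With the same table, checking `n + 1` raises `TypeError`, because `n` is a Boolean and the rule for `+` requires two integers.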

Runtime organization

- Data Representation: how to represent values of the source language on the target machine.
  - Primitives, arrays, structures, unions, pointers
- Expression Evaluation: how to organize computing the values of expressions (taking care of intermediate results).
  - Register vs. stack machine
- Storage Allocation: how to organize storage for variables (considering different lifetimes of global, local and heap variables).
  - Activation records, static links
- Routines: how to implement procedures and functions (and how to pass their parameters and return values).
  - Value vs. reference, closures, recursion
- Object Orientation: runtime organization for OO languages.
  - Method tables

Tricky sort

[Figure: diagram residue with the labels identity, check, p, n (values 23, 15, 88, 7) and i (value 88); the layout is not recoverable.]

JVM

- External representation (platform independent): .class files
- The .class files are loaded into the JVM.
- Internal representation (implementation dependent): classes, objects, primitive types, integers, arrays, methods

The JVM is an abstract machine in the true sense of the word. The JVM spec does not specify implementation details (these can depend on the target OS/platform, performance requirements, etc.).

The JVM spec defines a machine independent “class file format” that all JVM implementations must support.

Inspecting JVM code

    % javac Factorial.java
    % javap -c -verbose Factorial
    Compiled from Factorial.java
    public class Factorial extends java.lang.Object {
        public Factorial();
            /* Stack=1, Locals=1, Args_size=1 */
        public int fac(int);
            /* Stack=2, Locals=4, Args_size=2 */
    }

    Method Factorial()
       0 aload_0
       1 invokespecial #1 <Method java.lang.Object()>
       4 return

Inspecting JVM Code...

    Method int fac(int)
       0 iconst_1
       1 istore_2
       2 iconst_2
       3 istore_3
       4 goto 14
       7 iload_2
       8 iload_3
       9 imul
      10 istore_2
      11 iinc 3 1
      14 iload_3
      15 iload_1
      16 if_icmple 7
      19 iload_2
      20 ireturn

Local variable slots: 0 = this, 1 = n, 2 = result, 3 = i.

[Figure: the slide annotates each instruction with the contents of the local variables and the operand stack; those annotations are not recoverable here.]
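Reading the listing for `fac` instruction by instruction suggests a simple loop. The Python function below is a reconstruction from the bytecode and the local variable names (n, result, i), not the original Java source, which the slides do not show.

```python
def fac(n):
    """Source-level reading of the bytecode for `int fac(int)` (a reconstruction, not the original)."""
    result = 1          # 0 iconst_1, 1 istore_2
    i = 2               # 2 iconst_2, 3 istore_3
    while i <= n:       # 4 goto 14; 14 iload_3, 15 iload_1, 16 if_icmple 7
        result *= i     # 7 iload_2, 8 iload_3, 9 imul, 10 istore_2
        i += 1          # 11 iinc 3 1
    return result       # 19 iload_2, 20 ireturn

print(fac(5))
```

One difference to keep in mind: JVM `int` arithmetic wraps around at 32 bits, whereas Python integers do not, so the two only agree for small n.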

Code Generation

Source program:

    let var n: integer;
        var c: char
    in begin
        c := '&';
        n := n+1
    end

Target program:

    PUSH 2
    LOADL 38
    STORE 1[SB]
    LOAD 0
    LOADL 1
    CALL add
    STORE 0[SB]
    POP 2
    HALT

Source and target program must be “semantically equivalent”.

The semantic specification of the source language (SL) is structured in terms of phrases in the SL: expressions, commands, etc. => Code generation follows the same “inductive” structure.

Specifying Code Generation with Code Templates

The code generation functions for Mini Triangle:

| Phrase class | Function | Effect of the generated code |
| --- | --- | --- |
| Program | run P | Run program P, then halt. Starting and finishing with an empty stack. |
| Command | execute C | Execute command C. May update variables but does not shrink or grow the stack! |
| Expression | evaluate E | Evaluate E; the net result is pushing the value of E on the stack. |
| V-name | fetch V | Push the value of a constant or variable on the stack. |
| V-name | assign V | Pop a value from the stack and store it in variable V. |
| Declaration | elaborate D | Elaborate the declaration; make space on the stack for the constants and variables in the declaration. |
![Code Generation with Code Templates While command execute while E do C JUMP Code Generation with Code Templates While command execute [while E do C] = JUMP](https://slidetodoc.com/presentation_image_h2/9cffd551bde258d207eaf21074073d98/image-33.jpg)
Code Generation with Code Templates

While command:

    execute [while E do C] =
           JUMP h
        g: execute [C]
        h: evaluate [E]
           JUMPIF(1) g
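The while template can be turned into a small code generator sketch. Several simplifications here are assumptions made for illustration: phrases are encoded as tuples, instructions are emitted as strings, variables are referred to by name instead of by address, and labels stay symbolic (`L0`, `L1`) instead of being backpatched to code addresses.

```python
import itertools

class CodeGenerator:
    """Hypothetical generator that emits TAM-like instructions as strings."""

    def __init__(self):
        self.code = []
        self._label_numbers = itertools.count()

    def new_label(self):
        return f"L{next(self._label_numbers)}"

    def emit(self, instruction):
        self.code.append(instruction)

    def evaluate(self, expr):
        """evaluate [E]: the generated code pushes the value of E on the stack."""
        kind = expr[0]
        if kind == "int":
            self.emit(f"LOADL {expr[1]}")
        elif kind == "var":
            self.emit(f"LOAD {expr[1]}")
        elif kind == "binary":
            _, primitive, left, right = expr
            self.evaluate(left)
            self.evaluate(right)
            self.emit(f"CALL {primitive}")
        else:
            raise ValueError(f"unknown expression kind {kind!r}")

    def execute(self, command):
        """execute [C]: the generated code leaves the stack height unchanged."""
        kind = command[0]
        if kind == "assign":
            _, name, expr = command
            self.evaluate(expr)
            self.emit(f"STORE {name}")
        elif kind == "while":
            _, condition, body = command
            g, h = self.new_label(), self.new_label()
            self.emit(f"JUMP {h}")          # JUMP h
            self.emit(f"{g}:")              # g:
            self.execute(body)              #    execute [C]
            self.emit(f"{h}:")              # h:
            self.evaluate(condition)        #    evaluate [E]
            self.emit(f"JUMPIF(1) {g}")     #    JUMPIF(1) g
        else:
            raise ValueError(f"unknown command kind {kind!r}")

# while i < n do i := i + 1
generator = CodeGenerator()
generator.execute(("while",
                   ("binary", "lt", ("var", "i"), ("var", "n")),
                   ("assign", "i", ("binary", "add", ("var", "i"), ("int", 1)))))
print("\n".join(generator.code))
```

Placing the test after the body means each iteration executes only one jump instruction, at the cost of the extra initial `JUMP h`.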

Code improvement (optimization)

The code generated by our compiler is not efficient:
- It computes values at runtime that could be known at compile time.
- It computes values more times than necessary.

We can do better!
- Constant folding
- Common sub-expression elimination
- Code motion
- Dead code elimination
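As an example of the first technique, here is a minimal constant folding sketch over a tuple-encoded expression tree. The encoding, the `FOLDABLE` table and the function name `fold` are assumptions; a real compiler would also have to respect the target machine's integer range, which this sketch ignores.

```python
import operator

# Operators this sketch knows how to compute at compile time (an assumption).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Return an equivalent expression in which constant sub-expressions are pre-computed."""
    if expr[0] != "binary":
        return expr                                  # literals and variables stay as they are
    _, op, left, right = expr
    left, right = fold(left), fold(right)            # fold the children first
    if left[0] == "int" and right[0] == "int" and op in FOLDABLE:
        return ("int", FOLDABLE[op](left[1], right[1]))
    return ("binary", op, left, right)

# d + 10 * 4: the multiplication can be computed at compile time, the addition cannot.
print(fold(("binary", "+", ("var", "d"), ("binary", "*", ("int", 10), ("int", 4)))))
```

Folding the children before the parent lets a chain such as `2 * 3 + 4` collapse completely in a single traversal.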

Optimization implementation

- Is the optimization correct or safe?
- Is the optimization an improvement?
- What sort of analyses do we need to perform to get the required information?
  - Local
  - Global