Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University
Programming Language Specification
• Why?
  – A communication device between people who need to have a common understanding of the PL:
    • language designer, language implementor, user
• What to specify?
  – Specify what is a ‘well formed’ program
    • syntax
    • contextual constraints (also called static semantics):
      – scoping rules
      – type rules
  – Specify what is the meaning of (well formed) programs
    • semantics (also called runtime semantics)
Programming Language Specification
• Why?
• What to specify?
• How to specify?
  – Formal specification: use some kind of precisely defined formalism
  – Informal specification: description in English
  – Usually a mix of both (e.g. the Java specification)
    • Syntax => formal specification using a CFG
    • Contextual constraints and semantics => informal
Programming Language Specification
– A language specification has (at least) three parts:
  • Syntax of the language: usually formal (EBNF)
  • Contextual constraints:
    – scope rules (often written in English, but can be formal)
    – type rules (formal or informal)
  • Semantics:
    – defined by the implementation
    – informal descriptions in English
    – formal, using operational or denotational semantics
The Syntax and Semantics course will teach you how to read and write a formal language specification, so pay attention!
Important!
• Syntax is the visible part of a programming language
  – Programming language designers can waste a lot of time discussing unimportant details of syntax
• The language paradigm is the next most visible part
  – The choice of paradigm, and therefore language, depends on how humans best think about the problem
  – There is no single right model of computation, just different models, some better suited to certain classes of problems than others
• The most invisible part is the language semantics
  – Clear semantics usually leads to simple and efficient implementations
Syntax Specification
Syntax is specified using “Context Free Grammars” (CFGs). A CFG consists of:
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules
CFGs are usually written in “Backus-Naur Form” (BNF) notation. A production rule in BNF notation is written as:
  N ::= α
where N is a non-terminal and α is a sequence of terminals and non-terminals.
  N ::= α | β | ...
is an abbreviation for several rules with N as left-hand side.
Syntax Specification
A CFG defines a set of strings. This is called the language of the CFG.
Example:
  Start  ::= Letter | Start Digit
  Letter ::= a | b | c | d | ... | z
  Digit  ::= 0 | 1 | 2 | ... | 9
Q: What is the “language” defined by this grammar?
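One way to explore the question is to write a membership test. The sketch below assumes the grammar exactly as printed on the slide, so Start derives one lowercase letter followed by zero or more digits. The function name `in_language` is made up for this illustration.

```python
import string


def in_language(s: str) -> bool:
    """Return True if s can be derived from Start in the slide's grammar.

    Start ::= Letter | Start Digit
    Every derivation begins with a single Letter and then appends Digits,
    so the language is: one letter, then zero or more digits.
    """
    if not s:
        return False  # the grammar cannot derive the empty string
    first, rest = s[0], s[1:]
    is_letter = first in string.ascii_lowercase
    all_digits = all(ch in string.digits for ch in rest)
    return is_letter and all_digits


if __name__ == "__main__":
    for candidate in ["a", "x42", "4x", "ab", ""]:
        print(repr(candidate), in_language(candidate))
```

Trying a few strings by hand against the productions is a good check that the test and the grammar agree.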
Example: Syntax of “Mini Triangle”
Mini Triangle is a very simple Pascal-like programming language. An example program:
  ! This is a comment.
  let
    const m ~ 7;
    var n: Integer
  in
    begin
      n := 2 * m;
      putint(n)
    end
The let part holds the declarations, 2 * m is an expression, and n := 2 * m and putint(n) are commands.
Example: Syntax of “Mini Triangle”
  Program        ::= single-Command
  single-Command ::= V-name := Expression
                   | Identifier ( Expression )
                   | if Expression then single-Command else single-Command
                   | while Expression do single-Command
                   | let Declaration in single-Command
                   | begin Command end
  Command        ::= single-Command
                   | Command ; single-Command
  ...
Example: Syntax of “Mini Triangle” (continued)
  Expression         ::= primary-Expression
                       | Expression Operator primary-Expression
  primary-Expression ::= Integer-Literal
                       | V-name
                       | Operator primary-Expression
                       | ( Expression )
  V-name             ::= Identifier
  Identifier         ::= Letter | Identifier Letter | Identifier Digit
  Integer-Literal    ::= Digit | Integer-Literal Digit
  Operator           ::= + | - | * | / | < | > | =
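To see how the Expression rules turn text into structure, here is a minimal recursive-descent sketch. It is an illustration only: the tuple encoding (`"int"`, `"vname"`, `"unary"`, `"binary"`) and the function names are assumptions, and the left-recursive Expression rule is handled with a loop. Note that the grammar gives all operators the same precedence, so `d + 10 * n` groups as `(d + 10) * n`.

```python
import re

# One token: an integer literal, an identifier, an operator or a parenthesis.
TOKEN = re.compile(r"\s*(\d+|[a-z][a-z0-9]*|[-+*/<>=()])")
OPERATORS = set("+-*/<>=")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expression(tokens):
    # Expression ::= primary-Expression | Expression Operator primary-Expression
    tree = parse_primary(tokens)
    # The loop replaces the left recursion and builds a left-leaning tree.
    while tokens and tokens[0] in OPERATORS:
        op = tokens.pop(0)
        right = parse_primary(tokens)
        tree = ("binary", tree, op, right)
    return tree


def parse_primary(tokens):
    # primary-Expression ::= Integer-Literal | V-name
    #                      | Operator primary-Expression | ( Expression )
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok.isdigit():
        return ("int", int(tok))
    if tok in OPERATORS:
        return ("unary", tok, parse_primary(tokens))
    if tok == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return ("vname", tok)


def parse(text):
    tokens = tokenize(text)
    tree = parse_expression(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return tree


if __name__ == "__main__":
    print(parse("d + 10 * n"))
```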
Example: Syntax of “Mini Triangle” (continued)
  Declaration        ::= single-Declaration
                       | Declaration ; single-Declaration
  single-Declaration ::= const Identifier ~ Expression
                       | var Identifier : Type-denoter
  Type-denoter       ::= Identifier
  Comment            ::= ! CommentLine eol
  CommentLine        ::= Graphic CommentLine
  Graphic            ::= any printable character or space
Syntax Trees
A syntax tree is an ordered labeled tree such that:
a) terminal nodes (leaf nodes) are labeled by terminal symbols;
b) non-terminal nodes (internal nodes) are labeled by non-terminal symbols;
c) each non-terminal node labeled by N has children X1, X2, ..., Xn (in this order) such that N ::= X1 X2 ... Xn is a production.
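Syntax Trees
Example: the production Expression ::= Expression Op primary-Exp, with its three right-hand-side symbols numbered 1, 2, 3 to match the order of the children in the tree.
[Figure: syntax tree with Expression, primary-Exp, V-name, Op, Int-Lit and Ident nodes over the leaves d, +, 10, *, d.]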
Concrete and Abstract Syntax
The previous grammar specified the concrete syntax of Mini Triangle.
The concrete syntax is important for the programmer, who needs to know exactly how to write syntactically well-formed programs.
The abstract syntax omits irrelevant syntactic details and only specifies the essential structure of programs.
Example: different concrete syntaxes for an assignment
  v := e
  (set! v e)
  e -> v
  v = e
Example: Concrete Syntax of Expressions (recap)
  Expression         ::= primary-Expression
                       | Expression Operator primary-Expression
  primary-Expression ::= Integer-Literal
                       | V-name
                       | Operator primary-Expression
                       | ( Expression )
  V-name             ::= Identifier
Example: Abstract Syntax of Expressions
  Expression ::= Integer-Literal               IntegerExp
               | V-name                        VnameExp
               | Operator Expression           UnaryExp
               | Expression Op Expression      BinaryExp
  V-name     ::= Identifier                    SimpleVName
The name on the right of each alternative labels the corresponding kind of abstract syntax tree node.
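The labels in the abstract syntax map naturally onto one class per kind of node. The sketch below is one possible encoding, not the course's implementation: the class names follow the slide, while the field names and the `show` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleVName:
    identifier: str


@dataclass(frozen=True)
class IntegerExp:
    value: int


@dataclass(frozen=True)
class VnameExp:
    vname: SimpleVName


@dataclass(frozen=True)
class UnaryExp:
    op: str
    operand: object  # any expression node


@dataclass(frozen=True)
class BinaryExp:
    left: object  # any expression node
    op: str
    right: object  # any expression node


def show(e) -> str:
    """Render an AST as a fully parenthesised string (hypothetical helper)."""
    if isinstance(e, IntegerExp):
        return str(e.value)
    if isinstance(e, VnameExp):
        return e.vname.identifier
    if isinstance(e, UnaryExp):
        return f"({e.op}{show(e.operand)})"
    if isinstance(e, BinaryExp):
        return f"({show(e.left)} {e.op} {show(e.right)})"
    raise TypeError(f"not an expression node: {e!r}")


# The expression d + 10 * n, grouped to the left as the concrete grammar dictates.
example = BinaryExp(
    BinaryExp(VnameExp(SimpleVName("d")), "+", IntegerExp(10)),
    "*",
    VnameExp(SimpleVName("n")),
)

if __name__ == "__main__":
    print(show(example))
```

The tree carries no parentheses or primary-Expression nodes: grouping is expressed by nesting alone, which is the point of an abstract syntax.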
Abstract Syntax Trees
Abstract Syntax Tree for: d := d + 10 * n
[Figure: AST rooted at AssignmentCmd, with BinaryExpression, VNameExp, SimpleVName, IntegerExp, Op, Int-Lit and Ident nodes over the leaves d, +, 10, *, n.]
Contextual Constraints
Syntax rules alone are not enough to specify the format of well-formed programs.
Example 1 (scope rules):
  let const m~2
  in m + x
x is undefined!
Example 2 (type rules):
  let const m~2 ;
      var n: Boolean
  in begin
       n := m<4;
       n := n+1
     end
n := n+1 is a type error!
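Scope Rules
Scope rules regulate visibility of identifiers. They relate every applied occurrence of an identifier to a binding occurrence.
Example 1:
  let const m~2
  in m + x
The m in const m~2 is a binding occurrence; the m in m + x is an applied occurrence. The applied occurrence of x has no binding occurrence (?).
Example 2:
  let const m~2;
      var r: Integer
  in r := 10*m
The m and r in the declarations are binding occurrences; the r and m in r := 10*m are applied occurrences.
Terminology: static binding vs. dynamic binding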
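Relating applied occurrences to binding occurrences is often done with a table of nested scopes. The following is a minimal sketch of static binding under that assumption; the `Scopes` class and its method names are invented for this example and are not part of Mini Triangle.

```python
class Scopes:
    """A stack of scopes; each scope maps an identifier to its binding."""

    def __init__(self):
        self._frames = [{}]  # the outermost scope

    def open(self):
        # Entering a let opens a new, innermost scope.
        self._frames.append({})

    def close(self):
        # Leaving the let discards the bindings it introduced.
        self._frames.pop()

    def declare(self, name, binding):
        # A binding occurrence: record it in the innermost scope.
        self._frames[-1][name] = binding

    def resolve(self, name):
        # An applied occurrence: search from the innermost scope outwards.
        for frame in reversed(self._frames):
            if name in frame:
                return frame[name]
        raise NameError(f"{name} is not declared")


if __name__ == "__main__":
    scopes = Scopes()
    scopes.open()                   # let
    scopes.declare("m", "const m")  #   const m~2
    print(scopes.resolve("m"))      # in m + x : m is found
    try:
        scopes.resolve("x")         #            x is not
    except NameError as err:
        print("error:", err)
    scopes.close()
```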
Type Rules
Type rules regulate the expected types of arguments and types of returned values for the operations of a language.
Examples:
– Type rule of < :
  E1 < E2 is type correct and of type Boolean if E1 and E2 are type correct and of type Integer
– Type rule of while :
  while E do C is type correct if E is of type Boolean and C is type correct
Terminology: static typing vs. dynamic typing
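The two rules above can be turned into a small static checker. This is a sketch under stated assumptions: expressions and commands are encoded as tuples (a made-up encoding), `env` maps identifiers to their declared types, and the rules for `+` and for assignment are assumed for the sake of the example because the slide does not state them.

```python
INT, BOOL = "Integer", "Boolean"


def type_of(expr, env):
    """Return the type of expr, or raise TypeError if it is not type correct."""
    kind = expr[0]
    if kind == "int":
        return INT
    if kind == "vname":
        return env[expr[1]]
    if kind == "binary":
        _, left, op, right = expr
        left_type, right_type = type_of(left, env), type_of(right, env)
        if op == "<":
            # Rule from the slide: Integer < Integer yields Boolean.
            if left_type == INT and right_type == INT:
                return BOOL
            raise TypeError("operands of < must be Integer")
        if op == "+":
            # Assumed rule (not on the slide): Integer + Integer yields Integer.
            if left_type == INT and right_type == INT:
                return INT
            raise TypeError("operands of + must be Integer")
        raise TypeError(f"no type rule sketched for operator {op}")
    raise ValueError(f"unknown expression kind: {kind}")


def check_command(cmd, env):
    """Raise TypeError if cmd is not type correct; return None otherwise."""
    kind = cmd[0]
    if kind == "assign":
        # Assumed rule (not on the slide): both sides must have the same type.
        _, name, expr = cmd
        if env[name] != type_of(expr, env):
            raise TypeError(f"cannot assign to {name}: type mismatch")
    elif kind == "while":
        # Rule from the slide: E must be Boolean and C must be type correct.
        _, cond, body = cmd
        if type_of(cond, env) != BOOL:
            raise TypeError("while condition must be Boolean")
        check_command(body, env)
    else:
        raise ValueError(f"unknown command kind: {kind}")


if __name__ == "__main__":
    env = {"m": INT, "n": BOOL}
    check_command(("assign", "n", ("binary", ("vname", "m"), "<", ("int", 4))), env)
    try:
        check_command(("assign", "n", ("binary", ("vname", "n"), "+", ("int", 1))), env)
    except TypeError as err:
        print("type error:", err)
```

Because the checker never runs the program, it illustrates static typing: every error is reported from the program text and the declared types alone.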
Semantics
Specification of semantics is concerned with specifying the “meaning” of well-formed programs.
Terminology:
– Expressions are evaluated and yield values (and may or may not perform side effects).
– Commands are executed and perform side effects.
– Declarations are elaborated to produce bindings.
Side effects:
• change the values of variables
• perform input/output
Semantics
Example: the (informally specified) semantics of commands in Mini Triangle.
Commands are executed to update variables and/or perform input/output.
The assignment command V := E is executed as follows:
  first the expression E is evaluated to yield a value v,
  then v is assigned to the variable named V.
The sequential command C1 ; C2 is executed as follows:
  first the command C1 is executed,
  then the command C2 is executed.
etc.
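The informal rules for assignment and sequencing translate almost word for word into an executor. In this sketch the store is a plain dictionary, commands are tuples, and each expression is stood in for by a Python function from the store to a value; all of these encodings are assumptions made to keep the example short.

```python
def execute(cmd, store):
    """Execute cmd, updating store (a dict from variable names to values)."""
    kind = cmd[0]
    if kind == "assign":          # V := E
        _, name, expr = cmd
        value = expr(store)       # first E is evaluated to yield a value v
        store[name] = value       # then v is assigned to the variable named V
    elif kind == "seq":           # C1 ; C2
        _, first, second = cmd
        execute(first, store)     # first C1 is executed
        execute(second, store)    # then C2 is executed
    else:
        raise ValueError(f"unknown command kind: {kind}")


if __name__ == "__main__":
    store = {"m": 7}
    program = (
        "seq",
        ("assign", "n", lambda s: 2 * s["m"]),
        ("assign", "n", lambda s: s["n"] + 1),
    )
    execute(program, store)
    print(store)
```

The order of the two steps in each branch matters: swapping the commands of the sequence would make the second assignment read n before it has a value.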
Semantics
Example: the semantics of expressions.
An expression is evaluated to yield a value.
– An (integer literal) expression IL yields the integer value of IL.
– The (variable or constant name) expression V yields the value of the variable or constant named V.
– The (binary operation) expression E1 O E2 yields the value obtained by applying the binary operation O to the values yielded by (the evaluation of) expressions E1 and E2.
etc.
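Each of these rules corresponds to one branch of a recursive evaluator. The sketch below assumes a tuple encoding of expressions and an environment `env` mapping names to values, and it covers only a subset of the operators; none of these choices come from the slides.

```python
import operator

# Assumed subset of the binary operations, mapped onto Python functions.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}


def evaluate(expr, env):
    """Evaluate expr in env and return the value it yields."""
    kind = expr[0]
    if kind == "int":
        # An integer literal yields its integer value.
        return expr[1]
    if kind == "vname":
        # A name yields the value of the variable or constant it names.
        return env[expr[1]]
    if kind == "binary":
        # Evaluate both operands, then apply the operation to their values.
        _, left, op, right = expr
        return BINARY_OPS[op](evaluate(left, env), evaluate(right, env))
    raise ValueError(f"unknown expression kind: {kind}")


if __name__ == "__main__":
    # (d + 10) * n with d = 2 and n = 3
    expr = ("binary", ("binary", ("vname", "d"), "+", ("int", 10)), "*", ("vname", "n"))
    print(evaluate(expr, {"d": 2, "n": 3}))
```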
Semantics
Example: the semantics of declarations.
A declaration is elaborated to produce bindings. It may also have the side effect of allocating (memory for) variables.
– The constant declaration const I ~ E is elaborated by binding the identifier I to the value yielded by E.
– The variable declaration var I : T is elaborated by binding I to a newly allocated variable, whose initial value is undefined. The variable will be deallocated on exit from the let containing the declaration.
– The sequential declaration D1 ; D2 is elaborated by elaborating D1 followed by D2 and combining the bindings produced by both. D2 is elaborated in the environment of the sequential declaration overlaid by the bindings produced by D1.
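The overlay in the sequential rule is the subtle part, and a short sketch makes it concrete. Here declarations are tuples, an expression is stood in for by a Python function from an environment to a value, and `Variable` is a hypothetical storage cell; `collections.ChainMap` is used to overlay the bindings of D1 on the surrounding environment without copying it.

```python
from collections import ChainMap


class Variable:
    """A newly allocated storage cell whose initial value is undefined."""

    UNDEFINED = object()

    def __init__(self):
        self.value = Variable.UNDEFINED


def elaborate(decl, env):
    """Elaborate decl in env and return the bindings it produces."""
    kind = decl[0]
    if kind == "const":           # const I ~ E
        _, ident, expr = decl
        return {ident: expr(env)}
    if kind == "var":             # var I : T
        _, ident, _type = decl
        return {ident: Variable()}
    if kind == "seq":             # D1 ; D2
        _, d1, d2 = decl
        bindings1 = elaborate(d1, env)
        # D2 sees the surrounding environment overlaid by D1's bindings.
        bindings2 = elaborate(d2, ChainMap(bindings1, env))
        # The result combines the bindings produced by both declarations.
        return {**bindings1, **bindings2}
    raise ValueError(f"unknown declaration kind: {kind}")


if __name__ == "__main__":
    decl = (
        "seq",
        ("const", "m", lambda env: 2),
        ("seq", ("const", "k", lambda env: env["m"] * 10), ("var", "n", "Integer")),
    )
    bindings = elaborate(decl, {})
    print(bindings["m"], bindings["k"], bindings["n"].value is Variable.UNDEFINED)
```

The constant k can refer to m only because of the overlay; elaborating both declarations in the original environment would leave m unbound. Deallocation on exit from the let is not modelled here.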
Language Processors: Why do we need them?
How to bridge the “semantic gap” between the programmer's concepts and ideas (e.g. computing the surface area of a triangle) and the hardware?
[Figure: layers from the programmer's concepts and ideas through a Java program, JVM assembly code, JVM binary code (0101001001...) and the JVM interpreter down to the x86 processor hardware.]
Language Processors: What are they?
A programming language processor is any system (software or hardware) that manipulates programs.
Examples:
– Editors
– Translators (e.g. compiler, assembler, disassembler)
– Interpreters
Programming Language Processing
• Any system for processing programming languages, executing them, or preparing them for execution is called a language processor.
• Language processors include translators and auxiliary tools like syntax-directed editors.
• A translator that immediately executes a program is called an interpreter, while a translator that changes a program into a form suitable for execution is called a compiler.
• In other words, interpretation is a one-step process, in which both the program and the input are provided to the interpreter and the output is the result of the interpretation, as shown below.
Interpreter
You use lots of interpreters every day!
Several languages are used to add dynamics and animation to HTML. Many programming languages are executed (possibly simultaneously) in the browser!
[Figure: a browser loading an HTML page over its communications facilities; its control passes HTML to the HTML interpreter (display formatting), scripts to the VBScript interpreter (compiler) and the JavaScript interpreter, and applets to the Java Virtual Machine (JVM).]
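And also across the web
[Figure: a web client (web browser) submits data from an HTML form (+ JavaScript) across the WWW to a web server, which calls the PHP interpreter to run a PHP script; the script sends SQL commands over the LAN to the DBMS on a database server, and the database output travels back as the response and reply.]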
Compilation
• Compilation is at least a two-step process, in which the original program (source program) is input to the compiler, and a new program (target program) is output from the compiler. The compilation steps can be visualized as follows.
Compiler (simple view)
Compiler
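The two steps can be made concrete with a toy example: step one translates a source expression into a target program, and step two runs that target program on its input. Everything here is an assumption for illustration, including the tuple encoding of expressions and the instruction names `LOADL`, `LOAD` and `CALL` of the made-up stack machine.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def compile_expr(expr):
    """Step 1: translate a source expression into stack-machine code."""
    kind = expr[0]
    if kind == "int":
        return [("LOADL", expr[1])]      # push a literal
    if kind == "vname":
        return [("LOAD", expr[1])]       # push the value of a variable
    if kind == "binary":
        _, left, op, right = expr
        # Code for both operands, followed by the operation itself.
        return compile_expr(left) + compile_expr(right) + [("CALL", op)]
    raise ValueError(f"unknown expression kind: {kind}")


def run(code, store):
    """Step 2: execute the target program on its input (the store)."""
    stack = []
    for instruction, argument in code:
        if instruction == "LOADL":
            stack.append(argument)
        elif instruction == "LOAD":
            stack.append(store[argument])
        elif instruction == "CALL":
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[argument](left, right))
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack.pop()


if __name__ == "__main__":
    source = ("binary", ("binary", ("vname", "d"), "+", ("int", 10)), "*", ("vname", "n"))
    target = compile_expr(source)
    print(target)
    print(run(target, {"d": 2, "n": 3}))
```

The target program can be kept and run many times on different inputs without compiling again, which is the practical difference from one-step interpretation.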
Hybrid compiler / interpreter
Finally
Keep in mind, the compiler is the program from which all other programs arise. If your compiler is under par, all programs created by the compiler will also be under par. No matter the purpose or use -- your own enlightenment about compilers or commercial applications -- you want to be patient and do a good job with this program; in other words, don't try to throw this together on a weekend.
Asking a computer programmer to tell you how to write a compiler is like saying to Picasso, "Teach me to paint like you."
*Sigh* Nevertheless, Picasso shall try.