Programming Languages and Compilers (CS 421)
Elsa L Gunter
2112 SC, UIUC
http://courses.engr.illinois.edu/cs421
Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha
3/5/2021

Programming Languages & Compilers: Three Main Topics of the Course
I. New Programming Paradigm
II. Language Translation
III. Language Semantics

Programming Languages & Compilers
II: Language Translation
- Lexing and Parsing
- Type Systems
- Interpretation

Major Phases of a Compiler
[Figure: Source Program -> Lex -> Tokens -> Parse -> Abstract Syntax -> Semantic Analysis (Symbol Table) -> Translate -> Intermediate Representation -> Optimize -> Optimized IR -> Instruction Selection -> Unoptimized Machine-Specific Assembly Language -> Optimize -> Optimized Machine-Specific Assembly Language -> Emit code -> Assembly Language -> Assembler -> Relocatable Object Code -> Linker -> Machine Code]
Modified from "Modern Compiler Implementation in ML", by Andrew Appel

Where Are We Going Next?
- We want to turn strings (code) into computer instructions.
- This is done in phases:
  - Turn strings into abstract syntax trees (parse).
  - Translate abstract syntax trees into executable instructions (interpret or compile).

Meta-discourse
- Language Syntax and Semantics
- Syntax
  - Regular Expressions, DFSAs and NDFSAs
  - Grammars
- Semantics
  - Natural Semantics
  - Transition Semantics

Language Syntax
- Syntax is the description of which strings of symbols are meaningful expressions in a language.
- It takes more than syntax to understand a language; we need meaning (semantics) too.
- Syntax is the entry point.

Syntax of English Language
- Pattern 1
- Pattern 2

Elements of Syntax
- Character set: previously always ASCII, now often much larger character sets (e.g., Unicode)
- Keywords: usually reserved
- Special constants: cannot be assigned to
- Identifiers: can be assigned to
- Operator symbols
- Delimiters (parentheses, braces, brackets)
- Blanks (aka white space)

Elements of Syntax
- Expressions: if ... then begin ... ; ... end else begin ... ; ... end
- Type expressions: typexpr1 -> typexpr2
- Declarations (in functional languages): let pattern = expr
- Statements (in imperative languages): a = b + c
- Subprograms: let pattern1 = expr1 in expr

Elements of Syntax
- Modules
- Interfaces
- Classes (for object-oriented languages)

Lexing and Parsing
- Converting strings to abstract syntax trees is done in two phases:
  - Lexing: convert a string (or stream of characters) into a list (or stream) of tokens (the "words" of the language). Specification technique: regular expressions.
  - Parsing: convert a list of tokens into an abstract syntax tree. Specification technique: BNF grammars.

Formal Language Descriptions
- Regular expressions, regular grammars, finite state automata
- Context-free grammars, BNF grammars, syntax diagrams
- A whole family of further grammars and automata, covered in automata theory

Grammars
- Grammars are formal descriptions of which strings over a given character set are in a particular language.
- Language designers write the grammar.
- Language implementers use the grammar to know what programs to accept.
- Language users use the grammar to know how to write legitimate programs.

Regular Expressions - Review
- Start with a given character set: a, b, c, …
- Each character is a regular expression.
  - It represents the set of one string containing just that character.
  - L(a) = {a}

Regular Expressions
- If x and y are regular expressions, then xy is a regular expression.
  - It represents the set of all strings made from first a string described by x, then a string described by y.
  - If L(x) = {a, ab} and L(y) = {c, d}, then L(xy) = {ac, ad, abc, abd}.
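
The concatenation rule can be checked mechanically on finite languages. The sketch below is an illustration, not course code: it assumes a finite language is represented as a sorted, duplicate-free OCaml `string list`, and the names `normalize` and `concat_lang` are hypothetical.

```ocaml
(* Assumed representation: a finite language is a sorted list of
   distinct strings. *)
let normalize (l : string list) : string list = List.sort_uniq compare l

(* Concatenation: each string of the first language followed by each
   string of the second. *)
let concat_lang (lx : string list) (ly : string list) : string list =
  normalize (List.concat (List.map (fun s -> List.map (fun t -> s ^ t) ly) lx))

let () =
  (* L(x) = {a, ab}, L(y) = {c, d} *)
  print_endline (String.concat ", " (concat_lang [ "a"; "ab" ] [ "c"; "d" ]))
(* prints: abc, abd, ac, ad *)
```

The result lists the same four strings as the slide, only in sorted order.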

Regular Expressions
- If x and y are regular expressions, then x ∨ y is a regular expression.
  - It represents the set of strings described by either x or y.
  - If L(x) = {a, ab} and L(y) = {c, d}, then L(x ∨ y) = {a, ab, c, d}.
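
We again assume a finite language is a sorted, duplicate-free `string list` (`union_lang` is a hypothetical name). Under that assumption, choice is just union with duplicates removed:

```ocaml
let normalize (l : string list) : string list = List.sort_uniq compare l

(* Choice: the strings described by either expression. *)
let union_lang (lx : string list) (ly : string list) : string list =
  normalize (lx @ ly)

let () =
  print_endline (String.concat ", " (union_lang [ "a"; "ab" ] [ "c"; "d" ]))
(* prints: a, ab, c, d *)
```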

Regular Expressions
- If x is a regular expression, then so is (x).
  - It represents the same thing as x.
- If x is a regular expression, then so is x*.
  - It represents strings made from concatenating zero or more strings from x.
  - If L(x) = {a, ab} then L(x*) = {"", a, ab, aa, aab, aba, abab, …}
- ε is a regular expression.
  - It represents {""}, the set containing the empty string.
- Φ is a regular expression.
  - It represents { }, the empty set.
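
L(x*) is usually infinite, so a program can only list part of it. This sketch enumerates the members of L(x*) up to a given length by repeatedly appending one more string from L(x). The length bound `max_len` and the function name `star_upto` are hypothetical.

```ocaml
let normalize (l : string list) : string list = List.sort_uniq compare l

(* Members of the star language of lx with length at most max_len.
   The length cut-off exists only because the full language is infinite. *)
let star_upto (max_len : int) (lx : string list) : string list =
  (* The empty string in lx adds nothing new, so drop it. *)
  let pieces = List.filter (fun s -> s <> "") lx in
  (* Extend each string found in the last round by one more piece. *)
  let rec grow acc frontier =
    let next =
      frontier
      |> List.map (fun s -> List.map (fun p -> s ^ p) pieces)
      |> List.concat
      |> List.filter (fun t -> String.length t <= max_len && not (List.mem t acc))
      |> normalize
    in
    if next = [] then acc else grow (normalize (acc @ next)) next
  in
  grow [ "" ] [ "" ]

let () =
  star_upto 3 [ "a"; "ab" ]
  |> List.map (Printf.sprintf "%S")
  |> String.concat "; "
  |> print_endline
(* prints: ""; "a"; "aa"; "aaa"; "aab"; "ab"; "aba" *)
```

With L(x) = { } (that is, x = Φ), `star_upto` returns just [""], because zero repetitions are always allowed.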

Example Regular Expressions
- (0 ∨ 1)*1
  - The set of all strings of 0's and 1's ending in 1, {1, 01, 11, …}
- a*b(a*)
  - The set of all strings of a's and b's with exactly one b
- ((01) ∨ (10))*
  - You tell me
- Regular expressions (equivalently, regular grammars) are important for lexing, breaking strings into recognized words.
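
You can test guesses like these by running the regular expressions directly. The sketch below uses Brzozowski derivatives, a technique chosen here for brevity. It is not the automaton-based approach that lexer generators use. The type `re` and the functions `nullable`, `deriv` and `matches` are hypothetical names. `deriv c r` describes the strings w such that c followed by w is in L(r). A string matches when the expression left after consuming all of its characters accepts the empty string.

```ocaml
type re =
  | Empty            (* the empty set *)
  | Eps              (* only the empty string *)
  | Chr of char      (* a single character *)
  | Cat of re * re   (* concatenation *)
  | Alt of re * re   (* choice *)
  | Star of re       (* zero or more repetitions *)

(* Does the language of r contain the empty string? *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Cat (x, y) -> nullable x && nullable y
  | Alt (x, y) -> nullable x || nullable y

(* Derivative of r with respect to the character c. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Cat (x, y) ->
      let dx = Cat (deriv c x, y) in
      if nullable x then Alt (dx, deriv c y) else dx
  | Alt (x, y) -> Alt (deriv c x, deriv c y)
  | Star x -> Cat (deriv c x, Star x)

(* Consume s one character at a time, then check for acceptance. *)
let matches (r : re) (s : string) : bool =
  let rec go r i =
    if i = String.length s then nullable r else go (deriv s.[i] r) (i + 1)
  in
  go r 0

(* strings of 0 and 1 ending in 1 *)
let ends_in_one = Cat (Star (Alt (Chr '0', Chr '1')), Chr '1')

(* strings of a and b with exactly one b *)
let exactly_one_b = Cat (Cat (Star (Chr 'a'), Chr 'b'), Star (Chr 'a'))

(* the third example, left for you to explore *)
let pairs = Star (Alt (Cat (Chr '0', Chr '1'), Cat (Chr '1', Chr '0')))

let () =
  Printf.printf "%b %b %b\n"
    (matches ends_in_one "0101")
    (matches exactly_one_b "aabaa")
    (matches exactly_one_b "abab")
(* prints: true true false *)
```

`pairs` encodes ((01) ∨ (10))*. Try `matches pairs` on a few strings to check your answer.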

Regular Grammars
- Subclass of BNF (covered in detail soon)
- Only rules of the form
  <nonterminal> ::= <terminal><nonterminal> or
  <nonterminal> ::= <terminal> or
  <nonterminal> ::= ε
- Defines the same class of languages as regular expressions
- Important for writing lexers (programs that convert strings of characters into strings of tokens)
- Close connection to nondeterministic finite state automata: nonterminals ~ states; rules ~ edges

Example
- Regular grammar:
  <Balanced> ::= ε
  <Balanced> ::= 0<OneAndMore>
  <Balanced> ::= 1<ZeroAndMore>
  <OneAndMore> ::= 1<Balanced>
  <ZeroAndMore> ::= 0<Balanced>
- Generates even-length strings where every initial substring of even length has the same number of 0's as 1's
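
Nonterminals correspond to states and rules to edges, so a regular grammar can be run directly as a nondeterministic automaton. This is a sketch under stated assumptions. The grammar is given as OCaml data and includes the rule <Balanced> ::= ε, which the grammar needs in order to generate any string at all. All type and function names are hypothetical. The simulation tracks the set of nonterminals reachable after each character. `Final` stands for having just used a rule of the form <nonterminal> ::= <terminal>.

```ocaml
(* Right-hand sides allowed in a regular grammar. *)
type rhs =
  | TermNonterm of char * string  (* X ::= a Y *)
  | Term of char                  (* X ::= a *)
  | Epsilon                       (* X ::= epsilon *)

type rule = string * rhs

(* Automaton state: a nonterminal, or Final after a rule X ::= a. *)
type state = NT of string | Final

let balanced : rule list =
  [ ("Balanced", Epsilon);
    ("Balanced", TermNonterm ('0', "OneAndMore"));
    ("Balanced", TermNonterm ('1', "ZeroAndMore"));
    ("OneAndMore", TermNonterm ('1', "Balanced"));
    ("ZeroAndMore", TermNonterm ('0', "Balanced")) ]

(* All states reachable from the current set by reading c. *)
let step (g : rule list) (states : state list) (c : char) : state list =
  states
  |> List.map (function
       | Final -> []
       | NT x ->
           List.filter_map
             (fun (lhs, r) ->
               if lhs <> x then None
               else
                 match r with
                 | TermNonterm (a, y) when a = c -> Some (NT y)
                 | Term a when a = c -> Some Final
                 | _ -> None)
             g)
  |> List.concat
  |> List.sort_uniq compare

(* A state accepts if it is Final or its nonterminal has an epsilon rule. *)
let accepting (g : rule list) (q : state) : bool =
  match q with
  | Final -> true
  | NT x -> List.mem (x, Epsilon) g

let generates (g : rule list) (start : string) (s : string) : bool =
  let rec go states i =
    if i = String.length s then List.exists (accepting g) states
    else go (step g states s.[i]) (i + 1)
  in
  go [ NT start ] 0

let () =
  List.iter
    (fun w -> Printf.printf "%S -> %b\n" w (generates balanced "Balanced" w))
    [ "01"; "0110"; "0011"; "1001" ]
```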

Example: Lexing
- Regular expressions are good for describing lexemes (words) in a programming language:
  - Identifier = (a ∨ b ∨ … ∨ z ∨ A ∨ B ∨ … ∨ Z)(a ∨ b ∨ … ∨ z ∨ A ∨ B ∨ … ∨ Z ∨ 0 ∨ 1 ∨ … ∨ 9)*
  - Digit = (0 ∨ 1 ∨ … ∨ 9)
  - Number = 0 ∨ (1 ∨ … ∨ 9)(0 ∨ … ∨ 9)* ∨ ~(1 ∨ … ∨ 9)(0 ∨ … ∨ 9)*
  - Keywords: if = if, while = while, …
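
These definitions can also be hand-coded as character tests, which is roughly the work a lexer generator automates. The predicates below are a hypothetical sketch: names such as `is_identifier` and `all_from` do not come from the course. They follow the Identifier and Number expressions above and read ~ as the sign of a negative number.

```ocaml
let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
let is_digit c = '0' <= c && c <= '9'
let is_nonzero c = '1' <= c && c <= '9'

(* true when every character of s from index i on satisfies p *)
let rec all_from p s i = i >= String.length s || (p s.[i] && all_from p s (i + 1))

(* Identifier: a letter followed by any mix of letters and digits *)
let is_identifier s =
  String.length s > 0
  && is_letter s.[0]
  && all_from (fun c -> is_letter c || is_digit c) s 1

(* Number: 0, or a nonzero digit followed by digits, possibly preceded by ~ *)
let is_number s =
  let n = String.length s in
  s = "0"
  || (n > 0 && is_nonzero s.[0] && all_from is_digit s 1)
  || (n > 1 && s.[0] = '~' && is_nonzero s.[1] && all_from is_digit s 2)

let () =
  List.iter
    (fun s ->
      Printf.printf "%s: identifier=%b number=%b\n" s (is_identifier s) (is_number s))
    [ "x1"; "42"; "~7"; "007" ]
```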

Implementing Regular Expressions
- Regular expressions are a reasonable way to generate strings in a language.
- They are not so good for recognizing when a string is in the language.
- Problems with regular expressions:
  - which option to choose,
  - how many repetitions to make.
- Answer: finite state automata
- Should have seen in CS 374
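
A deterministic finite automaton removes both problems. From each state, each character leads to at most one next state, and repetition is just a loop back to a state. The sketch below hand-codes a two-state DFA for (0 ∨ 1)*1, where state 1 means the last character read was 1. The `dfa` record type and `run` function are hypothetical, not a library API.

```ocaml
(* A DFA: start state, accepting states, and a partial transition function. *)
type dfa = { start : int; accepting : int list; delta : int -> char -> int option }

let run (m : dfa) (s : string) : bool =
  let rec go q i =
    if i = String.length s then List.mem q m.accepting
    else
      match m.delta q s.[i] with
      | None -> false            (* no transition: reject *)
      | Some q' -> go q' (i + 1)
  in
  go m.start 0

(* Two-state DFA for strings of 0 and 1 ending in 1;
   state 1 means the last character read was 1. *)
let ends_in_one =
  { start = 0;
    accepting = [ 1 ];
    delta = (fun _ c -> match c with '0' -> Some 0 | '1' -> Some 1 | _ -> None) }

let () = Printf.printf "%b %b\n" (run ends_in_one "0101") (run ends_in_one "10")
(* prints: true false *)
```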

Lexing
- Different syntactic categories of "words": tokens
- Example:
  - Convert a sequence of characters into a sequence of strings, integers, and floating point numbers.
  - "asd 123 jkl 3.14" will become:
    [String "asd"; Int 123; String "jkl"; Float 3.14]
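
To make the example concrete, here is a hand-written lexer that produces exactly that token list. It is only a sketch for comparison with the ocamllex approach that follows. The helper names `skip_while`, `lex` and `show` are hypothetical. The sketch assumes that strings are runs of lowercase letters and that anything else, such as blanks, is skipped. Like an ocamllex-generated lexer, it prefers the longest match, so 3.14 becomes one Float rather than two Ints.

```ocaml
type result = Int of int | Float of float | String of string

let is_digit c = '0' <= c && c <= '9'
let is_lower c = 'a' <= c && c <= 'z'

(* First index at or after i whose character fails p. *)
let rec skip_while p s i =
  if i < String.length s && p s.[i] then skip_while p s (i + 1) else i

(* Lex s from index i; each case recursively lexes the rest. *)
let rec lex (s : string) (i : int) : result list =
  let n = String.length s in
  if i >= n then []
  else if is_digit s.[i] then (
    let j = skip_while is_digit s i in
    (* Longest match: digits, a dot, then digits form one Float. *)
    if j + 1 < n && s.[j] = '.' && is_digit s.[j + 1] then (
      let k = skip_while is_digit s (j + 1) in
      Float (float_of_string (String.sub s i (k - i))) :: lex s k)
    else Int (int_of_string (String.sub s i (j - i))) :: lex s j)
  else if is_lower s.[i] then (
    let j = skip_while is_lower s i in
    String (String.sub s i (j - i)) :: lex s j)
  else lex s (i + 1) (* skip anything else, e.g. blanks *)

let show = function
  | Int n -> Printf.sprintf "Int %d" n
  | Float f -> Printf.sprintf "Float %g" f
  | String s -> Printf.sprintf "String %S" s

let () =
  lex "asd 123 jkl 3.14" 0 |> List.map show |> String.concat "; " |> print_endline
(* prints: String "asd"; Int 123; String "jkl"; Float 3.14 *)
```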

Lex, ocamllex
- Could write the regular expression, then translate it to a DFA by hand
  - A lot of work
- Better: write a program that takes a regular expression as input and automatically generates the automaton
- Lex is such a program
- ocamllex is the version for OCaml

How to do it
- To use regular expressions to parse our input we need:
  - Some way to identify the input string: call it a lexing buffer
  - A set of regular expressions
  - A corresponding set of actions to take when they are matched

How to do it
- The lexer will take the regular expressions and generate a state machine.
- The state machine will take our lexing buffer and apply the transitions...
- The machine keeps going as long as it can. When it can go no further, it performs the action for the longest prefix that reached an accepting state. If several rules match that prefix, the earliest rule wins.

Mechanics
- Put a table of regular expressions and corresponding actions (written in OCaml) into a file <filename>.mll
- Call ocamllex <filename>.mll
- This produces OCaml code for a lexical analyzer in the file <filename>.ml

Sample Input

    rule main = parse
        ['0'-'9']+ { print_string "Int\n" }
      | ['0'-'9']+'.'['0'-'9']+ { print_string "Float\n" }
      | ['a'-'z']+ { print_string "String\n" }
      | _ { main lexbuf }
    {
      let newlexbuf = (Lexing.from_channel stdin) in
      print_string "Ready to lex.\n";
      main newlexbuf
    }

General Input

    { header }
    let ident = regexp ...
    rule entrypoint [arg1 ... argn] = parse
        regexp { action }
      | ...
      | regexp { action }
    and entrypoint [arg1 ... argn] = parse ...
    and ...
    { trailer }

Ocamllex Input
- header and trailer contain arbitrary OCaml code, put at the top and bottom of <filename>.ml
- let ident = regexp ...
  - Introduces ident for use in later regular expressions

Ocamllex Input
- <filename>.ml contains one lexing function per entrypoint
  - The name of the function is the name given for the entrypoint
  - Each entry point becomes an OCaml function that takes n+1 arguments, the extra implicit last argument being of type Lexing.lexbuf
  - arg1 ... argn are for use in the action

Ocamllex Regular Expression
- Single quoted characters for letters: 'a'
- _ (underscore): matches any character
- eof: special "end_of_file" marker
- Concatenation: same as usual
- "string": concatenation of a sequence of characters
- e1 | e2: choice (what was e1 ∨ e2)

Ocamllex Regular Expression
- [c1 - c2]: choice of any character between the first and second inclusive, as determined by character codes
- [^c1 - c2]: choice of any character NOT in the set
- e*: same as before
- e+: same as e e*
- e?: option (was e ∨ ε)

Ocamllex Regular Expression
- e1 # e2: the characters in e1 but not in e2; e1 and e2 must describe just sets of characters
- ident: abbreviation for the earlier regular expression in let ident = regexp
- e1 as id: binds the substring matched by e1 to id, for use in the associated action

Ocamllex Manual
- More details can be found at http://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html

Example: test.mll

    { type result = Int of int | Float of float | String of string }
    let digit = ['0'-'9']
    let digits = digit +
    let lower_case = ['a'-'z']
    let upper_case = ['A'-'Z']
    let letter = upper_case | lower_case
    let letters = letter +

Example: test.mll

    rule main = parse
        (digits)'.'digits as f { Float (float_of_string f) }
      | digits as n { Int (int_of_string n) }
      | letters as s { String s }
      | _ { main lexbuf }
    {
      let newlexbuf = (Lexing.from_channel stdin) in
      print_string "Ready to lex.";
      print_newline ();
      main newlexbuf
    }

Example

    # #use "test.ml";;
    …
    val main : Lexing.lexbuf -> result = <fun>
    val __ocaml_lex_main_rec : Lexing.lexbuf -> int -> result = <fun>
    Ready to lex.
    hi there 234 5.2
    - : result = String "hi"

What happened to the rest?!?

Example

    # let b = Lexing.from_channel stdin;;
    # main b;;
    hi 673 there
    - : result = String "hi"
    # main b;;
    - : result = Int 673
    # main b;;
    - : result = String "there"

Your Turn
- Work on MP4
  - Add a few keywords
  - Implement booleans and unit
  - Implement Ints and Floats
  - Implement identifiers

Problem
- How do we get the lexer to look at more than the first token at one time?
- Answer: the action has to tell it to, with recursive calls
- Side benefit: we can add "state" into lexing
- Note: we already used this with the _ case

Example

    rule main = parse
        (digits) '.' digits as f { Float (float_of_string f) :: main lexbuf }
      | digits as n { Int (int_of_string n) :: main lexbuf }
      | letters as s { String s :: main lexbuf }
      | eof { [] }
      | _ { main lexbuf }

Example Results

    Ready to lex.
    hi there 234 5.2
    - : result list = [String "hi"; String "there"; Int 234; Float 5.2]
    #

- Used Ctrl-d to send the end-of-file signal

Dealing with comments
First attempt:

    let open_comment = "(*"
    let close_comment = "*)"
    rule main = parse
        (digits) '.' digits as f { Float (float_of_string f) :: main lexbuf }
      | digits as n { Int (int_of_string n) :: main lexbuf }
      | letters as s { String s :: main lexbuf }

Dealing with comments

      | open_comment { comment lexbuf }
      | eof { [] }
      | _ { main lexbuf }
    and comment = parse
        close_comment { main lexbuf }
      | _ { comment lexbuf }

Dealing with nested comments

    rule main = parse
        …
      | open_comment { comment 1 lexbuf }
      | eof { [] }
      | _ { main lexbuf }
    and comment depth = parse
        open_comment { comment (depth+1) lexbuf }
      | close_comment { if depth = 1 then main lexbuf else comment (depth - 1) lexbuf }
      | _ { comment depth lexbuf }

Dealing with nested comments

    rule main = parse
        (digits) '.' digits as f { Float (float_of_string f) :: main lexbuf }
      | digits as n { Int (int_of_string n) :: main lexbuf }
      | letters as s { String s :: main lexbuf }
      | open_comment { comment 1 lexbuf }
      | eof { [] }
      | _ { main lexbuf }

Dealing with nested comments

    and comment depth = parse
        open_comment { comment (depth+1) lexbuf }
      | close_comment { if depth = 1 then main lexbuf else comment (depth - 1) lexbuf }
      | _ { comment depth lexbuf }
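
The same depth-counting idea can be written directly in OCaml over a string. This is a hypothetical stand-alone sketch, not code generated by ocamllex. `strip_comments` keeps the text outside comments. Its two mutually recursive functions, `main` and `comment depth`, mirror the two entry points above. It also fails with its own error message on an unterminated comment.

```ocaml
(* Keep the text of s that lies outside nested comments. *)
let strip_comments (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  (* Does the two-character token tok start at index i? *)
  let starts_at i tok = i + 1 < n && s.[i] = tok.[0] && s.[i + 1] = tok.[1] in
  let rec main i =
    if i >= n then ()
    else if starts_at i "(*" then comment 1 (i + 2)
    else (Buffer.add_char buf s.[i]; main (i + 1))
  and comment depth i =
    if i >= n then failwith "unterminated comment"
    else if starts_at i "(*" then comment (depth + 1) (i + 2)
    else if starts_at i "*)" then
      (if depth = 1 then main (i + 2) else comment (depth - 1) (i + 2))
    else comment depth (i + 1)
  in
  main 0;
  Buffer.contents buf

let () = print_endline (strip_comments "1 (* a (* b *) c *) 2")
(* prints: 1  2 *)
```

With the first-attempt rules, lexing would resume right after the inner close and treat " c *) 2" as ordinary input. The depth counter is what prevents this.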

Types of Formal Language Descriptions
- Regular expressions, regular grammars
- Context-free grammars, BNF grammars, syntax diagrams
- Finite state automata
- A whole family of further grammars and automata, covered in automata theory

Sample Grammar
- Language: parenthesized sums of 0's and 1's
- <Sum> ::= 0
- <Sum> ::= 1
- <Sum> ::= <Sum> + <Sum>
- <Sum> ::= (<Sum>)

BNF Grammars
- Start with a set of characters, a, b, c, …
  - We call these terminals
- Add a set of different characters, X, Y, Z, …
  - We call these nonterminals
- One special nonterminal S is called the start symbol

BNF Grammars
- BNF rules (aka productions) have the form X ::= y, where X is any nonterminal and y is a string of terminals and nonterminals
- A BNF grammar is a set of BNF rules such that every nonterminal appears on the left of some rule
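
These two definitions translate directly into data. In this sketch, a rule X ::= y is a pair of a nonterminal name and a list of symbols. `well_formed` checks the side condition that every nonterminal used appears on the left of some rule. The types `symbol` and `rule` and the function `well_formed` are hypothetical.

```ocaml
type symbol = T of string (* terminal *) | N of string (* nonterminal *)
type rule = string * symbol list (* X ::= y *)

(* Every nonterminal used on a right-hand side must have its own rule. *)
let well_formed (rules : rule list) : bool =
  let lhs = List.map fst rules in
  List.for_all
    (fun ((_, rhs) : rule) ->
      List.for_all (function N x -> List.mem x lhs | T _ -> true) rhs)
    rules

(* The sample grammar for parenthesized sums of 0 and 1 *)
let sum_grammar : rule list =
  [ ("Sum", [ T "0" ]);
    ("Sum", [ T "1" ]);
    ("Sum", [ N "Sum"; T "+"; N "Sum" ]);
    ("Sum", [ T "("; N "Sum"; T ")" ]) ]

(* Uses Digit without any rule for it *)
let missing_digit : rule list = [ ("Sum", [ N "Sum"; T "+"; N "Digit" ]) ]

let () = Printf.printf "%b %b\n" (well_formed sum_grammar) (well_formed missing_digit)
(* prints: true false *)
```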

Sample Grammar
- Terminals: 0 1 + ( )
- Nonterminals: <Sum>
- Start symbol = <Sum>
- <Sum> ::= 0
- <Sum> ::= 1
- <Sum> ::= <Sum> + <Sum>
- <Sum> ::= (<Sum>)
- Can be abbreviated as
  <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)
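
Each alternative of the abbreviated rule can become one constructor of an OCaml type for abstract syntax. The sketch below builds the tree for ( 0 + 1 ) + 0, reads the string back off it, and computes its value. The type `sum` and the functions `to_string` and `eval` are all hypothetical.

```ocaml
(* One constructor per alternative of the Sum rule. *)
type sum =
  | Zero               (* <Sum> ::= 0 *)
  | One                (* <Sum> ::= 1 *)
  | Plus of sum * sum  (* <Sum> ::= <Sum> + <Sum> *)
  | Paren of sum       (* <Sum> ::= (<Sum>) *)

(* Read the terminals off the tree, left to right. *)
let rec to_string = function
  | Zero -> "0"
  | One -> "1"
  | Plus (a, b) -> to_string a ^ " + " ^ to_string b
  | Paren a -> "( " ^ to_string a ^ " )"

(* The numeric value of a sum. *)
let rec eval = function
  | Zero -> 0
  | One -> 1
  | Plus (a, b) -> eval a + eval b
  | Paren a -> eval a

let example = Plus (Paren (Plus (Zero, One)), Zero)

let () = Printf.printf "%s = %d\n" (to_string example) (eval example)
(* prints: ( 0 + 1 ) + 0 = 1 *)
```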

BNF Derivations
- Given rules X ::= yZw and Z ::= v, we may replace Z by v to say X => yZw => yvw
- A sequence of such replacements is called a derivation
- A derivation is called right-most if it always replaces the right-most nonterminal
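
The sketch below illustrates right-most derivations. It encodes a sentential form of the <Sum> grammar as a string in which the letter S stands for <Sum>, and it always rewrites the right-most S. The encoding is chosen only for this example, and the function names are hypothetical.

```ocaml
(* Encoding for this sketch: the capital letter S stands for Sum. *)
let sum_rhs = [ "0"; "1"; "S+S"; "(S)" ]

(* Replace the right-most S in form by rhs, provided rhs is the
   right-hand side of some Sum rule; None if the step is not allowed. *)
let rightmost_step (form : string) (rhs : string) : string option =
  if not (List.mem rhs sum_rhs) then None
  else
    match String.rindex_opt form 'S' with
    | None -> None
    | Some i ->
        let n = String.length form in
        Some (String.sub form 0 i ^ rhs ^ String.sub form (i + 1) (n - i - 1))

(* Apply the chosen right-hand sides in order, always at the right-most S,
   and collect every sentential form along the way. *)
let rightmost_derivation (start : string) (choices : string list) : string list =
  let rec go form = function
    | [] -> [ form ]
    | rhs :: rest -> (
        match rightmost_step form rhs with
        | None -> [ form ]
        | Some form' -> form :: go form' rest)
  in
  go start choices

let () =
  rightmost_derivation "S" [ "S+S"; "0"; "(S)"; "S+S"; "1"; "0" ]
  |> String.concat " => "
  |> print_endline
(* prints: S => S+S => S+0 => (S)+0 => (S+S)+0 => (S+1)+0 => (0+1)+0 *)
```

The derivation of ( 0 + 1 ) + 0 worked through in the following slides is not right-most: its second step expands the left <Sum>. Both derivations produce the same string.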

BNF Derivations
- Start with the start symbol:
  <Sum> =>

BNF Derivations
- Pick a non-terminal:
  <Sum> =>

BNF Derivations
- Pick a rule and substitute:
  - <Sum> ::= <Sum> + <Sum>

  <Sum> => <Sum> + <Sum>

BNF Derivations
- Pick a non-terminal:

  <Sum> => <Sum> + <Sum>

BNF Derivations
- Pick a rule and substitute:
  - <Sum> ::= ( <Sum> )

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>

BNF Derivations
- Pick a non-terminal:

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>

BNF Derivations
- Pick a rule and substitute:
  - <Sum> ::= <Sum> + <Sum>

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>

BNF Derivations
- Pick a non-terminal:

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>

BNF Derivations
- Pick a rule and substitute:
  - <Sum> ::= 1

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>
        => ( <Sum> + 1 ) + <Sum>

BNF Derivations
- Pick a non-terminal:

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>
        => ( <Sum> + 1 ) + <Sum>

BNF Derivations
- Pick a rule and substitute:
  - <Sum> ::= 0

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>
        => ( <Sum> + 1 ) + <Sum>
        => ( <Sum> + 1 ) + 0

BNF Derivations
- Pick a non-terminal:

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>
        => ( <Sum> + 1 ) + <Sum>
        => ( <Sum> + 1 ) + 0

BNF Derivations
- Pick a rule and substitute:
  - <Sum> ::= 0

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>
        => ( <Sum> + 1 ) + <Sum>
        => ( <Sum> + 1 ) + 0
        => ( 0 + 1 ) + 0

BNF Derivations
- ( 0 + 1 ) + 0 is generated by the grammar:

  <Sum> => <Sum> + <Sum>
        => ( <Sum> ) + <Sum>
        => ( <Sum> + <Sum> ) + <Sum>
        => ( <Sum> + 1 ) + <Sum>
        => ( <Sum> + 1 ) + 0
        => ( 0 + 1 ) + 0
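
Each => in a derivation must replace exactly one nonterminal using one rule. This sketch checks that mechanically, again writing the letter S for <Sum>. Both the encoding and the names `one_step` and `valid_derivation` are hypothetical.

```ocaml
(* Encoding for this sketch: the capital letter S stands for Sum. *)
let sum_rhs = [ "0"; "1"; "S+S"; "(S)" ]

(* Is b obtained from a by replacing exactly one S with a right-hand side? *)
let one_step (a : string) (b : string) : bool =
  let n = String.length a in
  let rec try_pos i =
    i < n
    && ((a.[i] = 'S'
         && List.exists
              (fun rhs -> String.sub a 0 i ^ rhs ^ String.sub a (i + 1) (n - i - 1) = b)
              sum_rhs)
        || try_pos (i + 1))
  in
  try_pos 0

(* Every consecutive pair in the chain must be a single derivation step. *)
let rec valid_derivation = function
  | a :: (b :: _ as rest) -> one_step a b && valid_derivation rest
  | _ -> true

let () =
  Printf.printf "%b\n"
    (valid_derivation
       [ "S"; "S+S"; "(S)+S"; "(S+S)+S"; "(S+1)+S"; "(S+1)+0"; "(0+1)+0" ])
(* prints: true *)
```

`valid_derivation` accepts the six-step chain from <Sum> to ( 0 + 1 ) + 0. It rejects a chain that jumps from (S)+S straight to (S+1)+0, because that jump needs two replacements.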

<Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)
<Sum> =>