Programming Languages and Compilers CS 421 Grigore Rosu


Programming Languages and Compilers (CS 421)
Grigore Rosu, 2110 SC, UIUC
http://courses.engr.illinois.edu/cs421
Slides by Elsa Gunter, based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha
1/2/2022

Objective
  • Finish the discussion on lexing
  • Ocamllex (takes .mll files and generates .ml)
  • Context-Free Grammars and BNF

General Input for .mll Files

    { header }
    let ident = regexp ...
    rule entrypoint [arg1 ... argn] = parse
        regexp { action }
      | ...
      | regexp { action }
    and entrypoint [arg1 ... argn] = parse ...
    and ...
    { trailer }

Ocamllex Input
  • header and trailer contain arbitrary OCaml code put at top and bottom of <filename>.ml
  • let ident = regexp ... introduces ident for use in later regular expressions

Ocamllex Input
  • <filename>.ml contains one lexing function per entrypoint
      - Name of function is name given for entrypoint
      - Each entry point becomes an OCaml function that takes n+1 arguments, the extra implicit last argument being of type Lexing.lexbuf
  • arg1 ... argn are for use in action

Ocamllex Regular Expression
  • Single quoted characters for letters: 'a'
  • _ (underscore): matches any character
  • eof: special "end_of_file" marker
  • Concatenation same as usual
  • "string": concatenation of sequence of characters
  • e1 | e2: choice (what was written e1 ∨ e2)

Ocamllex Regular Expression
  • [c1 - c2]: choice of any character between first and second inclusive, as determined by character codes
  • [^c1 - c2]: choice of any character NOT in set
  • e*: same as before
  • e+: same as e e*
  • e?: option (what was written e ∨ ε)

Ocamllex Regular Expression
  • e1 # e2: the characters in e1 but not in e2; e1 and e2 must describe just sets of characters
  • ident: abbreviation for earlier reg exp in let ident = regexp
  • e1 as id: binds the result of e1 to id to be used in the associated action

Ocamllex Manual
  • More details can be found at http://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html

Example: test.mll

    {
      type result = Int of int | Float of float | String of string
    }

    let digit = ['0'-'9']
    let digits = digit +
    let lower_case = ['a'-'z']
    let upper_case = ['A'-'Z']
    let letter = upper_case | lower_case
    let letters = letter +

Example: test.mll

    rule main = parse
        (digits) '.' digits as f   { Float (float_of_string f) }
      | digits as n                { Int (int_of_string n) }
      | letters as s               { String s }
      | _                          { main lexbuf }

    {
      let newlexbuf = (Lexing.from_channel stdin) in
      print_string "Ready to lex.";
      print_newline ();
      main newlexbuf
    }

Example

    #use "test.ml";;
    …
    val main : Lexing.lexbuf -> result = <fun>
    val __ocaml_lex_main_rec : Lexing.lexbuf -> int -> result = <fun>
    Ready to lex.
    hi there 234 5.2
    - : result = String "hi"

What happened to the rest?!?

Example

    # let b = Lexing.from_channel stdin;;
    # main b;;
    hi 673 there
    - : result = String "hi"
    # main b;;
    - : result = Int 673
    # main b;;
    - : result = String "there"

Problem
  • How to get lexer to look at more than the first token at one time?
  • Answer: action has to tell it to -- recursive calls
  • Side Benefit: can add “state” into lexing
  • Note: already used this with the _ case

Example

    rule main = parse
        (digits) '.' digits as f   { Float (float_of_string f) :: main lexbuf }
      | digits as n                { Int (int_of_string n) :: main lexbuf }
      | letters as s               { String s :: main lexbuf }
      | eof                        { [] }
      | _                          { main lexbuf }

Example Results

    Ready to lex.
    hi there 234 5.2
    - : result list = [String "hi"; String "there"; Int 234; Float 5.2]
    #

Used Ctrl-d to send the end-of-file signal
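The recursive-call idea can also be tried without running ocamllex. The following is only a hand-written sketch in plain OCaml: the helper names is_digit, is_letter, span and tokens are made up for illustration, and the function works on a string and an index instead of a Lexing.lexbuf. Each branch mirrors one case of the recursive main rule.

```ocaml
(* Same token type as in the header of test.mll *)
type result = Int of int | Float of float | String of string

(* Hypothetical helpers, not part of ocamllex *)
let is_digit c = '0' <= c && c <= '9'
let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

(* span p s i: first index j >= i where p fails, or the length of s *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

(* Each case conses its token onto the result of lexing the rest,
   just as the actions call main lexbuf recursively. *)
let rec tokens s i =
  let n = String.length s in
  if i >= n then []                                   (* eof case *)
  else if is_digit s.[i] then begin
    let j = span is_digit s i in
    if j + 1 < n && s.[j] = '.' && is_digit s.[j + 1] then
      (let k = span is_digit s (j + 1) in
       Float (float_of_string (String.sub s i (k - i))) :: tokens s k)
    else
      Int (int_of_string (String.sub s i (j - i))) :: tokens s j
  end
  else if is_letter s.[i] then
    (let j = span is_letter s i in
     String (String.sub s i (j - i)) :: tokens s j)
  else tokens s (i + 1)                               (* underscore case: skip *)
```

Calling tokens "hi there 234 5.2" 0 yields the same list of results as the session above.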

Dealing with comments: First Attempt

    let open_comment = "(*"
    let close_comment = "*)"

    rule main = parse
        (digits) '.' digits as f   { Float (float_of_string f) :: main lexbuf }
      | digits as n                { Int (int_of_string n) :: main lexbuf }
      | letters as s               { String s :: main lexbuf }
      | open_comment               { comment lexbuf }
      | eof                        { [] }
      | _                          { main lexbuf }
    and comment = parse
        close_comment              { main lexbuf }
      | _                          { comment lexbuf }

Dealing with nested comments

    rule main = parse
        …
      | open_comment     { comment 1 lexbuf }
      | eof              { [] }
      | _                { main lexbuf }
    and comment depth = parse
        open_comment     { comment (depth+1) lexbuf }
      | close_comment    { if depth = 1
                           then main lexbuf
                           else comment (depth - 1) lexbuf }
      | _                { comment depth lexbuf }

Dealing with nested comments

    rule main = parse
        (digits) '.' digits as f   { Float (float_of_string f) :: main lexbuf }
      | digits as n                { Int (int_of_string n) :: main lexbuf }
      | letters as s               { String s :: main lexbuf }
      | open_comment               { comment 1 lexbuf }
      | eof                        { [] }
      | _                          { main lexbuf }
    and comment depth = parse
        open_comment               { comment (depth+1) lexbuf }
      | close_comment              { if depth = 1
                                     then main lexbuf
                                     else comment (depth - 1) lexbuf }
      | _                          { comment depth lexbuf }
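The depth counter is the whole trick, and it can be checked in isolation. Below is a small sketch in plain OCaml, not ocamllex: the function name strip_comments is an assumption made for illustration. It drops nested comments from a string by carrying the current depth as an argument, the same way the comment rule does.

```ocaml
(* Remove possibly nested OCaml-style comments from a string.
   depth = 0 means we are outside any comment. *)
let strip_comments (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i depth =
    if i >= n then ()
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then
      go (i + 2) (depth + 1)            (* opening delimiter: one level deeper *)
    else if depth > 0 && i + 1 < n && s.[i] = '*' && s.[i + 1] = ')' then
      go (i + 2) (depth - 1)            (* closing delimiter: one level up *)
    else begin
      if depth = 0 then Buffer.add_char buf s.[i];
      go (i + 1) depth                  (* inside a comment the character is dropped *)
    end
  in
  go 0 0;
  Buffer.contents buf
```

With a plain flag instead of a counter, the first closing delimiter would end the comment, which is the weakness of the first attempt.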

Types of Formal Language Descriptions
  • Regular expressions, regular grammars
  • Context-free grammars, BNF grammars, syntax diagrams
  • Finite state automata
  • A whole family of further grammars and automata, covered in automata theory

Sample Grammar
  • Language: Parenthesized sums of 0’s and 1’s
  • <Sum> ::= 0
  • <Sum> ::= 1
  • <Sum> ::= <Sum> + <Sum>
  • <Sum> ::= (<Sum>)

BNF Grammars
  • Start with a set of characters, a, b, c, …
      - We call these terminals
  • Add a set of different characters, X, Y, Z, …
      - We call these nonterminals
  • One special nonterminal S called start symbol

BNF Grammars
  • BNF rules (aka productions) have form X ::= y where X is any nonterminal and y is a string of terminals and nonterminals
  • BNF grammar is a set of BNF rules such that every nonterminal appears on the left of some rule

Sample Grammar
  • Terminals: 0 1 + ( )
  • Nonterminals: <Sum>
  • Start symbol = <Sum>
  • <Sum> ::= 0
  • <Sum> ::= 1
  • <Sum> ::= <Sum> + <Sum>
  • <Sum> ::= (<Sum>)
  • Can be abbreviated as <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)

BNF Derivations
  • Given rules X ::= yZw and Z ::= v we may replace Z by v to say X => yZw => yvw
  • Sequence of such replacements called derivation
  • Derivation called right-most if always replace the right-most non-terminal

BNF Derivations
  • Start with the start symbol:
        <Sum> =>
  • Pick a non-terminal, then pick a rule and substitute. With <Sum> ::= <Sum> + <Sum>:
        <Sum> => <Sum> + <Sum>
  • Pick the left <Sum>; substitute with <Sum> ::= ( <Sum> ):
              => ( <Sum> ) + <Sum>
  • Pick the <Sum> inside the parentheses; substitute with <Sum> ::= <Sum> + <Sum>:
              => ( <Sum> + <Sum> ) + <Sum>
  • Pick the second <Sum>; substitute with <Sum> ::= 1:
              => ( <Sum> + 1 ) + <Sum>
  • Pick the last <Sum>; substitute with <Sum> ::= 0:
              => ( <Sum> + 1 ) + 0
  • Pick the remaining <Sum>; substitute with <Sum> ::= 0:
              => ( 0 + 1 ) + 0

BNF Derivations
  • ( 0 + 1 ) + 0 is generated by grammar:
        <Sum> => <Sum> + <Sum>
              => ( <Sum> ) + <Sum>
              => ( <Sum> + <Sum> ) + <Sum>
              => ( <Sum> + 1 ) + <Sum>
              => ( <Sum> + 1 ) + 0
              => ( 0 + 1 ) + 0
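A derivation is just repeated rewriting of a sentential form, which can be sketched directly in OCaml. The representation below (a list of symbols, with a rewrite function that replaces the k-th occurrence of <Sum>) is an assumption chosen for illustration, not something from the slides.

```ocaml
(* Terminals carry their text; Sum is the single nonterminal <Sum>. *)
type symbol = T of string | Sum

(* Replace the k-th occurrence (0-based) of <Sum> in a sentential form by rhs. *)
let rec rewrite k rhs form =
  match form with
  | [] -> []
  | Sum :: rest when k = 0 -> rhs @ rest
  | Sum :: rest -> Sum :: rewrite (k - 1) rhs rest
  | t :: rest -> t :: rewrite k rhs rest

(* Right-hand sides of the four rules of the sample grammar *)
let plus = [Sum; T "+"; Sum]
let paren = [T "("; Sum; T ")"]
let zero = [T "0"]
let one = [T "1"]

let show form =
  String.concat " " (List.map (function T s -> s | Sum -> "<Sum>") form)

(* The derivation of ( 0 + 1 ) + 0, one rule application per line *)
let derivation =
  [Sum]
  |> rewrite 0 plus    (* <Sum> + <Sum> *)
  |> rewrite 0 paren   (* ( <Sum> ) + <Sum> *)
  |> rewrite 0 plus    (* ( <Sum> + <Sum> ) + <Sum> *)
  |> rewrite 1 one     (* ( <Sum> + 1 ) + <Sum> *)
  |> rewrite 1 zero    (* ( <Sum> + 1 ) + 0 *)
  |> rewrite 0 zero    (* ( 0 + 1 ) + 0 *)
```

Applying show to derivation gives the string ( 0 + 1 ) + 0; a form with no Sum left consists only of terminals, so it belongs to the language.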

<Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)

    <Sum> =>

BNF Semantics
  • The meaning of a BNF grammar is the set of all strings consisting only of terminals that can be derived from the Start symbol

Regular Grammars
  • Subclass of BNF
  • Only rules of form <nonterminal> ::= <terminal><nonterminal> or <nonterminal> ::= <terminal> or <nonterminal> ::= ε
  • Defines same class of languages as regular expressions
  • Important for writing lexers (programs that convert strings of characters into strings of tokens)

Example
  • Regular grammar:
        <Balanced> ::= ε
        <Balanced> ::= 0 <OneAndMore>
        <Balanced> ::= 1 <ZeroAndMore>
        <OneAndMore> ::= 1 <Balanced>
        <ZeroAndMore> ::= 0 <Balanced>
  • Generates even length strings where every initial substring of even length has same number of 0’s as 1’s
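Because every rule consumes one terminal and moves to one nonterminal, a regular grammar reads directly as a finite state machine. The sketch below assumes that a derivation may stop whenever the current nonterminal is <Balanced> (that is, a rule <Balanced> ::= ε); the names step and accepts are illustrative.

```ocaml
(* One state per nonterminal of the grammar *)
type state = Balanced | OneAndMore | ZeroAndMore

(* One transition per rule of the form N ::= terminal N' *)
let step st c =
  match st, c with
  | Balanced, '0' -> Some OneAndMore
  | Balanced, '1' -> Some ZeroAndMore
  | OneAndMore, '1' -> Some Balanced
  | ZeroAndMore, '0' -> Some Balanced
  | _ -> None

(* A string is accepted if reading it from Balanced ends in Balanced
   (assumption: Balanced may derive the empty string). *)
let accepts s =
  let rec go st i =
    if i = String.length s then st = Balanced
    else
      match step st s.[i] with
      | Some st' -> go st' (i + 1)
      | None -> false
  in
  go Balanced 0
```

For instance, accepts "0110" holds, while accepts "00" and accepts "011" do not.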

Extended BNF Grammars
  • Alternatives: allow rules of form X ::= y | z
      - Abbreviates X ::= y, X ::= z
  • Options: X ::= y [v] z
      - Abbreviates X ::= yvz, X ::= yz
  • Repetition: X ::= y {v}* z
      - Can be eliminated by adding new nonterminal V and rules X ::= yz, X ::= yVz, V ::= v | vV

Parse Trees
  • Graphical representation of derivation
  • Each node labeled with either non-terminal or terminal
  • If node is labeled with a terminal, then it is a leaf (no sub-trees)
  • If node is labeled with a non-terminal, then it has one branch for each character in the right-hand side of rule used to substitute for it

Example
  • Consider grammar:
        <exp> ::= <factor> | <factor> + <factor>
        <factor> ::= <bin> | <bin> * <exp>
        <bin> ::= 0 | 1
  • Problem: Build parse tree for 1 * 1 + 0 as an <exp>

Example cont.
  • 1 * 1 + 0: <exp> is the start symbol for this parse tree
  • Use rule: <exp> ::= <factor>
  • Use rule: <factor> ::= <bin> * <exp>
  • Use rules: <bin> ::= 1 and <exp> ::= <factor> + <factor>
  • Use rule: <factor> ::= <bin> (for both factors)
  • Use rules: <bin> ::= 1 | 0

                    <exp>
                      |
                  <factor>
                 /    |    \
            <bin>     *     <exp>
              |           /   |   \
              1   <factor>    +    <factor>
                      |               |
                    <bin>           <bin>
                      |               |
                      1               0

  • Fringe of tree is string generated by grammar

Your Turn: 1 * 0 + 0 * 1

Parse Tree Data Structures
  • Parse trees may be represented by OCaml datatypes
  • One datatype for each nonterminal
  • One constructor for each rule
  • Defined as mutually recursive collection of datatype declarations

Example
  • Recall grammar:
        <exp> ::= <factor> | <factor> + <factor>
        <factor> ::= <bin> | <bin> * <exp>
        <bin> ::= 0 | 1

    type exp = Factor2Exp of factor
             | Plus of factor * factor
    and factor = Bin2Factor of bin
               | Mult of bin * exp
    and bin = Zero | One

Example cont.
  • The parse tree for 1 * 1 + 0, written with brackets:
        <exp>( <factor>( <bin>(1) * <exp>( <factor>(<bin>(1)) + <factor>(<bin>(0)) ) ) )
  • Can be represented as
        Factor2Exp (Mult (One, Plus (Bin2Factor One, Bin2Factor Zero)))
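To see that this value really is the parse tree of 1 * 1 + 0, one can read its fringe back off. The sketch below repeats the datatype with the constructor names Factor2Exp and Bin2Factor; the fringe_* functions are illustrative helpers, one per nonterminal.

```ocaml
(* One datatype per nonterminal, one constructor per rule *)
type exp = Factor2Exp of factor | Plus of factor * factor
and factor = Bin2Factor of bin | Mult of bin * exp
and bin = Zero | One

(* The fringe is the left-to-right sequence of terminals at the leaves. *)
let rec fringe_exp = function
  | Factor2Exp f -> fringe_factor f
  | Plus (f1, f2) -> fringe_factor f1 ^ " + " ^ fringe_factor f2
and fringe_factor = function
  | Bin2Factor b -> fringe_bin b
  | Mult (b, e) -> fringe_bin b ^ " * " ^ fringe_exp e
and fringe_bin = function
  | Zero -> "0"
  | One -> "1"

let tree = Factor2Exp (Mult (One, Plus (Bin2Factor One, Bin2Factor Zero)))
```

Here fringe_exp tree evaluates to "1 * 1 + 0". Note that the shape of the tree groups the input as 1 * (1 + 0), which is what this particular grammar dictates.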

Ambiguous Grammars and Languages
  • A BNF grammar is ambiguous if its language contains strings for which there is more than one parse tree
  • If all BNF’s for a language are ambiguous then the language is inherently ambiguous

Example: Ambiguous Grammar
  • 0 + 1 + 0 has two parse trees:

                <Sum>                           <Sum>
              /   |   \                       /   |   \
          <Sum>   +   <Sum>               <Sum>   +   <Sum>
         /  |  \        |                   |        /  |  \
     <Sum>  +  <Sum>    0                   0    <Sum>  +  <Sum>
       |         |                                 |         |
       0         1                                 1         0

Example
  • What is the result for: 3 + 4 * 5 + 6
  • Possible answers:
      - 41 = ((3 + 4) * 5) + 6
      - 47 = 3 + (4 * (5 + 6))
      - 29 = (3 + (4 * 5)) + 6 = 3 + ((4 * 5) + 6)
      - 77 = (3 + 4) * (5 + 6)
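The arithmetic behind each answer is easy to check by writing the groupings out with explicit parentheses. This is only a small sanity check; the name groupings is illustrative.

```ocaml
(* Each pair is a fully parenthesized reading of 3+4*5+6 and its value. *)
let groupings =
  [ ("((3 + 4) * 5) + 6", ((3 + 4) * 5) + 6);
    ("3 + (4 * (5 + 6))", 3 + (4 * (5 + 6)));
    ("(3 + (4 * 5)) + 6", (3 + (4 * 5)) + 6);
    ("3 + ((4 * 5) + 6)", 3 + ((4 * 5) + 6));
    ("(3 + 4) * (5 + 6)", (3 + 4) * (5 + 6)) ]

let () =
  List.iter (fun (text, value) -> Printf.printf "%s = %d\n" text value) groupings
```

The two middle groupings agree because addition is associative; they differ only in tree shape, not in value.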

Example
  • What is the value of: 7 - 5 - 2
  • Possible answers:
      - In Pascal, C++, SML associate to left: 7 - 5 - 2 = (7 - 5) - 2 = 0
      - In APL, associate to right: 7 - 5 - 2 = 7 - (5 - 2) = 4
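The two readings correspond to the two standard list folds, which gives a quick way to experiment. This is a sketch using only List.fold_left and List.fold_right from the OCaml standard library; the names left_assoc and right_assoc are illustrative.

```ocaml
(* Left association: (7 - 5) - 2 *)
let left_assoc = List.fold_left (-) 7 [5; 2]

(* Right association: 7 - (5 - 2) *)
let right_assoc = List.fold_right (-) [7; 5] 2

(* OCaml itself treats binary minus as left associative. *)
let ocaml_reading = 7 - 5 - 2
```

Here left_assoc and ocaml_reading are both 0, while right_assoc is 4.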

Two Major Sources of Ambiguity
  • Lack of determination of operator precedence
  • Lack of determination of operator associativity
  • Not the only sources of ambiguity

Disambiguating a Grammar
  • Given ambiguous grammar G, with start symbol S, find an unambiguous grammar G’ with same start symbol, such that language of G = language of G’
  • Not always possible
  • No algorithm in general

Disambiguating a Grammar
  • Idea: Each non-terminal represents all strings having some property
  • Identify these properties (often in terms of things that can’t happen)
  • Use these properties to inductively guarantee every string in language has a unique parse

Example
  • Ambiguous grammar: <exp> ::= 0 | 1 | <exp> + <exp> | <exp> * <exp>
  • Strings with more than one parse: 0 + 1 + 0, 1 * 1 + 1
  • Source of ambiguity: associativity and precedence

How to Enforce Associativity
  • Have at most one recursive call per production
  • When two or more recursive calls would be natural, leave right-most one for right associativity, left-most one for left associativity
10/4/07

Example
  • <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)
  • Becomes
        <Sum> ::= <Num> | <Num> + <Sum>
        <Num> ::= 0 | 1 | (<Sum>)
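With only the right-most recursive call left, the grammar can be parsed by a straightforward recursive-descent parser, and every input gets exactly one tree. The following is a sketch under simplifying assumptions: input is a string with no whitespace, and the type, constructor and function names are made up for illustration.

```ocaml
(* One datatype per nonterminal of the disambiguated grammar *)
type sum = Num of num | Add of num * sum
and num = Zero | One | Paren of sum

exception Parse_error

(* Each parser takes a list of characters and returns a tree
   together with the unconsumed rest of the input. *)
let rec parse_sum cs =
  let n, rest = parse_num cs in
  match rest with
  | '+' :: rest' ->
    let s, rest'' = parse_sum rest' in
    (Add (n, s), rest'')
  | _ -> (Num n, rest)
and parse_num = function
  | '0' :: rest -> (Zero, rest)
  | '1' :: rest -> (One, rest)
  | '(' :: rest ->
    (match parse_sum rest with
     | s, ')' :: rest' -> (Paren s, rest')
     | _ -> raise Parse_error)
  | _ -> raise Parse_error

let parse s =
  match parse_sum (List.of_seq (String.to_seq s)) with
  | t, [] -> t
  | _ -> raise Parse_error
```

For example, parse "0+1+0" returns Add (Zero, Add (One, Num Zero)), the right-associated tree; the left-associated reading is no longer derivable without parentheses.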

Operator Precedence
  • Operators of highest precedence evaluated first (bind more tightly).
  • Precedence for infix binary operators given in following table
  • Needs to be reflected in grammar

Precedence Table - Sample

              Fortran   Pascal            C/C++      Ada          SML
    highest   **        *, /, div, mod    ++, --     **           div, mod, /, *
              *, /      +, -              *, /, %    *, /, mod    +, -, ^
              +, -                        +, -       +, -         ::

First Example Again
  • In any above language, 3 + 4 * 5 + 6 = 29
  • In APL, all infix operators have same precedence
      - Thus we still don’t know what the value is (handled by associativity)
  • How do we handle precedence in grammar?

Precedence in Grammar
  • Higher precedence translates to longer derivation chain
  • Example: <exp> ::= 0 | 1 | <exp> + <exp> | <exp> * <exp>
  • Becomes
        <exp> ::= <mult_exp> | <exp> + <mult_exp>
        <mult_exp> ::= <id> | <mult_exp> * <id>
        <id> ::= 0 | 1
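The layered grammar can be turned into a small parser to watch precedence and associativity fall out of the tree shape. This is a sketch only: the left-recursive rules are handled with accumulator loops rather than direct recursion, input is assumed to contain no whitespace, and all type and function names are illustrative.

```ocaml
(* One datatype per nonterminal of the layered grammar *)
type exp = Mult_exp of mult_exp | Plus of exp * mult_exp
and mult_exp = Id of id | Times of mult_exp * id
and id = Zero | One

exception Parse_error

let parse_id = function
  | '0' :: rest -> (Zero, rest)
  | '1' :: rest -> (One, rest)
  | _ -> raise Parse_error

(* mult_exp: an id followed by any number of "* id", grouped to the left *)
let parse_mult cs =
  let rec loop acc = function
    | '*' :: rest ->
      let i, rest' = parse_id rest in
      loop (Times (acc, i)) rest'
    | rest -> (acc, rest)
  in
  let i, rest = parse_id cs in
  loop (Id i) rest

(* exp: a mult_exp followed by any number of "+ mult_exp", grouped to the left *)
let parse_exp cs =
  let rec loop acc = function
    | '+' :: rest ->
      let m, rest' = parse_mult rest in
      loop (Plus (acc, m)) rest'
    | rest -> (acc, rest)
  in
  let m, rest = parse_mult cs in
  loop (Mult_exp m) rest

let parse s =
  match parse_exp (List.of_seq (String.to_seq s)) with
  | t, [] -> t
  | _ -> raise Parse_error
```

For example, parse "0+1*1" gives Plus (Mult_exp (Id Zero), Times (Id One, One)): the product sits deeper in the tree than the sum, so it is evaluated first.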