Languages and Compilers (SProg og Oversættere): Parsing 1

Parsing
– Describe the purpose of the parser
– Discuss top-down vs. bottom-up parsing
– Explain necessary conditions for construction of recursive descent parsers
– Discuss the construction of an RD parser from a grammar

Top-down parsing
[Figure: parse tree for the sentence "The cat sees a rat." grown from the root Sentence down through Subject, Verb, Object and Noun to the words.]

Bottom-up parsing
[Figure: the same parse tree for "The cat sees a rat." assembled from the words up through Noun, Subject, Verb and Object to the root Sentence.]

Top-Down vs Bottom-Up parsing
[Figure: LL analysis (top-down) compared with LR analysis (bottom-up); labels: Derivation, Reduction, Look-Ahead.]

Development of Recursive Descent Parser
(1) Express grammar in EBNF
(2) Grammar transformations: left factorization and left recursion elimination
(3) Create a parser class with
  – private variable currentToken
  – methods to call the scanner: accept and acceptIt
(4) Implement private parsing methods:
  – add a private parseN method for each nonterminal N
  – public parse method that
    • gets the first token from the scanner
    • calls parseS (S is the start symbol of the grammar)
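The four steps can be sketched on a very small example. The grammar below is not from the slides: it is a hypothetical left-recursive rule, Expr ::= Expr + num | num, rewritten in step (2) as Expr ::= num (+ num)*. The class name SumParser, the token list standing in for a scanner, the "$" end marker and the reading of acceptIt as "advance without checking" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: recursive descent parser for Expr ::= num ('+' num)* (hypothetical grammar). */
class SumParser {
    // Hypothetical end-of-input marker; assumed never to occur as a real token.
    private static final String END = "$";

    private final List<String> tokens;  // stands in for a scanner
    private int position = 0;
    private String currentToken;        // step (3): the one-token lookahead

    SumParser(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    /** Convenience wrapper: true if the token sequence is a valid Expr. */
    static boolean accepts(String... tokens) {
        try {
            new SumParser(tokens).parse();
            return true;
        } catch (IllegalStateException rejected) {
            return false;
        }
    }

    /** Step (4): fetch the first token, then parse the start symbol. */
    public void parse() {
        currentToken = nextToken();
        parseExpr();
        accept(END);
    }

    private String nextToken() {
        return position < tokens.size() ? tokens.get(position++) : END;
    }

    /** Advance only if the current token is the expected one. */
    private void accept(String expected) {
        if (currentToken.equals(expected)) {
            acceptIt();
        } else {
            throw new IllegalStateException(
                "Syntax error: expected " + expected + " but found " + currentToken);
        }
    }

    /** Advance unconditionally; used when the caller has already checked the token. */
    private void acceptIt() {
        currentToken = nextToken();
    }

    // Expr ::= num ('+' num)*   (the left recursion has become a loop)
    private void parseExpr() {
        parseNum();
        while (currentToken.equals("+")) {
            acceptIt();
            parseNum();
        }
    }

    private void parseNum() {
        if (currentToken.matches("[0-9]+")) {
            acceptIt();
        } else {
            throw new IllegalStateException(
                "Syntax error: number expected but found " + currentToken);
        }
    }
}
```

Parsing the left-recursive form directly would make parseExpr call itself before consuming any token, so it would never terminate; the loop form avoids that.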

Recursive Descent Parsing
    Sentence ::= Subject Verb Object .
    Subject  ::= I | a Noun | the Noun
    Object   ::= me | a Noun | the Noun
    Noun     ::= cat | mat | rat
    Verb     ::= like | is | sees
Define a procedure parseN for each non-terminal N:
    private void parseSentence();
    private void parseSubject();
    private void parseObject();
    private void parseNoun();
    private void parseVerb();

Recursive Descent Parsing
    public class MicroEnglishParser {
        private TerminalSymbol currentTerminal;
        // Auxiliary methods will go here
        ...
        // Parsing methods will go here
        ...
    }

Recursive Descent Parsing: Auxiliary Methods
    public class MicroEnglishParser {
        private TerminalSymbol currentTerminal;
        private void accept(TerminalSymbol expected) {
            if (currentTerminal matches expected)
                currentTerminal = next input terminal;
            else
                report a syntax error
        }
        ...
    }

Recursive Descent Parsing: Parsing Methods
    Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept('.');
    }

Recursive Descent Parsing: Parsing Methods
    Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (currentTerminal matches 'I')
            accept('I');
        else if (currentTerminal matches 'a') {
            accept('a');
            parseNoun();
        }
        else if (currentTerminal matches 'the') {
            accept('the');
            parseNoun();
        }
        else
            report a syntax error
    }

Recursive Descent Parsing: Parsing Methods
    Noun ::= cat | mat | rat
    private void parseNoun() {
        if (currentTerminal matches 'cat')
            accept('cat');
        else if (currentTerminal matches 'mat')
            accept('mat');
        else if (currentTerminal matches 'rat')
            accept('rat');
        else
            report a syntax error
    }
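Put together, the Micro-English parsing methods might look like the runnable sketch below. Several details are assumptions rather than part of the slides: terminals are plain strings instead of TerminalSymbol objects, the scanner is replaced by splitting the input on whitespace (so the final period must be written as a separate word), "<end>" is a made-up end-of-input marker, syntax errors are reported by throwing IllegalStateException, and parseObject and parseVerb are written by analogy with the methods shown.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of a recursive descent parser for the Micro-English grammar. */
class MicroEnglishParser {
    // Hypothetical end-of-input marker; assumed never to occur in the input.
    private static final String END = "<end>";

    private final List<String> terminals;  // assumption: whitespace-separated terminals
    private int position = 0;
    private String currentTerminal;

    MicroEnglishParser(String input) {
        this.terminals = Arrays.asList(input.trim().split("\\s+"));
    }

    /** Convenience wrapper: true if the input is a sentence of the grammar. */
    static boolean accepts(String input) {
        try {
            new MicroEnglishParser(input).parse();
            return true;
        } catch (IllegalStateException rejected) {
            return false;
        }
    }

    void parse() {
        currentTerminal = nextTerminal();
        parseSentence();
        if (!currentTerminal.equals(END)) throw syntaxError("end of input");
    }

    private String nextTerminal() {
        return position < terminals.size() ? terminals.get(position++) : END;
    }

    private IllegalStateException syntaxError(String expected) {
        return new IllegalStateException(
            "Syntax error: expected " + expected + " but found '" + currentTerminal + "'");
    }

    private void accept(String expected) {
        if (currentTerminal.equals(expected)) currentTerminal = nextTerminal();
        else throw syntaxError("'" + expected + "'");
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        switch (currentTerminal) {
            case "I":   accept("I"); break;
            case "a":   accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default:    throw syntaxError("'I', 'a' or 'the'");
        }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        switch (currentTerminal) {
            case "me":  accept("me"); break;
            case "a":   accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default:    throw syntaxError("'me', 'a' or 'the'");
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (currentTerminal) {
            case "cat": case "mat": case "rat": accept(currentTerminal); break;
            default: throw syntaxError("a noun");
        }
    }

    // Verb ::= like | is | sees
    private void parseVerb() {
        switch (currentTerminal) {
            case "like": case "is": case "sees": accept(currentTerminal); break;
            default: throw syntaxError("a verb");
        }
    }
}
```

For example, MicroEnglishParser.accepts("the cat sees a rat .") returns true, while the same sentence without the final period is rejected. The comparison is case-sensitive, so "The" would need to be lowercased by a real scanner.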

LL(1) Grammars
• The presented algorithm to convert EBNF into a parser does not work for all possible grammars.
• It only works for so-called "LL(1)" grammars.
• Basically, an LL(1) grammar is a grammar that can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token.
• What grammars are LL(1)? How can we recognize that a grammar is (or is not) LL(1)?
  ⇒ We can deduce the necessary conditions from the parser generation algorithm.
  ⇒ We can use a formal definition.

LL(1) Grammars
parse X*
    while (currentToken.kind is in starters[X]) {
        parse X
    }
Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*
parse X|Y
    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            report syntax error
    }
Condition: starters[X] and starters[Y] must be disjoint sets.
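Both conditions rely on the starters sets. As a rough sketch, they can be computed by fixed-point iteration when the grammar is in plain BNF and ε-free, so that the first symbol of each alternative decides its starters. The class name Starters, the map-based grammar encoding and the helper names are hypothetical; the grammar is the Micro-English grammar from the earlier slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: starters sets for an ε-free BNF grammar (assumed encoding, see lead-in). */
class Starters {
    /** Helper: each string is one alternative, symbols separated by single spaces. */
    static List<List<String>> alternatives(String... alternatives) {
        List<List<String>> result = new ArrayList<>();
        for (String alternative : alternatives) {
            result.add(Arrays.asList(alternative.split(" ")));
        }
        return result;
    }

    /** The Micro-English grammar; keys are the nonterminals. */
    static Map<String, List<List<String>>> microEnglish() {
        Map<String, List<List<String>>> grammar = new LinkedHashMap<>();
        grammar.put("Sentence", alternatives("Subject Verb Object ."));
        grammar.put("Subject", alternatives("I", "a Noun", "the Noun"));
        grammar.put("Object", alternatives("me", "a Noun", "the Noun"));
        grammar.put("Noun", alternatives("cat", "mat", "rat"));
        grammar.put("Verb", alternatives("like", "is", "sees"));
        return grammar;
    }

    /** Repeats until no set grows; any symbol that is not a key counts as a terminal. */
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> starters = new LinkedHashMap<>();
        for (String nonTerminal : grammar.keySet()) {
            starters.put(nonTerminal, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = starters.get(rule.getKey());
                for (List<String> alternative : rule.getValue()) {
                    // ε-free grammar: the first symbol alone determines the starters.
                    String first = alternative.get(0);
                    if (grammar.containsKey(first)) {
                        changed |= target.addAll(new ArrayList<>(starters.get(first)));
                    } else {
                        changed |= target.add(first);
                    }
                }
            }
        }
        return starters;
    }
}
```

With this encoding, the starters of Sentence come out equal to the starters of Subject, namely I, a and the.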

Formal definition of LL(1)
A grammar G is LL(1) iff for each set of productions M ::= X1 | X2 | … | Xn:
1. starters[X1], starters[X2], …, starters[Xn] are all pairwise disjoint
2. If Xi ⇒* ε then starters[Xj] ∩ follow[M] = Ø, for 1 ≤ j ≤ n, i ≠ j
If G is ε-free then 1 is sufficient
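The two conditions can be checked mechanically for one nonterminal once its sets are known. The sketch below assumes the starters sets, the "derives ε" flags and the follow set are supplied by the caller, and it reads the follow set in condition 2 as that of the left-hand side M. The names Ll1Check, Alternative and holdsFor are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: checks the two LL(1) conditions for the alternatives of one nonterminal M. */
class Ll1Check {
    /** One alternative Xi of M ::= X1 | ... | Xn, described by precomputed facts. */
    static final class Alternative {
        final boolean derivesEmpty;   // true if Xi =>* ε
        final Set<String> starters;   // starters[Xi]

        Alternative(boolean derivesEmpty, String... starters) {
            this.derivesEmpty = derivesEmpty;
            this.starters = new HashSet<>(Arrays.asList(starters));
        }
    }

    /** True if both conditions hold for these alternatives and the given follow set of M. */
    static boolean holdsFor(List<Alternative> alternatives, Set<String> follow) {
        for (int i = 0; i < alternatives.size(); i++) {
            for (int j = 0; j < alternatives.size(); j++) {
                if (i == j) continue;
                Alternative xi = alternatives.get(i);
                Alternative xj = alternatives.get(j);
                // Condition 1: starters sets are pairwise disjoint.
                if (!Collections.disjoint(xi.starters, xj.starters)) return false;
                // Condition 2: if Xi can vanish, no other Xj may start with a follow token.
                if (xi.derivesEmpty && !Collections.disjoint(xj.starters, follow)) return false;
            }
        }
        return true;
    }
}
```

As a usage example, the Subject alternatives with starters {I}, {a}, {the} pass, whereas a hypothetical rule M ::= a B | ε fails as soon as a can also follow M, because one token of lookahead cannot tell whether to parse a B or to take the empty alternative.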

Converting EBNF into RD parsers
• The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so "mechanical" that it can easily be automated!
  => JavaCC ("Java Compiler Compiler")

JavaCC and JJTree

LR parsing
– The algorithm makes use of a stack.
– The first item on the stack is the initial state of a DFA. A state of the automaton is a set of LR(0)/LR(1) items. The initial state is constructed from productions of the form S ::= • a [, $] (where S is the start symbol of the CFG).
– The stack contains (in alternating order):
  • a DFA state
  • a terminal symbol or part (subtree) of the parse tree being constructed
– The items on the stack are related by transitions of the DFA.
– There are two basic actions in the algorithm:
  • shift: get next input token
  • reduce: build a new node (remove children from stack)
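The shift and reduce actions can be illustrated with a hand-built automaton for a tiny grammar. Everything in the sketch below is an assumption for illustration: the grammar S ::= ( S ) | x is not from the slides, its six LR(0) states were worked out by hand, and the states and parse-tree nodes are kept on two parallel stacks rather than interleaved on one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch: LR(0) shift-reduce parser for the hypothetical grammar S ::= ( S ) | x. */
class TinyLrParser {
    // DFA transitions on terminals (shift); -1 means no transition.
    // States: 0 start, 1 S parsed, 2 after '(', 3 after 'x', 4 after '( S', 5 after '( S )'.
    private static int shift(int state, String token) {
        switch (state + " " + token) {
            case "0 (": case "2 (": return 2;
            case "0 x": case "2 x": return 3;
            case "4 )": return 5;
            default: return -1;
        }
    }

    // DFA transitions on the nonterminal S; only states 0 and 2 have one.
    private static int gotoOnS(int state) {
        return state == 0 ? 1 : 4;
    }

    /** Returns the parse tree in bracketed form, or throws on a syntax error. */
    static String parse(String... input) {
        Deque<Integer> states = new ArrayDeque<>();
        Deque<String> nodes = new ArrayDeque<>();
        states.push(0);  // the first item on the stack is the initial state
        int position = 0;
        while (true) {
            int state = states.peek();
            boolean atEnd = position >= input.length;
            if (state == 3) {
                reduce(states, nodes, 1);          // S ::= x
            } else if (state == 5) {
                reduce(states, nodes, 3);          // S ::= ( S )
            } else if (state == 1 && atEnd) {
                return nodes.peek();               // accept
            } else {
                String token = atEnd ? "end of input" : input[position];
                int target = atEnd ? -1 : shift(state, token);
                if (target < 0) {
                    throw new IllegalStateException("Syntax error at " + token);
                }
                nodes.push(token);                 // shift: take the next input token
                states.push(target);
                position++;
            }
        }
    }

    /** reduce: pop the children, build a new S node, follow the transition on S. */
    private static void reduce(Deque<Integer> states, Deque<String> nodes, int length) {
        List<String> children = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            states.pop();
            children.add(0, nodes.pop());  // restore left-to-right order
        }
        nodes.push("S[" + String.join(" ", children) + "]");
        states.push(gotoOnS(states.peek()));
    }
}
```

Calling TinyLrParser.parse("(", "x", ")") shifts "(" and "x", reduces x to S, shifts ")", reduces ( S ) to S, and returns "S[( S[x] )]". Because this grammar is LR(0), the state alone decides between shift and reduce; LR(1) and LALR parsers also consult the lookahead token.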

JavaCUP: A LALR generator for Java
[Figure: the definition of tokens (regular expressions) is fed to JFlex, which produces a Java file with the Scanner class (recognizes tokens). The grammar (BNF-like specification) is fed to JavaCUP, which produces a Java file with the Parser class (uses the scanner to get tokens, parses the stream of tokens). Together they form the syntactic analyzer.]

Steps to build a compiler with SableCC
1. Create a SableCC specification file
2. Call SableCC
3. Create one or more working classes, possibly inherited from classes generated by SableCC
4. Create a Main class activating lexer, parser and working classes
5. Compile with javac

Hierarchy