5.2 General Structure of a Compiler
¦ Compiler
"A compiler is a computer program which translates programs written in a particular high-level programming language into executable code for a specific target computer."
ex) C compiler on SPARC
¦ It takes a C program as input and outputs code that can run on SPARC.

¦ Compiler Structure
¦ Front-End : language dependent part
¦ Back-End : machine dependent part

2. Syntax Analyzer (Parser)
¦ Function: syntax checking, tree generation.
¦ Output: incorrect - prints an error message
          correct - outputs the program structure (=> in tree form)
ex) if (a > 10) a = 1;
            if
          /    \
         >      =
        / \    / \
       a  10  a   1
Introduction to Compiler Design Theory

3. Intermediate Code Generator
¦ Semantic checking
¦ Intermediate code generation
ex) if (a > 10) a = 1.0;   ☞ semantic error when a is an integer!
ex) a = b + 1;
    Tree :    =
             / \
            a   +
               / \
              b   1
    Ucode: lod 1 2
           ldc 1
           add
           str 1 1
    - variable reference: (base, offset)
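The Ucode example above can be reproduced with a post-order walk over the tree for `a = b + 1`. The following is a minimal sketch, not the course's generator: the tuple tree encoding, the `SYMBOLS` table, and the function names are assumptions made for illustration. Only the instruction names `lod`, `ldc`, `add`, `str` and the (base, offset) pairs come from the slide.

```python
# Hypothetical symbol table: name -> (base, offset), chosen to match the slide.
SYMBOLS = {"a": (1, 1), "b": (1, 2)}

# Only "+" is mapped, because "add" is the only operator shown on the slide.
OPS = {"+": "add"}


def gen_expr(node, symbols):
    """Emit stack-machine code for an expression in post-order."""
    if isinstance(node, int):              # constant operand
        return [f"ldc {node}"]
    if isinstance(node, str):              # variable operand: load (base, offset)
        base, offset = symbols[node]
        return [f"lod {base} {offset}"]
    op, left, right = node                 # operator node: operands first, then operator
    return gen_expr(left, symbols) + gen_expr(right, symbols) + [OPS[op]]


def gen_assign(tree, symbols):
    """Emit code for ("=", target, expr): evaluate expr, then store into target."""
    _, target, expr = tree
    base, offset = symbols[target]
    return gen_expr(expr, symbols) + [f"str {base} {offset}"]


if __name__ == "__main__":
    for line in gen_assign(("=", "a", ("+", "b", 1)), SYMBOLS):
        print(line)
```

Operands are emitted before their operator, which is why the tree for `b + 1` becomes `lod`, `ldc`, `add` in that order.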

4. Code Optimizer
¦ Optional phase
¦ Identifies inefficient code and replaces it with more efficient code.
¦ Meaning of optimization
  ¦ major part : improve running time
  ¦ minor part : reduce code size
  ex) LDC R1, 1 (x)
¦ Criteria for optimization
  ¦ preserve the program meanings
  ¦ speed up on average
  ¦ be worth the effort

¦ Local optimization
  ¦ Identifies inefficient code through local inspection and replaces it with more efficient code.
  1. Constant folding
  2. Eliminating redundant load, store instructions
  3. Algebraic simplification
  4. Strength reduction
¦ Global optimization
  ¦ Uses flow analysis techniques.
  1. Common subexpression
  2. Moving loop invariants
  3. Removing unreachable codes
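Three of the local optimizations listed above (constant folding, algebraic simplification, strength reduction) can be sketched on a small expression tree. This is an illustrative sketch only: the `(op, left, right)` tuple encoding and the particular rewrite rules are assumptions, and it presumes expressions have no side effects.

```python
def fold(node):
    """Simplify an expression tree of (op, left, right) tuples.

    Leaves are ints (constants) or strings (variable names).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)   # simplify the operands first

    # Constant folding: both operands are known at compile time.
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right

    # Algebraic simplification: x + 0 -> x, x * 1 -> x.
    if op == "+" and right == 0:
        return left
    if op == "*" and right == 1:
        return left

    # Strength reduction: x * 2 -> x + x (addition in place of multiplication).
    if op == "*" and right == 2:
        return ("+", left, left)

    return (op, left, right)


if __name__ == "__main__":
    print(fold(("+", "x", ("*", 2, 3))))
    print(fold(("*", "y", 2)))
```

Each rule keeps the meaning of the expression unchanged, which is the first of the optimization criteria given earlier.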

5. Target Code Generator
¦ Generates machine instructions from the intermediate code.
¦ Code generator tasks
  1. instruction selection & generation
  2. register management
  3. storage allocation
  4. code optimization (machine-dependent optimization)

6. Error Recovery
¦ Error recovery - adjusting the parse so that an error does not affect other statements
¦ Error repair - correcting the error when it occurs
¦ Error Handling
  ¦ Error detection
  ¦ Error recovery
  ¦ Error reporting
  ¦ Error repair
¦ Error
  ¦ Syntax Error
  ¦ Semantic Error
  ¦ Run-time Error

5.3 Compiler Automation Tools
¦ Compiler Generating Tools (= Compiler-Compiler, Translator Writing System)
¦ As languages and machines advance, more compilers are needed.
  ¦ Reason new languages are developed: the application areas of computers keep broadening.
  ¦ Implementing N languages on M computers requires N*M compilers.
  ex) 2 languages : C, Java
      3 machines : IBM, SPARC, Pentium
      C-to-IBM, C-to-SPARC, C-to-Pentium
      Java-to-IBM, Java-to-SPARC, Java-to-Pentium

¦ Compiler-compiler Model
¦ Language descriptions are based on grammar theory, but machine descriptions have not yet been formalized.
  ¦ HDL : Hardware Description Language
  ¦ Used to design computer architecture.
¦ Automatic compiler generation has been studied as machine architectures and programming languages have developed.

2. Parser Generator (PGS: Parser Generating System)
(1) Stanford PGS
¦ John Hennessy
¦ Written in Pascal : 5000 lines
¦ Feature : obtains the syntactic structure in AST form.
¦ Output : a parsing table that includes Abstract Syntax Tree (AST) information.

(2) Wisconsin PGS
¦ C. N. Fischer
¦ Written in Pascal : 10000 lines
¦ Feature : error recovery
(3) YACC (Yet Another Compiler-Compiler)
¦ Runs on UNIX.
¦ Written in C.

3. Automatic Code Generation
¦ Three aspects
  1. Machine Description : ISP, ISPS, HDL
  2. Intermediate language
  3. Code generating algorithm (CGA)
     ¦ Pattern matching code generation
     ¦ Table driven code generation

4. Compiler System
(1) PQCC (Production-Quality Compiler-Compiler)
¦ W. A. Wulf (Carnegie-Mellon University)
¦ Takes a language description and a target machine description as input and outputs a PQC (Production Quality Compiler) and tables.
¦ Uses TCOL, a tree-structured form, as the intermediate language.
¦ Generates code by pattern matching code generation.
(2) ACK (Amsterdam Compiler Kit)
¦ A tool for automating the compiler back-end, developed by a group led by Andrew S. Tanenbaum at the Vrije Universiteit.
¦ Starts from the UNCOL concept (N*M => N+M).
¦ Uses an abstract machine code called EM as the intermediate language.
¦ Convenient for building portable compilers.

¦ PQCC Model

¦ ACK Model

5.4 Lexical Analysis
¦ Lexical Analysis
  ¦ the process by which the compiler groups certain strings of characters into individual tokens.
¦ Lexical Analyzer = Scanner = Lexer

¦ Token
  ¦ The smallest grammatically meaningful unit.
  Token - a single syntactic entity (terminal symbol).
  Token Number - an integer number used to make string processing efficient.
  Token Value - numeric value or string value.
ex)                 if    (    a     >    10    )   . . .
    Token Number :  32    7    4     25   5     8
    Token Value  :  0     0    'a'   0    10    0
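A scanner that produces (token number, token value) pairs for the example above might look like the following sketch. The token numbers 32, 7, 4, 25, 5, and 8 are taken from the slide; the regular expression, the `scan` function, and the restriction to this handful of symbols are assumptions made to keep the example small.

```python
import re

# Token numbers for special-form tokens, as shown in the slide's example.
TOKEN_NUMBER = {"if": 32, "(": 7, ")": 8, ">": 25}
TIDENT, TNUMBER = 4, 5   # general-form tokens: identifier and integer constant

# One alternative per token shape; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_][A-Za-z0-9_]*)|(?P<num>[0-9]+)|(?P<sym>[()>]))"
)


def scan(source):
    """Return a list of (token number, token value) pairs."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id") is not None:
            text = m.group("id")
            if text in TOKEN_NUMBER:                  # keyword: value is unused (0)
                tokens.append((TOKEN_NUMBER[text], 0))
            else:                                     # identifier: value is its name
                tokens.append((TIDENT, text))
        elif m.group("num") is not None:              # constant: value is the number
            tokens.append((TNUMBER, int(m.group("num"))))
        else:                                         # operator or delimiter
            tokens.append((TOKEN_NUMBER[m.group("sym")], 0))
    return tokens


if __name__ == "__main__":
    print(scan("if (a > 10)"))
```

Keywords are recognized by first matching the identifier pattern and then looking the text up in the keyword table, a common way to keep the token patterns simple.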

¦ Token classes
¦ Special form - language designer
  1. Keyword --- const, else, if, int, ...
  2. Operator symbols --- +, -, *, /, ++, -- etc.
  3. Delimiters --- ; , ( ) [ ] etc.
¦ General form - programmer
  4. identifier --- stk, ptr, sum, ...
  5. constant --- 526, 3.0, 0.1234e-10, 'c', "string" etc.
¦ Token Structure - represented by regular expression.
  ex) id = (l + _)(l + d + _)*

¦ Uses of the symbol table
  ¦ Information about identifiers is collected and stored during lexical analysis (L.A.) and syntax analysis (S.A.).
  ¦ It is used during semantic analysis and code generation.
  ¦ name + attributes
  ex) Hashed symbol table - see chapter 12

5.4.2 Token Recognition
¦ Specification of token structure - RE
¦ Specification of PL - CFG
¦ Scanner design steps
  1. describe the structure of tokens in regular expressions.
  2. or, directly design a transition diagram for the tokens.
  3. and program a scanner according to the diagram.
  4. moreover, verify the scanner's behavior through regular language theory.
¦ Character classification
  ¦ letter : a | b | c ... | z | A | B | C | ... | Z   → l
  ¦ digit : 0 | 1 | 2 ... | 9   → d
  ¦ special character : + | - | * | / | . | , | ...

4.2.1 Identifier Recognition
¦ Transition diagram
¦ Regular grammar
    S → lA | _A
    A → lA | dA | _A | ε
¦ Regular expression
    S = lA + _A = (l + _)A
    A = lA + dA + _A + ε = (l + d + _)*
    ∴ S = (l + _)(l + d + _)*
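The regular grammar above corresponds to a two-state transition diagram: state S moves to A on a letter or underscore, and A loops on letters, digits, and underscores. A table-driven sketch follows; the state names match the grammar, while `char_class`, `TRANSITIONS`, and the restriction to ASCII letters are assumptions for illustration.

```python
def char_class(ch):
    """Map a character to its class: 'l' (letter), 'd' (digit), '_' or None."""
    if ch == "_":
        return "_"
    if ch.isascii() and ch.isalpha():
        return "l"
    if ch.isascii() and ch.isdigit():
        return "d"
    return None


# Transition table for S -> lA | _A and A -> lA | dA | _A | ε.
TRANSITIONS = {
    ("S", "l"): "A", ("S", "_"): "A",
    ("A", "l"): "A", ("A", "d"): "A", ("A", "_"): "A",
}
ACCEPTING = {"A"}   # A derives ε, so A is the accepting state


def is_identifier(text):
    """Run the automaton over text; accept only if it ends in an accepting state."""
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:        # no transition defined: reject
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    for word in ["stk", "_tmp1", "9lives", ""]:
        print(repr(word), is_identifier(word))
```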

4.2.2 Integer Number Recognition
¦ Form : integers are divided into decimal, octal, and hexadecimal.
    decimal : starts with a non-zero digit
    octal : starts with 0
    hexadecimal : starts with 0x or 0X
¦ Transition diagram
    n : non-zero digit
    o : octal digit
    h : hexa digit

¦ Regular grammar
    S → nA | 0B
    A → dA | ε
    B → oC | xD | XD | ε
    C → oC | ε
    D → hE
    E → hE | ε
¦ Regular expression
    E = hE + ε = h*
    D = hE = hh* = h+
    C = oC + ε = o*
    B = oC + xD + XD + ε = o+ + (x + X)D + ε = o+ + (x + X)h+ + ε
    A = dA + ε = d*
    S = nA + 0B = nd* + 0(o+ + (x + X)h+ + ε) = nd* + 0 + 0o+ + 0(x + X)h+
    ∴ S = nd* + 0 + 0o+ + 0(x + X)h+
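The integer grammar can be checked against concrete strings by writing it as a regular expression, with the three alternatives being a non-zero digit followed by digits, a 0 followed by zero or more octal digits, and 0x or 0X followed by one or more hex digits. The sketch below uses Python's `re` module; the function name `integer_value` and the choice to return the numeric value are assumptions made for illustration.

```python
import re

# n d*  |  0 (x|X) h+  |  0 o*      (a lone "0" is covered by 0 o*)
INTEGER_RE = re.compile(
    r"(?P<dec>[1-9][0-9]*)|(?P<hex>0[xX][0-9A-Fa-f]+)|(?P<oct>0[0-7]*)"
)


def integer_value(text):
    """Return the value of an integer literal, or None if it is not well formed."""
    m = INTEGER_RE.fullmatch(text)
    if m is None:
        return None
    if m.group("dec"):
        return int(text, 10)
    if m.group("hex"):
        return int(text[2:], 16)   # drop the 0x / 0X prefix
    return int(text, 8)            # leading 0 followed by octal digits


if __name__ == "__main__":
    for literal in ["123", "017", "0x1F", "0", "08", "0x"]:
        print(literal, integer_value(literal))
```

Strings such as `08` and `0x` are rejected because no alternative matches them in full.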

6.1 Parsing Methods
¦ Parsing : ω ∈ L(G) ?
  How to check whether an input string is a sentence of a grammar, and how to construct a parse tree for the string.
¦ A parser for grammar G is a program that takes as input a string ω and produces as output either a parse tree (or derivation tree) for ω, if ω is a sentence of G, or an error message indicating that ω is not a sentence of G.

¦ Two basic types of parsers for context-free grammars
  ① Top down - starting with the root and working down to the leaves.
     recursive descent parser, predictive parser.
  ② Bottom up - beginning at the leaves and working up to the root.
     precedence parser, shift-reduce parser.
ex) A → XYZ
          A
        / | \
       X  Y  Z
    top-down : expand A into X Y Z ("toward the sentence")
    bottom-up : reduce X Y Z to A ("toward the start symbol")

5.5.2 Output of the Parser
¦ The output of a parser:
  ① Parse - left parse, right parse
  ② Parse tree
  ③ Abstract syntax tree
ex) G : 1. E → E + T
        2. E → T
        3. T → T * F
        4. T → F
        5. F → (E)
        6. F → a
    string : a + a * a

¦ left parse : a sequence of production rule numbers applied in leftmost derivation.
    E ⇒(1) E+T ⇒(2) T+T ⇒(4) F+T ⇒(6) a+T ⇒(3) a+T*F ⇒(4) a+F*F ⇒(6) a+a*F ⇒(6) a+a*a
    ∴ 12463466
¦ right parse : reverse order of production rule numbers applied in rightmost derivation.
    E ⇒(1) E+T ⇒(3) E+T*F ⇒(6) E+T*a ⇒(4) E+F*a ⇒(6) E+a*a ⇒(2) T+a*a ⇒(4) F+a*a ⇒(6) a+a*a
    ∴ 64264631

¦ parse tree : derivation tree
    string : a + a * a
            E
          / | \
         E  +  T
         |   / | \
         T  T  *  F
         |  |     |
         F  F     a
         |  |
         a  a

¦ Abstract Syntax Tree (AST) ::= a transformed parse tree that is a more efficient representation of the source program.
  ¦ leaf node - operand (identifier or constant)
  ¦ internal node - operator (meaningful production rule name)
ex) G : 1. E → E + T   ⇒ add
        2. E → T
        3. T → T * F   ⇒ mul
        4. T → F
        5. F → (E)
        6. F → a
    string : a + a * a

※ A meaningful terminal becomes a terminal node.
¦ A meaningful production rule becomes a nonterminal node.
  → naming : specified by the compiler designer.
ex) if (a > b) a = b + 1; else a = b - 2;

5.5.3 Top-Down Method
::= Beginning with the start symbol of the grammar, it attempts to produce a string of terminal symbols that is identical to a given source string. This matching process proceeds by successively applying the productions of the grammar to produce substrings from nonterminals.
::= In the terminology of trees, this is moving from the root of the tree to a set of leaves in the parse tree for a program.
¦ Top-Down parsing methods
  (1) Parsing with backup or backtracking.
  (2) Parsing with limited or partial backup.
  (3) Parsing with no backtracking.
¦ backtracking : making repeated scans of the input.

¦ General Top-Down Parsing method
¦ called a brute-force method with backtracking (Top-Down parsing with full backup)
  1. Given a particular nonterminal that is to be expanded, the first production for this nonterminal is applied.
  2. Compare the newly expanded string with the input string. In the matching process, a terminal symbol is compared with an input symbol, and a nonterminal is selected for expansion and its first production is applied.
  3. If the generated string does not match the input string, an incorrect expansion has occurred. In that case the process is backed up by undoing the most recently applied production, and the next production of this nonterminal is used as the next expansion.
  4. This process continues either until the generated string matches the input string or until there are no further productions to be tried. In the latter case, the given string cannot be generated from the grammar.
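The four steps above can be sketched as a recursive search that tries each production in order and falls back to the next one when the match fails. The grammar below is a made-up example that is not from the slides, and the names `GRAMMAR`, `derive`, and `accepts` are illustrative assumptions. The grammar has no left recursion, so the search terminates.

```python
# Hypothetical grammar (not from the slides): nonterminals map to alternatives.
GRAMMAR = {
    "S": [["a", "A", "d"], ["a", "B"]],
    "A": [["b"], ["c"]],
    "B": [["c", "c", "d"], ["d", "d", "c"]],
}


def derive(symbols, text, pos):
    """Yield every input position reachable after matching symbols from pos."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: apply its productions in order. Moving on to the next
        # production after a failed attempt is the backtracking step.
        for production in GRAMMAR[first]:
            for middle in derive(production, text, pos):
                yield from derive(rest, text, middle)
    elif pos < len(text) and text[pos] == first:
        # Terminal: must equal the current input symbol.
        yield from derive(rest, text, pos + 1)


def accepts(text, start="S"):
    """True if some sequence of expansions consumes the whole input."""
    return any(end == len(text) for end in derive([start], text, 0))


if __name__ == "__main__":
    for s in ["abd", "accd", "ab"]:
        print(s, accepts(s))
```

For `accd`, the first production `S → a A d` is tried and abandoned before `S → a B` succeeds, which shows why this method rescans the input.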

¦ Several problems with the top-down parsing method
¦ Left recursion
  ¦ A nonterminal A is left recursive if A ⇒+ Aα for some α.
  ¦ A grammar G is left recursive if it has a left-recursive nonterminal.
  ⇒ A left-recursive grammar can cause a top-down parser to go into an infinite loop.
  ∴ eliminate the left recursion.
¦ Backtracking
  ¦ the repeated scanning of the input string.
  ¦ parsing is much slower (very time consuming).
  ⇒ the conditions for no backtracking : defined formally using FIRST and FOLLOW.
Syntax Analysis

¦ Elimination of left recursion
  ¦ direct left-recursion : A → Aα ∈ P
  ¦ indirect left-recursion : A ⇒+ Aα
¦ general form : A → Aα | β
    A = Aα + β = βα*
¦ introducing a new nonterminal A' which generates α*.
    ==> A → βA'
        A' → αA' | ε
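Writing the general form as A → Aα | β (α follows the recursive A, β is any alternative that does not start with A), the transformation to A → βA', A' → αA' | ε can be sketched as below. The list-of-symbols representation and the function name are assumptions for illustration, and only direct left recursion is handled.

```python
def eliminate_direct_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    alternatives is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε.
    """
    new_nt = nonterminal + "'"
    # α parts: what follows the leading A in each left-recursive alternative.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # β parts: alternatives that do not begin with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:                       # nothing to eliminate
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


if __name__ == "__main__":
    # E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
    print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
```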

¦ Left-factoring
¦ if A → αβ | αγ are two A-productions and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ or to αγ.
  ==> left-factoring : the process of factoring out the common prefixes of alternates.
  method : A → αβ | αγ
           ==> A → α(β | γ)
           ==> A → αA', A' → β | γ
¦ ex) S → iCtS | iCtSeS | a
      C → b

    S → iCtS | iCtSeS | a
      → iCtS(ε | eS) | a
    ∴ S → iCtSS' | a
      S' → ε | eS
      C → b
¦ No-backtracking ::= deterministic selection of the production rule to be applied.
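One step of left-factoring can be sketched as follows: alternatives that begin with the same symbol are grouped, their longest common prefix is factored out, and the remainders move to a new nonterminal. The representation (lists of symbols, with the empty list as ε) and the function names are assumptions. The sketch performs a single pass and does not re-factor the new nonterminal's alternatives.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(nonterminal, alternatives):
    """Factor A -> αβ | αγ into A -> αA', A' -> β | γ (one pass)."""
    groups = {}
    for alt in alternatives:                  # group alternatives by first symbol
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)

    result = {nonterminal: []}
    counter = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:    # nothing shared: keep as is
            result[nonterminal].extend(group)
            continue
        prefix = group[0]
        for alt in group[1:]:
            prefix = common_prefix(prefix, alt)
        counter += 1
        new_nt = nonterminal + "'" * counter  # A', A'', ... for successive groups
        result[nonterminal].append(prefix + [new_nt])
        result[new_nt] = [alt[len(prefix):] for alt in group]
    return result


if __name__ == "__main__":
    s_alternatives = [["i", "C", "t", "S"], ["i", "C", "t", "S", "e", "S"], ["a"]]
    print(left_factor_once("S", s_alternatives))
```

Applied to S → iCtS | iCtSeS | a, the result is S → iCtSS' | a with S' → ε | eS.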

5.5.4 Bottom-Up Method
::= Reducing a given string to the start symbol of the grammar.
::= It attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
ex) G : S → aAcBe
        A → Ab | b
        B → d
    string : abbcde

Reduce
[Def 3.1] reduce : the replacement of the right side of a production with the left side.
    S ⇒*rm αAω ⇒rm αβω, A → β ∈ P   (rm : rightmost derivation)
[Def 3.2] handle : If S ⇒*rm αAω ⇒rm αβω, then β is a handle of αβω.
[Def 3.3] handle pruning :
    S = γ0 ⇒rm γ1 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω
    Reducing the handle of γn gives γn-1, then γn-2, ..., and finally S : the "reduce sequence".
ex) G : S → bAe
        A → a;A | a
    ω : ba;ae

Shift-Reduce Parsing
::= a bottom-up style of parsing.
¦ Two problems for automatic parsing
  1. How to find a handle in a right sentential form.
  2. What production to choose in case there is more than one production with the same right hand side.
  ====> The method depends on the class of grammar, but a stack is used to hold the handle.

¦ Four actions of a shift-reduce parser
  "The action is determined by consulting the parsing table with the stack top and the current input symbol."
  1. shift : the next input symbol is shifted to the top of the stack.
  2. reduce : the handle is reduced to the left side of the production.
  3. accept : the parser announces successful completion of parsing.
  4. error : the parser discovers that a syntax error has occurred and calls an error recovery routine.

ex) G : E → E + T | T
        T → T * F | F
        F → (E) | a
    string : a + a * a

    STEP   STACK        INPUT      ACTION
    ----   ----------   --------   ----------------
    (1)    $            a+a*a$     shift a
    (2)    $a           +a*a$      reduce F → a
    (3)    $F           +a*a$      reduce T → F
    (4)    $T           +a*a$      reduce E → T
    (5)    $E           +a*a$      shift +
    (6)    $E +         a*a$       shift a
    (7)    $E + a       *a$        reduce F → a
    (8)    $E + F       *a$        reduce T → F
    (9)    $E + T       *a$        shift *
    (10)   $E + T *     a$         shift a
    (11)   $E + T * a   $          reduce F → a
    (12)   $E + T * F   $          reduce T → T * F
    (13)   $E + T       $          reduce E → E + T
    (14)   $E           $          accept

<< Thinking points >>
1. The handle will always eventually appear on top of the stack, never inside.
   ∵ rightmost derivation in reverse. The contents of the stack together with the string remaining in the input form a right sentential form. Therefore it is always the top part of the stack that is reduced.
2. How to make a parsing table for a given grammar.
   → The way the parsing table is built depends on the class of grammar.
     SLR (Simple LR), LALR (LookAhead LR), CLR (Canonical LR)

¦ Constructing a Parse tree
  1. shift : create a terminal node labeled with the shifted symbol.
  2. reduce : A → X1 X2 ... Xn.
     (1) A new node labeled A is created.
     (2) The X1 X2 ... Xn are made direct descendants of the new node.
     (3) If A → ε, then the parser merely creates a node labeled A with no descendants.
ex) G : 1. LIST → LIST , ELEMENT
        2. LIST → ELEMENT
        3. ELEMENT → a
    string : a , a

    Step   STACK              INPUT    ACTION      PARSETREE
    ----   ----------------   ------   ---------   ----------------
    (1)    $                  a,a$     shift a     Build Node
    (2)    $a                 ,a$      reduce 3    Build Tree
    (3)    $ELEMENT           ,a$      reduce 2    Build Tree
    (4)    $LIST              ,a$      shift ,     Build Node
    (5)    $LIST ,            a$       shift a     Build Node
    (6)    $LIST , a          $        reduce 3    Build Tree
    (7)    $LIST , ELEMENT    $        reduce 1    Build Tree
    (8)    $LIST              $        accept      return that tree

LR Parser
¦ an efficient bottom-up parser for a large and useful class of context-free grammars.
¦ the "L" stands for left-to-right scan of the input; the "R" for constructing a rightmost derivation in reverse.
¦ Why LR parsers are attractive
  (1) LR parsers can be constructed for most programming languages.
  (2) The LR parsing method is more general than the LL parsing method.
  (3) LR parsers can detect syntactic errors as soon as possible.
¦ But, it is too much work to implement an LR parser by hand for a typical programming-language grammar.
  =====> Parser Generator

Parser Generating Systems
¦ The driver routine is the same for all LR parsers; only the parsing table changes from one parser to another.

Three Methods
¦ The techniques for producing LR parsing tables
  ¦ Simple LR (SLR) - LR(0) items, FOLLOW
  ¦ Canonical LR (CLR) - LR(1) items
  ¦ Lookahead LR (LALR) - ① LR(1) items ② LR(0), Lookahead

Structure of the LR Parser [1/3]
¦ LR parser
¦ Stack : S0 X1 S1 X2 ... Xm Sm, where Si : state and Xi ∈ V.
¦ Configuration of an LR parser :
    (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $)
     stack contents      unscanned input

Structure of the LR Parser [2/3]
¦ LR Parsing Table (ACTION table + GOTO table)
¦ The LR parsing algorithm ::= same as the shift-reduce parsing algorithm.
¦ Four Actions :
  ¦ shift
  ¦ reduce
  ¦ accept
  ¦ error

Structure of the LR Parser [3/3]
1. ACTION[Sm, ai] = shift S ::=
     (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $)
   ⊢ (S0 X1 S1 ... Xm Sm ai S, ai+1 ... an $)
2. ACTION[Sm, ai] = reduce A → α and |α| = r ::=
     (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $)
   ⊢ (S0 X1 S1 ... Xm-r Sm-r, ai ai+1 ... an $), GOTO(Sm-r, A) = S
   ⊢ (S0 X1 S1 ... Xm-r Sm-r A S, ai ai+1 ... an $)
3. ACTION[Sm, ai] = accept : parsing is completed.
4. ACTION[Sm, ai] = error : the parser has discovered an error and calls an error recovery routine.

LR Parsing Example
¦ G : 1. LIST → LIST , ELEMENT
      2. LIST → ELEMENT
      3. ELEMENT → a
¦ Parsing Table : (use this parsing table to show the parsing process for a , a)
  where sj means shift and stack state j, ri means reduce by production numbered i, acc means accept, and blank means error.
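The table-driven LR algorithm can be sketched for this grammar as follows. The original parsing table did not survive in this transcript, so the `ACTION` and `GOTO` tables below are an SLR construction worked out for this example and should be read as an assumption, not as the slide's table; the state numbering in particular may differ. For brevity the stack holds states only, not the interleaved grammar symbols.

```python
# Productions: number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("LIST", 3), 2: ("LIST", 1), 3: ("ELEMENT", 1)}

# Assumed SLR table: ("s", j) = shift and go to state j, ("r", i) = reduce by
# production i, ("acc", None) = accept. Missing entries mean error.
ACTION = {
    (0, "a"): ("s", 3),
    (1, ","): ("s", 4), (1, "$"): ("acc", None),
    (2, ","): ("r", 2), (2, "$"): ("r", 2),
    (3, ","): ("r", 3), (3, "$"): ("r", 3),
    (4, "a"): ("s", 3),
    (5, ","): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "LIST"): 1, (0, "ELEMENT"): 2, (4, "ELEMENT"): 5}


def lr_parse(tokens):
    """Return the right parse (reductions in the order applied)."""
    stack = [0]                        # state stack, starting in state 0
    tokens = list(tokens) + ["$"]      # append the end marker
    pos = 0
    right_parse = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:              # blank table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
        kind, arg = entry
        if kind == "s":                # shift: push the new state, advance input
            stack.append(arg)
            pos += 1
        elif kind == "r":              # reduce A -> α: pop |α| states, then GOTO
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            right_parse.append(arg)
        else:                          # accept
            return right_parse


if __name__ == "__main__":
    print(lr_parse(["a", ",", "a"]))
```

The driver loop does not depend on the grammar; only the three tables would change for a different language.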

Writing a Parser
¦ Parser Generating System