The LANCE V2.0 C compiler system

Rainer Leupers
University of Dortmund, Informatik 12, 44221 Dortmund, Germany
phone: +49 (231) 755 6151
mobile: +49 (177) 2131146
fax: +49 (231) 755 6116
email: leupers@icd.de
web: http://ls12-www.cs.uni-dortmund

Overview
• Functionality of LANCE
• Software structure
• C frontend
• Intermediate representation (IR)
• IR optimizations
• Control and data flow analysis
• Backend interface
© 2000, R. Leupers

The LANCE V2.0 compiler system
Purpose of LANCE:
• Facilitate C compiler development for new target processors
• Give insight into compiler structure
Tasks covered by LANCE:
• Source code analysis
• Generation of IR
• Machine-independent optimizations
• Data flow graph generation
Tasks not covered by LANCE:
• Assembly code generation (backend)
• Machine-specific optimizations
• Code assembly and linking

Key features
• Full ANSI C coverage (C89)
• Modular tool and library structure
• Simple three-address code IR (C subset)
• Plug & play IR optimizations
• Backend interface compatible with OLIVE
• Proven in numerous compiler projects

LANCE software structure
[Diagram]
• LANCE library: header file lance2.h and C++ library liblance2.a
• LANCE tools: C frontend, IR optimization 1 ... IR optimization n
• Machine-specific backend
• The library is used by these components, which exchange a common IR

ANSI C frontend
Functionality:
• Lexical, syntactic, and semantic analysis of C source
• Generation of three-address code IR for a C file
• Emission of error messages if required (gcc style)
• Machine-specific constants (type bit width, alignment) stored in a configuration file
Implementation:
• Based on a context-free C grammar, according to the K&R spec
• C source automatically generated with an attribute grammar compiling system (OX, an extension of lex & yacc)
• In total approx. 26,000 lines of C source code
• Validated with a comprehensive test suite

Setup and IR generation
Environment variables:
• setenv LANCE2_CPP "gcc -E"
• setenv LANCE2_CONFIG "config.sparc"
Call the C frontend with the "compile" command:
    > compile test.c
[Diagram: file test.c and config.sparc go into compile, which produces file test.ir.c]

General IR format
• One IR file (*.ir.c) is generated for each C source file (*.c)
• External IR format: C subset (compilable!)
• Internal IR format: accessible via the LANCE library
• The IR contains a symbol table plus three-address code (3AC) for each C function defined in the source code
• 3AC is a sequence of IR statements
• 3AC = at most two operands and one result per statement
• IR statements (mostly) consist of IR expressions
• Blocks of 3AC are augmented with source information (C code, source line no.) for debugging purposes
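As an illustration of the "at most two operands, one result" rule, the sketch below shows one C expression in its source form and in a hand-written three-address form. The function names and the temporaries t1 to t4 are made up for this example; the frontend chooses its own auxiliary variable names. Because both versions are compilable, they can be compared directly, which is the idea behind using a C subset as the external IR format.

```cpp
#include <cassert>

// Source-level expression with nested subexpressions.
int eval_source(int a, int b, int c, int d) {
    return a * b + c * d - 1;
}

// The same computation in three-address style: every statement has
// at most two operands and exactly one result.
int eval_3ac(int a, int b, int c, int d) {
    int t1, t2, t3, t4;   // hypothetical auxiliary variables
    t1 = a * b;
    t2 = c * d;
    t3 = t1 + t2;
    t4 = t3 - 1;
    return t4;
}

// Differential check: both forms must agree on the same input.
void check_equivalence(int a, int b, int c, int d) {
    assert(eval_source(a, b, c, d) == eval_3ac(a, b, c, d));
}
```

Calling check_equivalence(2, 3, 4, 5) compares the two forms on one input; a real validation run would use a whole test suite.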

Classes of IR statements
• Assignment: a = b + c; *p = !a; x = f(y, z); cond = *x;
• Jump: goto lab;
• Conditional jump: if (cond) goto lab;
• Label: lab:
• Return void: return;
• Return value: return x;
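The statement classes above are enough to express structured control flow. The following sketch rewrites a simple counting loop by hand using only assignments, a label, a conditional jump, a jump, and a value return. The label names L1 and L2, the variable cond, and the exact statement order are assumptions for illustration; the IR actually emitted by the frontend may look different.

```cpp
#include <cassert>

// Source-level version: sum of 0 .. n-1 with a structured loop.
int sum_source(int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += i;
    return s;
}

// IR-style version restricted to the statement classes listed above.
int sum_ir(int n) {
    int s, i, cond;
    s = 0;                  // assignment
    i = 0;                  // assignment
L1:                         // label
    cond = i >= n;          // assignment (binary expression)
    if (cond) goto L2;      // conditional jump
    s = s + i;              // assignment
    i = i + 1;              // assignment
    goto L1;                // jump
L2:                         // label
    return s;               // return value
}

// Both versions must compute the same result.
void check_sum(int n) {
    assert(sum_source(n) == sum_ir(n));
}
```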

Classes of IR expressions
• Symbol: "a", "b", "main", "count", ...
• Binary expression: a * b, x / 2, 3 ^ v, f & 4, q % r, ...
• Unary expression: !a, *p, ~x, -z, ...
• Function call: f1(), f2(a, b), f3(*x, 1, y), ...
• Type cast: (char)z, (int)a, (float*)b, ...
• String constant: "compiler", "design", "is", "fun", ...
• Integer constant: 1000, 3456, -234, -112, ...
• Float constant: "3.1415926536", "2.71828459", ...

Why is the LANCE IR a C subset?
Validation of the frontend (or any IR optimization):
• C source → CC → exe1
• C source → frontend → IR-C source → CC → exe2
• Run exe1 and exe2 on the same test input and check: output1 =? output2
C-to-C optimization:
• C source → IR optimization tools → optimized C source → CC

IR data structure overview
[Diagram]
• Global symbol table: int x1, x2, x3; double y1, y2, y3; ...
• Function list: fun 1 "name 1" ... fun n "name n"
• Each function has a local symbol table (int a, b, c; ...) and an IR statement list (stm 1, stm 2, ..., stm m)
• Each IR statement carries stm info, for example:
    • Class: cond. jump, ID: 4124, Target: "L1", Condition: c
    • Class: assignment, ID: 4123, Left hand side: *p, Right hand side: a + b
• Each IR expression carries exp info, for example:
    • Class: binary, ID: 10034, Left arg: a, Right arg: b, Oper: +, Type: int

The IR type class
• C++ class IRType stores type info for all symbols and expressions
• Primary type: void, char, short, int, array, pointer, struct, function, ...
• Secondary type: subtype of arrays and pointers
• Storage class: extern, static, register, ...
• Qualifiers: const, volatile
• Example: const int* A[100];
    Type->Class() = IRTYPE_ARRAY                          // primary type
    Type->IsConst() = true
    Type->Subtype()->Class() = IRTYPE_POINTER
    Type->Subtype()->Subtype()->Class() = IRTYPE_INT
    Type->ArrayDim() = 100
    Type->SizeOf() = 400                                  // in bytes, for 32-bit pointers
    Type->MemoryWords() = 200                             // for a 16-bit word memory
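The size figures in the example follow from simple arithmetic: 100 elements times 4 bytes per 32-bit pointer gives 400 bytes, and 400 bytes are 3200 bits, or 200 words of 16 bits. The sketch below reproduces that calculation on a hypothetical stand-in type descriptor. The names Kind, Type, size_of, memory_words, and make_example are invented here and are not the LANCE IRType API; the 4-byte sizes are the assumed target configuration, and where LANCE itself records the const qualifier is not modeled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a type descriptor (not the LANCE IRType class).
enum class Kind { Int, Pointer, Array };

struct Type {
    Kind kind;
    bool is_const;
    int array_dim;                  // only meaningful for arrays
    std::shared_ptr<Type> subtype;  // element type or pointee type
};

// Size in bytes, assuming a target with 32-bit int and 32-bit pointers.
int size_of(const Type& t) {
    switch (t.kind) {
        case Kind::Int:     return 4;
        case Kind::Pointer: return 4;
        case Kind::Array:
            assert(t.subtype != nullptr);
            return t.array_dim * size_of(*t.subtype);
    }
    return 0;  // not reached for valid kinds
}

// Number of memory words occupied, rounding up to whole words.
int memory_words(const Type& t, int word_bits) {
    assert(word_bits > 0);
    int bits = size_of(t) * 8;
    return (bits + word_bits - 1) / word_bits;
}

// Builds a descriptor for: const int* A[100];
Type make_example() {
    auto int_t = std::make_shared<Type>(Type{Kind::Int, true, 0, nullptr});
    auto ptr_t = std::make_shared<Type>(Type{Kind::Pointer, false, 0, int_t});
    return Type{Kind::Array, false, 100, ptr_t};
}
```

With these assumptions, size_of(make_example()) yields 400 and memory_words(make_example(), 16) yields 200, matching the values on the slide.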

The symbol table class
• The symbol table stores all relevant information for symbols/identifiers
• Two hierarchy levels:
    • Global symbol table: IR->GlobalSymbolTable()
    • One local symbol table per function: fun->LocalSymbolTable()
• All local symbols get a unique numerical suffix, e.g.
    int f(int x) { int a, b; }
  becomes
    int f(int x_1) { int a_2, b_3; }
• Important access methods:
    • ST->LookupSymbol(char* name)
    • IRSymbol* ST->CreateSymbol(IRType* tp)
    • Iterators: ST->FirstObject(), ST->NextObject()
• Information stored in a table entry (class IRSymbol):
    • Symbol type: IRType* sym->Type()
    • Symbol name: char* sym->Name()
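The unique-suffix rule can be pictured as a running counter appended to each local name. The sketch below is a hypothetical helper, not part of the LANCE library; in particular, whether LANCE restarts the counter per function or keeps one counter per file is an assumption left open here (this version restarts it on every call to rename_locals).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper that appends a running number to each name.
class SuffixNamer {
public:
    std::string rename(const std::string& name) {
        assert(!name.empty());
        ++counter_;
        return name + "_" + std::to_string(counter_);
    }

private:
    int counter_ = 0;
};

// Renames parameters and locals in declaration order.
std::vector<std::string> rename_locals(const std::vector<std::string>& names) {
    SuffixNamer namer;
    std::vector<std::string> renamed;
    for (const std::string& n : names) {
        renamed.push_back(namer.rename(n));
    }
    return renamed;
}
```

For the slide's function f, passing the names x, a, b in that order reproduces x_1, a_2, b_3.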

IR generation example
[Figure: source file next to the generated IR file, annotated with: forward declaration, automatic conversion, suffix 3 for parameter i, auxiliary vars, debug info]

IR optimization tools
• Purpose: perform machine-independent optimizations on the IR
• Identical IR format for all tools ("plug & play" concept)
• Currently available tools:
    • Constant folding: cfold
    • Constant propagation: constprop
    • Copy propagation: copyprop
    • Common subexpression elimination: cse
    • Dead code elimination: dce
    • Jump optimization: jmpopt
    • Loop invariant code motion: licm
    • Induction variable elimination: ive
• Automatic iteration of IR optimizations via the "iropt" shell script
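Iterating the optimizations pays off because one pass often creates new opportunities for another. The sketch below illustrates this with two toy passes, constant folding and constant propagation, repeated until nothing changes. It works on a made-up statement record for a single straight-line block and only handles +, -, and * without overflow checks; it is not the implementation of cfold, constprop, or iropt.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy operand: either an integer constant or a variable name.
struct Operand {
    bool is_const;
    int value;         // valid if is_const
    std::string name;  // valid if !is_const
};
Operand constant(int v) { return Operand{true, v, ""}; }
Operand var(const std::string& n) { return Operand{false, 0, n}; }

// Toy statement: dst = a op b, or the copy dst = a when op == '='.
struct Stmt {
    std::string dst;
    Operand a;
    char op;
    Operand b;
};

// Constant folding: evaluate operations whose operands are both constants.
bool fold(std::vector<Stmt>& code) {
    bool changed = false;
    for (Stmt& s : code) {
        if (s.op == '=' || !s.a.is_const || !s.b.is_const) continue;
        int r = 0;
        switch (s.op) {
            case '+': r = s.a.value + s.b.value; break;
            case '-': r = s.a.value - s.b.value; break;
            case '*': r = s.a.value * s.b.value; break;
            default: continue;  // unknown operator: leave the statement alone
        }
        s.a = constant(r);
        s.op = '=';
        changed = true;
    }
    return changed;
}

// Constant propagation within one straight-line block.
bool propagate(std::vector<Stmt>& code) {
    bool changed = false;
    std::map<std::string, int> known;  // variables currently holding a constant
    auto subst = [&](Operand& o) {
        if (o.is_const) return;
        auto it = known.find(o.name);
        if (it != known.end()) { o = constant(it->second); changed = true; }
    };
    for (Stmt& s : code) {
        assert(!s.dst.empty());
        subst(s.a);
        if (s.op != '=') subst(s.b);
        if (s.op == '=' && s.a.is_const) known[s.dst] = s.a.value;
        else known.erase(s.dst);  // dst no longer holds a known constant
    }
    return changed;
}

// Repeat both passes until a fixed point is reached.
void optimize(std::vector<Stmt>& code) {
    bool changed = true;
    while (changed) {
        changed = fold(code);
        changed = propagate(code) || changed;
    }
}
```

For the block a = 2; b = a * 3; c = b + 4; d = c - x, the loop needs several rounds: propagation turns b into 2 * 3, folding turns that into 6, and the same pair of steps then resolves c.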

IR optimization example
[Figure: C source code → compile → unoptimized IR]

IR optimization example (cont.), one slide with a figure per tool:
• Constant folding: cfold
• Constant propagation: constprop
• Copy propagation: copyprop
• Common subexpression elimination: cse
• Dead code elimination: dce
• Jump optimization: jmpopt
• Loop invariant code motion: licm
• Induction variable elimination: ive

Control flow analysis
• Purpose: identify the basic block structure of a C function
• Basic block (BB): IR statement sequence with unique entry and exit points
• Control flow graph (CFG): one node per BB, with an edge (BB1, BB2) iff BB2 may be an immediate successor of BB1 during execution
• Assembly code generation is usually done one BB at a time
• Example:
    while (x) {
        BB1;
        if (x) then BB2; else BB3;
        BB4
    }
  [Figure: CFG with nodes BB1, BB2, BB3, BB4]
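One common way to obtain basic blocks from a flat statement list is the textbook "leader" method: a block starts at the first statement, at every label, and after every jump, conditional jump, or return. The sketch below applies it to a hypothetical statement record and then derives CFG edges. It is not the LANCE ControlFlowGraph class, and treating every label as a leader is a simplification that may split blocks more finely than necessary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical statement model, loosely following the IR statement classes.
enum class StmClass { Assign, Jump, CondJump, Label, Return };

struct Stm {
    StmClass cls;
    std::string label;  // label name for Label, target for Jump/CondJump
};

struct CFG {
    std::vector<int> leaders;            // index of the first statement of each block
    std::vector<std::vector<int>> succ;  // successor block numbers per block
};

// A statement starts a block if it is first, a label, or follows a branch.
std::vector<int> find_leaders(const std::vector<Stm>& code) {
    std::vector<int> leaders;
    for (int i = 0; i < static_cast<int>(code.size()); ++i) {
        bool after_branch = i > 0 && (code[i - 1].cls == StmClass::Jump ||
                                      code[i - 1].cls == StmClass::CondJump ||
                                      code[i - 1].cls == StmClass::Return);
        if (i == 0 || code[i].cls == StmClass::Label || after_branch) {
            leaders.push_back(i);
        }
    }
    return leaders;
}

CFG build_cfg(const std::vector<Stm>& code) {
    CFG g;
    g.leaders = find_leaders(code);
    int n = static_cast<int>(g.leaders.size());

    // Every label is a leader, so it is always the first statement of its block.
    std::map<std::string, int> block_of_label;
    for (int b = 0; b < n; ++b) {
        const Stm& first = code[g.leaders[b]];
        if (first.cls == StmClass::Label) block_of_label[first.label] = b;
    }

    g.succ.resize(n);
    for (int b = 0; b < n; ++b) {
        int end = (b + 1 < n) ? g.leaders[b + 1] : static_cast<int>(code.size());
        const Stm& last = code[end - 1];
        if (last.cls == StmClass::Jump || last.cls == StmClass::CondJump) {
            auto it = block_of_label.find(last.label);
            assert(it != block_of_label.end());  // jump targets must exist
            g.succ[b].push_back(it->second);
        }
        bool falls_through = last.cls != StmClass::Jump && last.cls != StmClass::Return;
        if (falls_through && b + 1 < n) g.succ[b].push_back(b + 1);
    }
    return g;
}
```

A block ending in a conditional jump gets two successors (the jump target and the fall-through block), which corresponds to the two outgoing edges of a loop or if condition in the CFG.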

CFG generation by LANCE
• Class ControlFlowGraph is contained in the LANCE library
• Constructor ControlFlowGraph(Function* fun) generates the CFG for any function fun
• LANCE tool showcfg exports CFGs in the VCG text format
• VCG can be used to visualize generated CFGs
[Diagram: IR file → showcfg → VCG file → xvcg → CFG]

CFG visualization example
[Figure: CFG displayed with showcfg + VCG tool]

Data flow analysis
• Goal: convert the IR into a data flow graph (DFG) representation for assembly code generation by tree pattern matching
• Performed by def/use analysis between IR statements/expressions
• LANCE lib class DataFlowAnalysis provides the required methods
• Constructor DataFlowAnalysis(Function* fun) constructs data flow information for any function fun
• Example:
    x = 5;
    goto lab;
    ...
    x = 6;
    lab:
    y = x + 1;
    ...
    z = 1 - y;
    u = y / 5;
  The use of x in "y = x + 1" has two definitions: "x = 5" and "x = 6".
  The definition of y has two uses: in "z = 1 - y" and in "u = y / 5".
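Within a single basic block, def/use information can be collected in one forward scan that remembers the most recent definition of each variable. The sketch below does only that; it does not handle multiple definitions reaching a use across jumps (as with x in the example above), which needs an analysis over the whole CFG. The Stm record and function names are hypothetical and unrelated to the LANCE DataFlowAnalysis class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical statement summary: the variable defined and the variables read.
struct Stm {
    std::string def;                // empty if the statement defines nothing
    std::vector<std::string> uses;
};

// For one basic block: uses_of[d] lists the statements that read the value
// defined by statement d.
std::vector<std::vector<int>> def_use_chains(const std::vector<Stm>& block) {
    std::vector<std::vector<int>> uses_of(block.size());
    std::map<std::string, int> last_def;  // variable -> index of latest definition
    for (int i = 0; i < static_cast<int>(block.size()); ++i) {
        for (const std::string& v : block[i].uses) {
            auto it = last_def.find(v);
            if (it == last_def.end()) continue;  // defined outside this block
            std::vector<int>& chain = uses_of[it->second];
            if (chain.empty() || chain.back() != i) chain.push_back(i);
        }
        if (!block[i].def.empty()) last_def[block[i].def] = i;
    }
    return uses_of;
}

// The tail of the slide example: y = x + 1; z = 1 - y; u = y / 5;
void demo_slide_example() {
    std::vector<Stm> block = {{"y", {"x"}}, {"z", {"y"}}, {"u", {"y"}}};
    std::vector<std::vector<int>> chains = def_use_chains(block);
    assert((chains[0] == std::vector<int>{1, 2}));  // y has two uses
    assert(chains[1].empty());
    assert(chains[2].empty());
}
```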

DFG visualization example
[Figure: DFG displayed with showdfg + VCG tool]

Backend interface
• LANCE lib classes LANCEDataFlowTree and DFTManager provide the link between the LANCE IR and tree pattern matching
• OLIVE/IBURG accept only trees instead of general DFGs
• Hence: split DFGs at the common subexpressions (CSEs)
[Figure: a DFG computing x and y over a, b, c, and 2 with a shared CSE node is split into trees; the CSE is written to an auxiliary variable t, which the trees for x and y then read]
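Splitting at common subexpressions can be sketched as follows: count how often each operator node is referenced, give every node referenced more than once its own tree that assigns to an auxiliary variable, and let all other trees read that variable instead. The Node and Root records, the temporary names t1, t2, ..., and the textual output format are all assumptions made for this illustration; they do not reflect how LANCEDataFlowTree or DFTManager represent trees. The sketch also assumes binary operators only and that children appear before their parents in the node list.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical DFG node: a leaf (variable or constant) or a binary operator.
struct Node {
    std::string op;  // operator symbol, or the name/value of a leaf
    int left;        // child indices, -1 for leaves
    int right;
    bool is_leaf() const { return left < 0 && right < 0; }
};

// A DFG root: the variable that receives the value of a node.
struct Root {
    std::string result;
    int node;
};

// Returns one "var = expression" string per resulting tree.
std::vector<std::string> split_into_trees(const std::vector<Node>& dfg,
                                          const std::vector<Root>& roots) {
    int n = static_cast<int>(dfg.size());
    std::vector<int> refs(n, 0);
    for (int i = 0; i < n; ++i) {
        if (dfg[i].is_leaf()) continue;
        assert(dfg[i].left >= 0 && dfg[i].left < i);    // binary, children first
        assert(dfg[i].right >= 0 && dfg[i].right < i);
        ++refs[dfg[i].left];
        ++refs[dfg[i].right];
    }
    for (const Root& r : roots) ++refs[r.node];

    std::vector<std::string> temp(n);  // auxiliary variable name per shared node
    std::function<std::string(int, bool)> show = [&](int i, bool at_root) -> std::string {
        const Node& node = dfg[i];
        if (node.is_leaf()) return node.op;
        if (!at_root && !temp[i].empty()) return temp[i];  // cut the DFG here
        return "(" + show(node.left, false) + " " + node.op + " " +
               show(node.right, false) + ")";
    };

    std::vector<std::string> trees;
    int next = 0;
    for (int i = 0; i < n; ++i) {
        if (!dfg[i].is_leaf() && refs[i] > 1) {
            temp[i] = "t" + std::to_string(++next);
            trees.push_back(temp[i] + " = " + show(i, true));
        }
    }
    for (const Root& r : roots) {
        trees.push_back(r.result + " = " + show(r.node, false));
    }
    return trees;
}
```

For a DFG in which a * b feeds both x = (a * b) + c and y = (a * b) * 2, this produces three trees: one computing the auxiliary variable from a * b, and one each for x and y that read it.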

Data structure overview
• Constructor DFTManager(Function* fun) generates the data flow tree (DFT) representation for an entire function fun
• DFTManager contains an internal list of basic blocks
• Each BB in turn is a list of DFTs
[Diagram: BB 1 (DFT 1, DFT 2, ..., DFT m), BB 2, ..., BB n]

DFT covering with OLIVE
• DFTs are directly in the format required by code generators produced by OLIVE
• All DFTs are built from a fixed set of terminal symbols (e.g. cs_STORE), specified in file INCL/termlist.c
• Example (only a single DFT):
  [Figure: C file, IR file, and DFT representation]

Example (cont.)
[Figure: DFT in OLIVE format, simplified OLIVE spec, and assembly code for a hypothetical machine]

Summary
LANCE provides you with...
• C frontend
• IR optimizations
• C++ library for IR access (+ important basic classes)
• Interface to OLIVE data flow trees
A full C compiler additionally requires...
• OLIVE-based backend for the concrete target machine
• Target-specific optimizations (e.g. scheduling, address generation)