Machine-Level Representation of Programs I
- Slides: 43

Outline
• Compiler drivers
• History of the Intel IA-32 architecture
• Assembly code and object code
• Memory and registers
• Addressing modes
• Data formats
• Suggested reading
  – Chap. 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1

The Hello Program
• It begins life as a high-level C program
  – Can be read and understood by human beings
• The individual C statements must be translated by compiler drivers
  – So that the hello program can run on a computer system

The Hello Program
• The C program is translated into
  – A sequence of low-level machine-language instructions
• These instructions are then packaged in a form
  – Called an object program
• Object programs are stored as binary disk files
  – Also referred to as executable object files

The Context of a Compiler (gcc)
Figure 1.3 (p. 5)
• hello.c: source program (text)
  – Processed by the preprocessor (cpp)
• hello.i: modified source program (text)
  – Processed by the compiler (cc1)
• hello.s: assembly program (text)
  – Processed by the assembler (as)
• hello.o: relocatable object program (binary)
  – Processed by the linker (ld)
• hello: executable object program (binary)

Characteristics of high-level programming languages
• Abstraction
  – Productive
  – Reliable
• Type checking
• Often as efficient as hand-written assembly code
• Can be compiled and executed on a number of different machines, whereas assembly code is highly machine specific

Characteristics of assembly programming languages
• Managing memory
• Low-level instructions to carry out the computation
• Highly machine specific

Why should we understand assembly code?
• To understand the optimization capabilities of the compiler
• To analyze the underlying inefficiencies in the code
• Sometimes the run-time behavior of a program must be understood at the machine level

From writing assembly code to understanding assembly code
• A different set of skills
  – Transformations
  – Relation between source code and assembly code
• Reverse engineering
  – Trying to understand the process by which a system was created
    • By studying the system and
    • By working backward

A Historical Perspective
• Long evolutionary development
  – Started from rather primitive 16-bit processors
  – Added more features
    • To take advantage of technology improvements
    • To satisfy the demands for higher performance and for supporting more advanced operating systems
  – Laden with features that provide backward compatibility but are now obsolete

x86 family (year introduced, transistor count)
• 8086 (1978, 29K)
  – The heart of the IBM PC & DOS
  – 1M bytes addressable, 640K for users
• 80286 (1982, 134K)
  – More (now obsolete) addressing modes
  – Basis of the IBM PC-AT & Windows
• i386 (1985, 275K)
  – 32-bit architecture, flat addressing model
  – Capable of supporting a Unix operating system
• i486 (1989, 1.9M)
  – Integrated the floating-point unit onto the processor chip
• Pentium (1993, 3.1M)
• PentiumPro (1995, 6.5M)
  – P6 microarchitecture
  – Conditional move instructions
• Pentium/MMX (1997, 4.5M)
  – New class of instructions for manipulating vectors of integers
• Pentium II (1997, 7M)
  – Implemented the MMX instructions within the P6 microarchitecture
• Pentium III (1999, 8.2M)
  – New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)
• Pentium 4 (2001, 42M)
  – NetBurst microarchitecture
  – 144 new SSE2 instructions
• Advanced Micro Devices (AMD)
  – Now a close competitor to Intel
  – Developing its own extension to 64 bits
• Transmeta
  – In January of 2000, introduced the Crusoe processor
  – Radically different approach to implementation
    • Translates x86 code into "Very Long Instruction Word" (VLIW) code
    • High degree of parallelism
  – Aimed at the low-power market, such as laptop computers

Hardware Organization
Figure 1.4 (p. 7)
• CPU: Central Processing Unit
• ALU: Arithmetic/Logic Unit
• PC: Program Counter
• USB: Universal Serial Bus

Virtual spaces
• A linear array of bytes
  – Each with its own unique address (array index), starting at zero
• Diagram: a column of byte contents, with addresses running from 0x0, 0x1, 0x2, … up to 0xfffffffe

Data layout
• Object model in C
  – Different data types can be declared

Data layout
• Object model in assembly
  – A large, byte-addressable array
  – No distinctions, even between signed and unsigned integers
  – Code, user data, OS data
  – Run-time stack for managing procedure call and return
  – Blocks of memory allocated by the user

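The point that memory does not distinguish signed from unsigned integers can be checked from C. The following is a minimal sketch, assuming 32-bit two's-complement values (the fixed-width types `int32_t` and `uint32_t`); the helper names `bits_of` and `add_bits` are made up for illustration and do not come from the slides.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 4 bytes of a signed value into an unsigned one, bit for bit. */
uint32_t bits_of(int32_t v)
{
    uint32_t u;
    memcpy(&u, &v, sizeof u);
    return u;
}

/* A single 32-bit add serves both interpretations: the result bits are the
   same whether the operands are read as signed or as unsigned. */
uint32_t add_bits(uint32_t a, uint32_t b)
{
    return a + b;   /* wraps modulo 2^32 */
}
```

For example, `add_bits(bits_of(-1), bits_of(5))` yields the same bit pattern as `bits_of(4)`, which is why one add instruction is enough at the machine level.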
• Figure 1.13 (p. 17)

Operations in C constructs
• Arithmetic expression evaluation
• Loops
• Procedure calls and returns
• All are translated into sequences of instructions

Operations in Assembly Instructions
• Each performs only a very elementary operation
• Normally executed one by one, in sequence
• Operate on data stored in registers
• Transfer data between memory and a register
• Conditionally branch to a new instruction address

Assembly Programmer's View
Figure 3.2 (p. 136)
• Registers
  – %eax (%ah, %al), %edx (%dh, %dl), %ecx (%ch, %cl), %ebx (%bh, %bl)
  – %esi, %edi, %esp, %ebp
  – %eip, %eflags
• Memory, exchanging addresses, data, and instructions with the processor
  – Regions from high to low addresses: Stack, Heap, DLLs, Heap, Data, Text
  – Upper address byte at the region boundaries: FF, C0, BF, 80, 7F, 40, 3F, 08, 00

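The byte registers in the figure are not separate storage: %al and %ah name the two low-order bytes of %eax (and likewise for %edx, %ecx, %ebx). A small sketch of that relationship, assuming the register value is held in a `uint32_t`; the function names `low_byte` and `high_byte` are illustrative only.

```c
#include <stdint.h>

/* Bits 0..7 of a 32-bit register value: what %al names within %eax. */
uint8_t low_byte(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}

/* Bits 8..15 of a 32-bit register value: what %ah names within %eax. */
uint8_t high_byte(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFFu);
}
```

With %eax holding 0x12345678, `low_byte` gives 0x78 and `high_byte` gives 0x56; the upper 16 bits have no byte-register name.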
Programmer-Visible States (p. 129)
• Program counter (%eip)
  – Address of the next instruction
• Register file
  – Heavily used program data
  – Integer and floating-point

Programmer-Visible States
• Condition code register
  – Holds status information about the most recently executed instruction
  – Used to implement conditional changes in the control flow

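To make "status information" concrete, here is a hedged sketch of how four condition codes could be derived after a 32-bit add. The slides do not name the individual flags; the names ZF, SF, CF, and OF are the standard IA-32 ones, and the struct and function below are hypothetical helpers, not how the hardware is actually built.

```c
#include <stdint.h>

/* Four of the IA-32 condition codes (names assumed, not from the slides). */
struct flags {
    int zf;  /* zero flag: result is 0 */
    int sf;  /* sign flag: top bit of the result */
    int cf;  /* carry flag: unsigned overflow */
    int of;  /* overflow flag: signed (two's-complement) overflow */
};

struct flags flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;              /* wraps modulo 2^32 */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (int)((r >> 31) & 1u);
    f.cf = (r < a);                  /* a carry out occurred iff the sum wrapped */
    /* Signed overflow: both operands have the same sign and the result's
       sign differs from it. */
    f.of = (int)(((~(a ^ b) & (a ^ r)) >> 31) & 1u);
    return f;
}
```

A conditional branch then only needs to test these bits, for instance "jump if ZF is set" after a comparison.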
Code Examples (p. 130)
• C code
    int sum(int x, int y)
    {
        int t = x + y;
        return t;
    }
• Assembly file code.s, obtained with the command gcc -O2 -S code.c
    _sum:
        pushl %ebp
        movl  %esp, %ebp
        movl  12(%ebp), %eax
        addl  8(%ebp), %eax
        movl  %ebp, %esp
        popl  %ebp
        ret

Code Examples (p. 131)
• Relocatable object file code.o, obtained with the command gcc -O2 -c code.c
    55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 89 ec 5d c3

Code Examples (p. 132)
• Disassembly output, obtained with the command objdump -d code.o
    0x80483b4 <sum>:
    0x80483b4   55                  push  %ebp
    0x80483b5   89 e5               mov   %esp, %ebp
    0x80483b7   8b 45 0c            mov   0xc(%ebp), %eax
    0x80483ba   03 45 08            add   0x8(%ebp), %eax
    0x80483bd   01 05 00 00 00 00   add   %eax, 0x0
    0x80483c3   89 ec               mov   %ebp, %esp
    0x80483c5   5d                  pop   %ebp
    0x80483c6   c3                  ret
                                    nop
• Note: the instruction at 0x80483bd (add %eax, 0x0) occupies 6 bytes, as the addresses show. It has no counterpart in the C and assembly listings on p. 130 and appears to come from a version of sum that also adds t to a global variable.

C Code
• Add two signed integers
• int t = x+y;

Assembly Code
• Instruction
  – addl 8(%ebp), %eax
  – Adds two 4-byte integers
  – Similar to the expression y += x
• Operands
  – x: memory, M[%ebp+8]
  – y: register %eax (loaded from 12(%ebp) by the preceding movl)
  – t: register %eax
• The function value is returned in %eax

Object Code
• 3-byte instruction
• Stored at address 0x80483ba
• 0x80483ba: 03 45 08

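The three bytes 03 45 08 can be taken apart by hand. This goes beyond the slides and relies on the standard IA-32 instruction encoding: 0x03 is the opcode for a 32-bit add from a register-or-memory source into a register, 0x45 is the ModRM byte, and 0x08 is an 8-bit displacement. The sketch below only splits a ModRM byte into its three fields; the struct and function names are invented for illustration.

```c
#include <stdint.h>

/* The three fields packed into an IA-32 ModRM byte. */
struct modrm {
    unsigned mod;  /* bits 7..6: addressing form (1 = 8-bit displacement follows) */
    unsigned reg;  /* bits 5..3: register operand (0 = %eax) */
    unsigned rm;   /* bits 2..0: base register (5 = %ebp when mod is 1) */
};

struct modrm split_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}
```

For 0x45 the fields are mod = 1, reg = 0, rm = 5, which reads as "%eax, with the other operand at displacement(%ebp)"; together with the displacement byte 0x08 this is exactly addl 8(%ebp), %eax.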
Operands (p. 137)
• In high-level languages
  – Either constants
  – Or variables
• Example
  – A = A + 4
  – A is a variable; 4 is a constant

Operands
• Counterparts in assembly languages
  – Immediate (constant)
  – Register (variable)
  – Memory (variable)
• Example
  – movl 8(%ebp), %eax: 8(%ebp) is a memory operand, %eax is a register operand
  – addl $4, %eax: $4 is an immediate operand

Simple Addressing Mode
• Immediate
  – Represents a constant
  – The format is $imm (e.g., $4, $0xffff)
• Registers
  – The fastest storage units in computer systems
  – Typically 32 bits long
  – Register mode Ea
    • The operand is the value stored in the register
    • Denoted R[Ea]

Memory References
• Memory is treated as an array named M
• If addr is a memory address
  – M[addr] is the content of memory starting at addr
  – addr is used as an array index
• How many bytes are there in M[addr]?
  – It depends on the context

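"It depends on the context" can be illustrated with a toy model of M as a C byte array. This is a sketch under two assumptions: the access size is 1, 2, or 4 bytes, and multi-byte values are stored least significant byte first, as on IA-32. The function name `mem_read` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `size` bytes (1, 2, or 4) of the byte array M starting at index addr,
   assembling them least significant byte first (little-endian). */
uint32_t mem_read(const uint8_t *M, size_t addr, int size)
{
    uint32_t value = 0;
    for (int i = 0; i < size; i++)
        value |= (uint32_t)M[addr + i] << (8 * i);
    return value;
}
```

With the bytes 78 56 34 12 stored from address 0, the same starting address yields 0x78, 0x5678, or 0x12345678 depending on whether 1, 2, or 4 bytes are requested. In assembly, the instruction suffix and operands supply that size.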
Memory Addressing Mode
• An expression for
  – A memory address (or an array index)
• Most general form
  – Imm(Eb, Ei, s)
  – The scale factor s must be 1, 2, 4, or 8
• The address represented by the above form
  – Imm + R[Eb] + R[Ei] * s
• The operand value it gives
  – M[Imm + R[Eb] + R[Ei] * s]

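The address formula above can be written directly as a C function. This is only a sketch: register contents are passed in as plain 32-bit values, a missing component is passed as 0 (or scale 1), and the name `effective_address` is chosen here for illustration.

```c
#include <stdint.h>

/* Address denoted by Imm(Eb, Ei, s): Imm + R[Eb] + R[Ei] * s.
   rb and ri are the current contents of the base and index registers.
   Arithmetic wraps modulo 2^32, like 32-bit address computation. */
uint32_t effective_address(uint32_t imm, uint32_t rb, uint32_t ri, uint32_t s)
{
    return imm + rb + ri * s;
}
```

For instance, with a base register holding 0x100 and an index register holding 3, the form (Eb, Ei, 4) gives 0x100 + 3 * 4 = 0x10C. A negative displacement such as -4 is passed as `(uint32_t)-4` and wraps to the expected address.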
Addressing Mode
Figure 3.3 (p. 137)
Type        Form             Operand value                  Name
Immediate   $Imm             Imm                            Immediate
Register    Ea               R[Ea]                          Register
Memory      Imm              M[Imm]                         Absolute
Memory      (Ea)             M[R[Ea]]                       Indirect
Memory      Imm(Eb)          M[Imm + R[Eb]]                 Base + displacement
Memory      (Eb, Ei)         M[R[Eb] + R[Ei]]               Indexed
Memory      Imm(Eb, Ei)      M[Imm + R[Eb] + R[Ei]]         Indexed
Memory      (, Ei, s)        M[R[Ei] * s]                   Scaled indexed
Memory      (Eb, Ei, s)      M[R[Eb] + R[Ei] * s]           Scaled indexed
Memory      Imm(Eb, Ei, s)   M[Imm + R[Eb] + R[Ei] * s]     Scaled indexed

Practice problem 3.1 (p. 138)
• Memory contents
    Address   Value
    0x100     0xFF
    0x104     0xAB
    0x108     0x13
    0x10C     0x11
• Register contents
    Register  Value
    %eax      0x100
    %ecx      0x1
    %edx      0x3
• Operand values
    Operand            Value   Comment
    %eax               0x100   Register
    (%eax)             0xFF    Address 0x100
    $0x108             0x108   Immediate
    0x108              0x13    Absolute address
    260(%ecx, %edx)    0x13    Address 0x108
    (%eax, %edx, 4)    0x11    Address 0x10C

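The memory-operand rows of the practice problem can be re-derived mechanically. The sketch below hard-codes the four memory words given in the problem and treats every other address as 0, which is an assumption of this model only; `M` and `mem_operand` are illustrative names.

```c
#include <stdint.h>

/* Memory contents from the practice problem. Addresses not listed there
   read as 0 in this sketch (an assumption, not part of the problem). */
uint32_t M(uint32_t addr)
{
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0;
    }
}

/* Value of the memory operand Imm(Eb, Ei, s), given the register contents
   rb = R[Eb] and ri = R[Ei]; pass 0 for a missing part and 1 for a missing s. */
uint32_t mem_operand(uint32_t imm, uint32_t rb, uint32_t ri, uint32_t s)
{
    return M(imm + rb + ri * s);
}
```

With %eax = 0x100, %ecx = 0x1, and %edx = 0x3: `mem_operand(260, 0x1, 0x3, 1)` reads address 0x104 + 1 + 3 = 0x108, and `mem_operand(0, 0x100, 0x3, 4)` reads address 0x100 + 12 = 0x10C.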
Data Formats
Figure 3.1 (p. 135)
C declaration   Intel data type      GAS suffix   Size (bytes)
char            Byte                 b            1
short           Word                 w            2
int             Double word          l            4
unsigned        Double word          l            4
long            Double word          l            4
char *          Double word          l            4
float           Single precision     s            4
double          Double precision     l            8
long double     Extended precision   t            10/12

Data Formats
• Move data instructions
  – mov (general)
  – movb (move byte)
  – movw (move word)
  – movl (move double word)

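The size suffix decides how much of a register destination is overwritten. A rough C model, assuming a 32-bit IA-32 register as the destination: movb replaces only the low byte, movw the low 16 bits, and movl all 32 bits. The functions below are stand-ins named after the instructions, not real intrinsics.

```c
#include <stdint.h>

/* movb: overwrite the low byte of dst, keep the upper 24 bits. */
uint32_t movb(uint32_t dst, uint8_t src)
{
    return (dst & 0xFFFFFF00u) | src;
}

/* movw: overwrite the low 16 bits of dst, keep the upper 16 bits. */
uint32_t movw(uint32_t dst, uint16_t src)
{
    return (dst & 0xFFFF0000u) | src;
}

/* movl: overwrite all 32 bits; the old contents of dst are discarded. */
uint32_t movl(uint32_t dst, uint32_t src)
{
    (void)dst;
    return src;
}
```

Starting from 0x12345678, `movb` with 0xAB gives 0x123456AB, `movw` with 0xABCD gives 0x1234ABCD, and `movl` simply returns its source.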