Machine-Level Representation of Programs I

Outline
• Compiler drivers
• History of the Intel IA-32 architecture
• Assembly code and object code
• Memory and registers
• Addressing modes
• Data formats
• Suggested reading
– Chap. 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1

The Hello Program
• It begins life as a high-level C program
– It can be read and understood by human beings
• The individual C statements must be translated by compiler drivers
– So that the hello program can run on a computer system

The Hello Program
• The C program is translated into
– A sequence of low-level machine-language instructions
• These instructions are then packaged in a form called an object program
• Object programs are stored as binary disk files
– Also referred to as executable object files

The Context of a Compiler (gcc) (Figure 1.3, p. 5)
hello.c (source program, text)
  → Preprocessor (cpp) → hello.i (modified source program, text)
  → Compiler (cc1) → hello.s (assembly program, text)
  → Assembler (as) → hello.o (relocatable object program, binary)
  → Linker (ld) → hello (executable object program, binary)

Characteristics of high-level programming languages
• Abstraction
– Productive
– Reliable
• Type checking
• Compiled code is about as efficient as hand-written code
• Can be compiled and executed on many different machines, whereas assembly code is highly machine-specific

Characteristics of assembly programming languages
• The programmer manages memory
• Low-level instructions carry out the computation
• Highly machine-specific

Why should we understand assembly code?
• To understand the optimization capabilities of the compiler
• To analyze the underlying inefficiencies in the code
• Sometimes we need to know the run-time behavior of a program

From writing assembly code to understanding assembly code
• A different set of skills
– Transformations
– The relation between source code and assembly code
• Reverse engineering
– Trying to understand the process by which a system was created
– By studying the system and working backward

A Historical Perspective
• Long evolutionary development
– Started from rather primitive 16-bit processors
– Added more features
• To take advantage of technology improvements
• To satisfy the demands for higher performance and for supporting more advanced operating systems
– Laden with features, provided for backward compatibility, that are now obsolete

x86 family
• 8086 (1978, 29 K transistors)
– The heart of the IBM PC & DOS
– 1 MB addressable, 640 KB for users
• 80286 (1982, 134 K transistors)
– More (now obsolete) addressing modes
– Basis of the IBM PC-AT & Windows

x86 family
• i386 (1985, 275 K transistors)
– 32-bit architecture, flat addressing model
– Could support a Unix operating system
• i486 (1989, 1.9 M transistors)
– Integrated the floating-point unit onto the processor chip

x86 family
• Pentium (1993, 3.1 M transistors)
• Pentium Pro (1995, 6.5 M transistors)
– P6 microarchitecture
– Conditional move instructions
• Pentium/MMX (1997, 4.5 M transistors)
– New class of instructions for manipulating vectors of integers

x86 family
• Pentium II (1997, 7 M transistors)
– Implemented the MMX instructions within P6
• Pentium III (1999, 8.2 M transistors)
– New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)

x86 family
• Pentium 4 (2001, 42 M transistors)
– NetBurst microarchitecture
– 144 new SSE2 instructions

x86 family
• Advanced Micro Devices (AMD)
– Now a close competitor to Intel
– Developing its own 64-bit extension

x86 family
• Transmeta
– In January 2000, introduced the Crusoe processor
– Radically different approach to implementation
• Translates x86 code into "Very Long Instruction Word" (VLIW) code
• High degree of parallelism
– Targets the low-power market, such as laptop computers

Hardware Organization (Figure 1.4, p. 7)
• CPU: Central Processing Unit
• ALU: Arithmetic/Logic Unit
• PC: Program Counter
• USB: Universal Serial Bus

Virtual address space
• A linear array of bytes
– Each byte has its own unique address (array index), starting at zero
• Figure: byte contents at addresses 0x0, 0x1, 0x2, …, 0xfffffffe
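
To make the byte-array view concrete, here is a minimal C sketch, assuming any standard C compiler. Viewing two neighboring array elements through unsigned char pointers shows that their addresses differ by the element size in bytes. The helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: bytes between a[0] and a[1] of an int array,
   measured by treating both element addresses as byte addresses. */
size_t int_stride_in_bytes(void)
{
    int a[2] = {0, 0};
    const unsigned char *p0 = (const unsigned char *)&a[0];
    const unsigned char *p1 = (const unsigned char *)&a[1];
    return (size_t)(p1 - p0);   /* distance in the byte array */
}

/* Same measurement for an array of double. */
size_t double_stride_in_bytes(void)
{
    double d[2] = {0.0, 0.0};
    const unsigned char *p0 = (const unsigned char *)&d[0];
    const unsigned char *p1 = (const unsigned char *)&d[1];
    return (size_t)(p1 - p0);
}
```

The sketch relies only on sizeof, so it does not assume particular type sizes; on IA-32 the two results would be 4 and 8.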

Data layout
• Object model in C
– Different data types can be declared

Data layout
• Object model in assembly
– A large, byte-addressable array
– No distinctions, not even between signed and unsigned integers
– Holds code, user data, and OS data
– A run-time stack for managing procedure calls and returns
– Blocks of memory allocated by the user
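
Because the machine sees only bytes, the same 32 bits can mean different numbers depending on how an instruction interprets them. A minimal C sketch, assuming the fixed-width types from stdint.h; memcpy copies the raw bytes without any arithmetic conversion. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read the 4 bytes of an unsigned value as a signed (two's-complement) value.
   memcpy copies the bit pattern unchanged, so no value conversion happens. */
int32_t as_signed(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* The reverse direction: the same 4 bytes read as an unsigned value. */
uint32_t as_unsigned(int32_t value)
{
    uint32_t u;
    memcpy(&u, &value, sizeof u);
    return u;
}
```

For example, the bit pattern 0xFFFFFFFF is 4294967295 when read as unsigned and -1 when read as signed.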

• Figure 1.13 (p. 17)

Operations in C constructs
• Arithmetic expression evaluation
• Loops
• Procedure calls and returns
• All are translated into sequences of instructions

Operations in assembly instructions
• Each instruction performs only a very elementary operation
• Instructions normally execute one by one, in sequence
• Operate on data stored in registers
• Transfer data between memory and a register
• Conditionally branch to a new instruction address
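
This style of execution can be imitated in C. In this sketch (a hypothetical function, not compiler output), each statement does one elementary step: a load-and-add, an increment, a comparison with a conditional branch, or an unconditional jump written with goto. The variable names eax and ecx only hint at registers a compiler might choose.

```c
#include <stdint.h>

/* Hypothetical example: sum n array elements using only elementary steps. */
int32_t sum_elementary(const int32_t *a, int32_t n)
{
    int32_t eax = 0;           /* running total, kept "in a register" */
    int32_t ecx = 0;           /* loop index, another "register" */
loop:
    if (ecx >= n)              /* compare, then branch conditionally */
        goto done;
    eax += a[ecx];             /* read an element from memory and add */
    ecx += 1;                  /* advance the index */
    goto loop;                 /* jump back to the test */
done:
    return eax;                /* result handed back to the caller */
}
```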

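Assembly Programmer's View (Figure 3.2, p. 136)
• CPU
– Integer registers: %eax (%ah, %al), %ecx (%ch, %cl), %edx (%dh, %dl), %ebx (%bh, %bl), %esi, %edi, %esp, %ebp
– Program counter: %eip
– Condition codes: %eflags
• Memory, exchanging addresses, data, and instructions with the CPU
– Regions from high to low addresses: Stack, Heap, DLLs, Heap, Data, Text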
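
The nested register names can be modeled with masks and shifts. A minimal sketch, assuming a 32-bit value stands for %eax; %ax, the low 16 bits, is not drawn in the figure but is the IA-32 name for that portion. The function names are hypothetical.

```c
#include <stdint.h>

/* Views of %eax: %ax is the low 16 bits, %al its low byte, %ah its high byte.
   The same layout applies to %ecx/%cl/%ch, %edx/%dl/%dh, %ebx/%bl/%bh. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing %al replaces only the low byte; the upper 24 bits are kept. */
uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```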

Programmer-Visible State (p. 129)
• Program counter (%eip)
– Address of the next instruction to execute
• Register file
– Heavily used program data
– Integer and floating-point registers

Programmer-Visible State
• Condition code registers
– Hold status information about the most recently executed arithmetic or logical instruction
– Used to implement conditional changes in the control flow
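
As an illustration, the sketch below computes four condition codes commonly used for control flow (CF, ZF, SF, OF) as a 32-bit addl would set them. It is a C model that assumes 32-bit wraparound arithmetic; struct flags and add_with_flags are hypothetical names, and the real %eflags register holds additional flags not modeled here.

```c
#include <stdint.h>

/* Hypothetical record of four IA-32 condition codes. */
struct flags {
    int cf;  /* carry: unsigned overflow out of bit 31 */
    int zf;  /* zero: the result is 0 */
    int sf;  /* sign: bit 31 of the result is 1 */
    int of;  /* overflow: two's-complement (signed) overflow */
};

/* Compute t = a + b and the flags an addl would leave behind. */
uint32_t add_with_flags(uint32_t a, uint32_t b, struct flags *f)
{
    uint32_t t = a + b;                 /* wraps modulo 2^32 */
    f->cf = t < a;                      /* a carry happened iff the sum wrapped */
    f->zf = t == 0;
    f->sf = (int)((t >> 31) & 1u);
    /* signed overflow: operands share a sign and the result's sign differs */
    f->of = (int)(((~(a ^ b) & (a ^ t)) >> 31) & 1u);
    return t;
}
```

Adding 1 to 0x7FFFFFFF sets OF and SF (signed overflow into a negative result), while adding 1 to 0xFFFFFFFF sets CF and ZF.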

Code Examples (p. 130)
• C code (code.c)

    int accum = 0;

    int sum(int x, int y)
    {
        int t = x + y;
        accum += t;
        return t;
    }

• Assembly file code.s, obtained with the command gcc -O2 -S code.c

    _sum:
        pushl %ebp
        movl  %esp, %ebp
        movl  12(%ebp), %eax
        addl  8(%ebp), %eax
        addl  %eax, accum
        movl  %ebp, %esp
        popl  %ebp
        ret

Code Examples (p. 131)
• Relocatable object file code.o, obtained with the command gcc -O2 -c code.c
• Contents (hex bytes): 55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 89 ec 5d c3

Code Examples
• Disassembly output (p. 132), obtained with the command objdump -d code.o

    0x80483b4 <sum>:
    0x80483b4   55                  push   %ebp
    0x80483b5   89 e5               mov    %esp,%ebp
    0x80483b7   8b 45 0c            mov    0xc(%ebp),%eax
    0x80483ba   03 45 08            add    0x8(%ebp),%eax
    0x80483bd   01 05 00 00 00 00   add    %eax,0x0
    0x80483c3   89 ec               mov    %ebp,%esp
    0x80483c5   5d                  pop    %ebp
    0x80483c6   c3                  ret
    0x80483c7   90                  nop

C Code
• Add two signed integers
• int t = x + y;

Assembly Code
• Operands
– x: memory, M[%ebp+8]
– y: register %eax (loaded earlier by movl 12(%ebp), %eax)
– t: register %eax
• Instruction
– addl 8(%ebp), %eax
– Adds two 4-byte integers
– Similar to the expression y += x, with the sum t left in %eax
• The function's return value is returned in %eax
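
To tie the operands to the stack frame, this sketch replays the two instructions that compute t in a tiny simulated memory. It assumes the IA-32 convention used by this code, where the first argument x is at 8(%ebp) and the second argument y is at 12(%ebp); the memory size, the frame base value, and all helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a small byte-addressable memory holding sum's frame. */
static uint8_t mem[64];

static uint32_t load32(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &mem[addr], sizeof v);   /* host byte order, used consistently */
    return v;
}

static void store32(uint32_t addr, uint32_t v)
{
    memcpy(&mem[addr], &v, sizeof v);
}

/* Replays movl 12(%ebp), %eax and addl 8(%ebp), %eax. */
int32_t simulate_sum(int32_t x, int32_t y)
{
    uint32_t ebp = 16;                  /* arbitrary frame base in the model */
    store32(ebp + 8, (uint32_t)x);      /* first argument */
    store32(ebp + 12, (uint32_t)y);     /* second argument */

    uint32_t eax = load32(ebp + 12);    /* movl 12(%ebp), %eax  -> eax = y */
    eax += load32(ebp + 8);             /* addl 8(%ebp), %eax   -> eax = y + x */

    int32_t t;
    memcpy(&t, &eax, sizeof t);         /* reinterpret the 32 bits as signed */
    return t;                           /* the return value is what %eax holds */
}
```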

Object Code
• 3-byte instruction
• Stored at address 0x80483ba
• 0x80483ba: 03 45 08
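
Why three bytes? In the IA-32 encoding, 0x03 is the opcode for add with a register destination, 0x45 is a ModR/M byte naming the operands, and 0x08 is an 8-bit displacement. Below is a small C sketch of splitting the ModR/M byte into its fields. The names are hypothetical, and it covers only the field split, not full instruction decoding.

```c
#include <stdint.h>

/* Fields of an IA-32 ModR/M byte: bits 7-6 mod, bits 5-3 reg, bits 2-0 r/m. */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}

/* IA-32 numbering of the eight 32-bit registers in the reg and r/m fields. */
const char *reg32_name(unsigned n)
{
    static const char *const names[8] = {
        "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
    };
    return n < 8 ? names[n] : "?";
}
```

For 0x45 the fields are mod = 01 (memory operand with an 8-bit displacement), reg = 000 (%eax), and r/m = 101 (%ebp), which matches addl 8(%ebp), %eax. The same ModR/M byte appears in 8b 45 0c, the movl 12(%ebp), %eax just before it.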

Operands (p. 137)
• In high-level languages, operands are
– Either constants
– Or variables
• Example: A = A + 4, where A is a variable and 4 is a constant

Operands
• Counterparts in assembly languages
– Immediate (constant)
– Register (variable)
– Memory (variable)
• Examples
– movl 8(%ebp), %eax: the source is memory, the destination is a register
– addl $4, %eax: the source is an immediate, the destination is a register

Simple Addressing Modes
• Immediate
– Represents a constant
– The format is $Imm (e.g., $4, $0xffff)
• Registers
– The fastest storage units in a computer system
– The integer registers are 32 bits long
– Register mode Ea
• The operand value is the value stored in register Ea
• Denoted R[Ea]

Memory References
• The memory array is denoted M
• If addr is a memory address
– M[addr] is the contents of memory starting at addr
– addr is used as an index into the array
• How many bytes does M[addr] contain?
– It depends on the context
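
One way to picture the context dependence is a byte array read with an explicit size. A minimal C sketch, assuming the IA-32 little-endian byte order (the byte at the lowest address is the least significant) and sizes of 1, 2, or 4 bytes; read_M is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: M is a byte array and addr is an index into it.
   M[addr] spans `size` bytes starting at addr, assembled little-endian. */
uint32_t read_M(const uint8_t *M, size_t addr, size_t size)
{
    assert(size == 1 || size == 2 || size == 4);
    uint32_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint32_t)M[addr + i] << (8 * i);  /* byte i has weight 256^i */
    return value;
}
```

With the bytes 78 56 34 12 stored at addresses 0 to 3, a 1-byte read at 0 gives 0x78, a 2-byte read gives 0x5678, and a 4-byte read gives 0x12345678.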

Memory Addressing Mode
• An expression for
– A memory address (or an array index)
• Most general form
– Imm(Eb, Ei, s), where s is 1, 2, 4, or 8
• The address represented by this form
– Imm + R[Eb] + R[Ei] * s
• The operand value
– M[Imm + R[Eb] + R[Ei] * s]
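
The address computation is plain integer arithmetic. A minimal C sketch, assuming the register values have already been read and that, as on IA-32, the sum wraps modulo 2^32; an omitted base or index register can be passed as 0. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of Imm(Eb, Ei, s): Imm + R[Eb] + R[Ei] * s (mod 2^32).
   base and index are the values already read from registers Eb and Ei. */
uint32_t effective_address(int32_t imm, uint32_t base, uint32_t index, unsigned s)
{
    assert(s == 1 || s == 2 || s == 4 || s == 8);
    return (uint32_t)imm + base + index * s;
}
```

For example, with R[%ecx] = 0x1 and R[%edx] = 0x3, the operand 260(%ecx,%edx) addresses 260 + 1 + 3 = 0x108.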

Addressing Modes (Figure 3.3, p. 137)

    Type       Form            Operand value               Name
    Immediate  $Imm            Imm                         Immediate
    Register   Ea              R[Ea]                       Register
    Memory     Imm             M[Imm]                      Absolute
    Memory     (Ea)            M[R[Ea]]                    Indirect
    Memory     Imm(Eb)         M[Imm + R[Eb]]              Base + displacement
    Memory     (Eb, Ei)        M[R[Eb] + R[Ei]]            Indexed
    Memory     Imm(Eb, Ei)     M[Imm + R[Eb] + R[Ei]]      Indexed
    Memory     (, Ei, s)       M[R[Ei] * s]                Scaled indexed
    Memory     Imm(, Ei, s)    M[Imm + R[Ei] * s]          Scaled indexed
    Memory     (Eb, Ei, s)     M[R[Eb] + R[Ei] * s]        Scaled indexed
    Memory     Imm(Eb, Ei, s)  M[Imm + R[Eb] + R[Ei] * s]  Scaled indexed

Practice Problem 3.1 (p. 138)
• Memory and register contents

    Address   Value        Register   Value
    0x100     0xFF         %eax       0x100
    0x104     0xAB         %ecx       0x1
    0x108     0x13         %edx       0x3
    0x10C     0x11

• Operand values

    Operand          Value   Comment
    %eax             0x100   Register
    0x104            0xAB    Absolute address
    $0x108           0x108   Immediate
    (%eax)           0xFF    Address 0x100
    260(%ecx,%edx)   0x13    Address 0x108
    (%eax,%edx,4)    0x11    Address 0x10C
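
The answers can be checked mechanically. This sketch hard-codes the machine state from the problem and evaluates memory operands with the general formula M[Imm + R[Eb] + R[Ei] * s]. The register ids, the NONE placeholder for an omitted register, and the choice to read unlisted addresses as 0 are all assumptions of the sketch.

```c
#include <stdint.h>

/* Memory words from the problem; other addresses are assumed to read as 0. */
static uint32_t M(uint32_t addr)
{
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0;
    }
}

/* Hypothetical register ids; NONE stands for an omitted register and reads as 0. */
enum { EAX, ECX, EDX, NONE };
static const uint32_t R[] = {0x100, 0x1, 0x3, 0};

/* Value of the memory operand Imm(Eb, Ei, s). */
uint32_t mem_operand(uint32_t imm, int eb, int ei, uint32_t s)
{
    return M(imm + R[eb] + R[ei] * s);
}
```

Register and immediate operands need no memory access: %eax is simply R[EAX], and $0x108 is the constant 0x108.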

Data Formats (Figure 3.1, p. 135)

    C declaration   Intel data type      GAS suffix   Size (bytes)
    char            Byte                 b            1
    short           Word                 w            2
    int             Double word          l            4
    unsigned long   Double word          l            4
    char *          Double word          l            4
    float           Single precision     s            4
    double          Double precision     l            8
    long double     Extended precision   t            10/12

Data Formats
• Move data instructions
– mov (general)
– movb (move byte)
– movw (move word)
– movl (move double word)
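
The suffix records how many bytes are moved. A minimal sketch of that mapping in C, following the integer sizes listed in the data-format table; mov_for_size is a hypothetical helper and covers only sizes handled by a single IA-32 integer mov.

```c
#include <stddef.h>

/* Hypothetical helper: GAS mov mnemonic for an integer operand of `bytes` bytes. */
const char *mov_for_size(size_t bytes)
{
    switch (bytes) {
    case 1:  return "movb";   /* byte */
    case 2:  return "movw";   /* word */
    case 4:  return "movl";   /* double word */
    default: return NULL;     /* no single IA-32 integer mov of this size */
    }
}
```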