Introduction to Computer Organization and Architecture Lecture 5

  • Slides: 45
By Juthawut Chantharamalee
http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

Outline
  • RISC and CISC Comparison
  • Instruction Set Examples
      - ARM
      - Freescale 68K
      - Intel IA-32

RISC and CISC
  • Reduced Instruction Set Computer
      - Fixed length instructions
      - Simpler instructions
      - Fewer cycles per instruction
      - Load/Store memory access
      - Register operands only
      - Probably doesn't have microcode
      - RISC is a misnomer: may have many instructions
  • Complex Instruction Set Computer
      - Variable length instructions
      - More complex instructions
      - More cycles per instruction
      - May have "orthogonal" instruction set
      - Memory and register operands
      - May have microcode

ARM
  • "Advanced RISC Machines"
  • www.arm.com
  • Over 90 ARM processors are shipped every second, more than any other 32-bit processor IP supplier
  • ARM licenses its technology to more than 200 semiconductor companies
  • Eight product families

ARM Example
  • ARM Cortex(TM)-A8 processor
  • Intellectual Property (IP) core
      - Licensed by other companies to create "System on a Chip" (SoC)
  • Dual, symmetric, in-order issue, 13-stage pipelines
  • Integrated L2 cache

ARM Register Structure
  • 15 general-purpose registers, R0 through R14 (32 bits each, b31-b0)
      - By convention:
          R12 frame pointer
          R13 stack pointer
          R14 also link register
  • Program counter: R15 (PC)
  • Current Program Status Register (CPSR), the status register
      - b31 N (Negative), b30 Z (Zero), b29 C (Carry), b28 V (Overflow): condition code flags
      - b7-b6: interrupt disable bits
      - b4-b0: processor mode bits
  • 15 banked registers
      - Copied/restored when going to/from User/Supervisor
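
The CPSR bit positions on this slide can be exercised with a short Python sketch. The function name `decode_cpsr` and the sample value `0x600000D3` are illustrative assumptions, not taken from the lecture.

```python
def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into the fields named on the slide."""
    return {
        "N": (cpsr >> 31) & 1,                    # b31 Negative
        "Z": (cpsr >> 30) & 1,                    # b30 Zero
        "C": (cpsr >> 29) & 1,                    # b29 Carry
        "V": (cpsr >> 28) & 1,                    # b28 Overflow
        "interrupt_disable": (cpsr >> 6) & 0b11,  # b7-b6
        "mode": cpsr & 0b11111,                   # b4-b0
    }


# Hypothetical sample value: Z and C set, both interrupt-disable bits set.
fields = decode_cpsr(0x600000D3)
```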

ARM Instruction Format
  Bits:    31-28       27-20     19-16   15-12   11-4         3-0
  Field:   Condition   OP code   Rn      Rd      Other info   Rm
  • Load/store architecture (RISC)
  • Conditional execution of instructions
  • One or two operands (register)
  • Destination register
  • See appendix B
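
As a rough illustration of the field layout, the Python sketch below slices a 32-bit word at the bit boundaries listed on the slide. The helper name `split_arm_word` is hypothetical, and the sample word `0xE0820004` is assumed here to be the encoding of ADD R0, R2, R4 with the "always" condition; that value does not come from the lecture.

```python
def split_arm_word(word: int) -> dict:
    """Slice a 32-bit ARM instruction word into the slide's six fields."""
    return {
        "condition": (word >> 28) & 0xF,   # b31-28
        "opcode": (word >> 20) & 0xFF,     # b27-20
        "rn": (word >> 16) & 0xF,          # b19-16
        "rd": (word >> 12) & 0xF,          # b15-12
        "other": (word >> 4) & 0xFF,       # b11-4
        "rm": word & 0xF,                  # b3-0
    }


# Assumed encoding of ADD R0, R2, R4 (illustrative, not from the slides).
fields = split_arm_word(0xE0820004)
```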

ARM Addressing Modes
  Name                          Assembler syntax     Addressing function
  With immediate offset:
    Pre-indexed                 [Rn, #offset]        EA = [Rn] + offset
    Pre-indexed with writeback  [Rn, #offset]!       EA = [Rn] + offset; Rn <- [Rn] + offset
    Post-indexed                [Rn], #offset        EA = [Rn]; Rn <- [Rn] + offset
  With offset magnitude in Rm:
    Pre-indexed                 [Rn, ±Rm, shift]     EA = [Rn] ± [Rm] shifted
    Pre-indexed with writeback  [Rn, ±Rm, shift]!    EA = [Rn] ± [Rm] shifted; Rn <- [Rn] ± [Rm] shifted
    Post-indexed                [Rn], ±Rm, shift     EA = [Rn]; Rn <- [Rn] ± [Rm] shifted
  Relative (pre-indexed with    Location             EA = Location = [PC] + offset
    immediate offset)
  where:
    EA     = effective address
    offset = a signed number contained in the instruction
    shift  = direction #integer, where direction is LSL for left shift or LSR for right shift,
             and integer is a 5-bit unsigned number specifying the shift amount
    ±Rm    = the offset magnitude in register Rm can be added to or subtracted from the
             contents of base register Rn
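
The addressing functions in the table can be sketched in Python as follows. This is a minimal model under stated assumptions: registers live in a plain dictionary, and the helper names `pre_indexed`, `post_indexed`, and `shifted` are hypothetical. The register-offset modes are obtained by passing `shifted(...)` (negated for the minus case) as the offset.

```python
def shifted(value: int, direction: str, amount: int) -> int:
    """Apply LSL or LSR by a 5-bit amount to a 32-bit value."""
    if not 0 <= amount < 32:
        raise ValueError("shift amount must fit in 5 bits")
    if direction == "LSL":
        return (value << amount) & 0xFFFFFFFF
    if direction == "LSR":
        return (value & 0xFFFFFFFF) >> amount
    raise ValueError("direction must be LSL or LSR")


def pre_indexed(regs: dict, rn: str, offset: int, writeback: bool = False) -> int:
    """EA = [Rn] + offset; with writeback, Rn is updated to the EA."""
    ea = regs[rn] + offset
    if writeback:
        regs[rn] = ea
    return ea


def post_indexed(regs: dict, rn: str, offset: int) -> int:
    """EA = [Rn]; afterwards Rn <- [Rn] + offset."""
    ea = regs[rn]
    regs[rn] = regs[rn] + offset
    return ea
```

For example, with R5 = 1000 and R6 = 200, `pre_indexed(regs, "R5", regs["R6"])` gives 1200 and leaves R5 unchanged, while the writeback and post-indexed forms also update the base register.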

ARM Relative Addressing Mode
  • LDR R1, ITEM
      - Pre-indexed mode with immediate offset
      - PC is base register
      - Calculated offset = 52
  • PC will be at 1008 when executed
  Memory address    Word (4 bytes)
  1000              LDR R1, ITEM
  1004              -
  1008              -                 <- updated [PC] = 1008
  ...
  ITEM = 1060       Operand           (52 = offset from 1008)

ARM Pre-indexed Mode
  • STR R3, [R5, R6]
      - Pre-indexed mode
      - Base register = R5
      - Offset register = R6
  Example values:
    R5 (base register)   = 1000
    R6 (offset register) = 200
    Operand address      = 1000 + 200 = 1200 (200 = offset)

ARM Post-indexed Mode w/ WB
  • LDR R1, [R2], R10, LSL #2
  • Use in loop
  • LSL #2 is logical shift left by 2 bits => x 4
      - 1st pass: R1 <- [R2]
      - 2nd pass: R1 <- [[R2] + [R10] x 4]; R2 <- [R2] + [R10] x 4
      - 3rd pass: R1 <- [[R2] + [R10] x 4]; R2 <- [R2] + [R10] x 4
      - and so on
  Example values for the load instruction LDR R1, [R2], R10, LSL #2:
    R2 (base register)   = 1000
    R10 (offset register) = 25
    Memory address    Word (4 bytes)
    1000              6
    1100              -17             (100 = 25 x 4)
    1200              321             (100 = 25 x 4)
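
The loop behaviour on this slide can be traced with a small Python sketch that uses the slide's example values (R2 = 1000, R10 = 25, memory words at 1000, 1100, and 1200). The dictionary-based memory and the helper name `ldr_post_indexed` are assumptions made for illustration.

```python
memory = {1000: 6, 1100: -17, 1200: 321}      # word values from the slide
regs = {"R1": 0, "R2": 1000, "R10": 25}


def ldr_post_indexed(regs, memory, rd, rn, rm, lsl):
    """Model LDR rd, [rn], rm, LSL #lsl: load first, then update the base."""
    regs[rd] = memory[regs[rn]]                # EA = [Rn]
    regs[rn] = regs[rn] + (regs[rm] << lsl)    # Rn <- [Rn] + [Rm] shifted


loaded = []
for _ in range(3):                             # three passes through the loop
    ldr_post_indexed(regs, memory, "R1", "R2", "R10", 2)
    loaded.append(regs["R1"])
```

Each pass loads from the current base address and then advances R2 by 25 x 4 = 100 bytes, so successive passes read the words at 1000, 1100, and 1200.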

ARM Pre-indexed Mode w/ WB
  • STR R0, [R5, #-4]!
  • Push instruction
  • R5 is SP
  • Immediate offset of -4 is added to [R5]
  • TOS = 2008
  Example values for the push instruction STR R0, [R5, #-4]!:
    R0                                  = 27
    R5 (base register, stack pointer)   = 2012 before, 2008 after execution of the push instruction
    Memory word at 2008                 = 27 after execution

ARM Instructions
  • All instructions can be executed conditionally
      - b31-28 of instruction
  • Most instructions have shift and rotate operations directly implemented in them
      - Barrel shifter
  • Load/store multiple instructions
      - LDMIA R10!, {R0, R1, R6, R7}
          R0 <- [[R10]], R1 <- [[R10] + 4], R6 <- [[R10] + 8], R7 <- [[R10] + 12]
          R10 <- [R10] + 16
  • Condition code set by "S" suffix
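
The LDMIA example can be modelled in a few lines of Python. This is a sketch under assumptions: memory is a dictionary of word addresses, the helper name `ldmia` is hypothetical, the register list is passed in ascending register order, and the memory contents are made-up values.

```python
def ldmia(regs, memory, base, reg_list, writeback=True):
    """Model LDMIA base!, {reg_list}: load consecutive words, increment after."""
    address = regs[base]
    for name in reg_list:            # assumed to be in ascending register order
        regs[name] = memory[address]
        address += 4                 # next word
    if writeback:                    # the "!" suffix
        regs[base] = address


# Hypothetical memory contents starting at address 500.
memory = {500: 10, 504: 20, 508: 30, 512: 40}
regs = {"R10": 500}
ldmia(regs, memory, "R10", ["R0", "R1", "R6", "R7"])
```

After the call, R10 has advanced by 16 bytes, matching the slide's R10 <- [R10] + 16.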

ARM Instructions
  • Arithmetic: Opcode Rd, Rn, Rm
      ADD  R0, R2, R4            => R0 <- [R2] + [R4]
      ADD  R0, R3, #17           => R0 <- [R3] + 17          (immediate value in b7-0)
      SUB  R0, R6, R5            => R0 <- [R6] - [R5]
      ADD  R0, R1, R5, LSL #4    => R0 <- [R1] + [R5] x 16
      MUL  R0, R1, R2            => R0 <- [R1] x [R2]
      MLA  R0, R1, R2, R3        => R0 <- [R1] x [R2] + [R3]
      ADDS R0, R1, R2            => R0 <- [R1] + [R2]        (sets condition codes NCZV)
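
To make the ADDS line concrete, the following Python sketch computes a 32-bit sum together with the N, Z, C, and V condition codes. It is a minimal model with a hypothetical function name `adds`; the flag definitions used are the usual two's-complement ones (carry out of bit 31, signed overflow when both operands differ in sign from the result).

```python
MASK = 0xFFFFFFFF


def adds(a: int, b: int):
    """Return (32-bit sum, flags) for a flag-setting add."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    flags = {
        "N": result >> 31,                                  # sign bit of result
        "Z": int(result == 0),                              # result is zero
        "C": int(full > MASK),                              # carry out of bit 31
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,     # signed overflow
    }
    return result, flags


overflow_case = adds(0x7FFFFFFF, 1)     # largest positive value plus one
wrap_case = adds(0xFFFFFFFF, 1)         # -1 + 1 in two's complement
```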

ARM Instructions
  • Logic: Opcode Rd, Rn, Rm
      AND R0, R2, R4    => R0 <- [R2] AND [R4]
      BIC R0, R1        => R0 <- [R0] AND NOT [R1]
      MVN R0, R3        => R0 <- NOT [R3]
  • BCD Pack Program
      LDR   R0, POINTER         Load address LOC into R0.
      LDRB  R1, [R0]            Load ASCII characters
      LDRB  R2, [R0, #1]        into R1 and R2.
      AND   R2, #&F             Clear high-order 28 bits of R2.
      ORR   R2, R1, LSL #4      Or [R1] shifted left into [R2].
      STRB  R2, PACKED          Store packed BCD digits into PACKED.
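
The BCD pack program can be mirrored step by step in Python. The function name `pack_bcd` is hypothetical, and the two input characters stand in for the bytes at LOC and LOC + 1.

```python
def pack_bcd(first: str, second: str) -> int:
    """Pack two ASCII digit characters into one byte of BCD."""
    r1 = ord(first)              # LDRB R1, [R0]
    r2 = ord(second)             # LDRB R2, [R0, #1]
    r2 &= 0xF                    # AND: keep only the low 4 bits of R2
    r2 |= r1 << 4                # ORR with R1 shifted left by 4 bits
    return r2 & 0xFF             # STRB stores only the low-order byte


packed = pack_bcd("4", "7")      # ASCII 0x34 and 0x37
```

The high nibble of the first character's ASCII code is shifted out of the low byte, so the byte store keeps just the two decimal digits.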

ARM Instructions
  • Branch
      - Contain 2's complement 24-bit offset
      - Condition to be tested is in b31-28
      - BEQ LOCATION
      - BGT LOOP
  (a) Instruction format
    Bits:    31-28       27-24     23-0
    Field:   Condition   OP code   Offset
  Example:
    1000             BEQ LOCATION
    1004             -
    1008             -                           <- updated [PC] = 1008
    ...
    LOCATION = 1100  Branch target instruction   (Offset = 92)
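
The branch arithmetic can be checked with a short Python sketch. The slide gives the byte distance (92) from the updated PC; the sketch additionally assumes the classic ARM encoding in which the 24-bit field holds that distance in words and is shifted left by two bits when used. That detail and the helper names are not stated on the slide.

```python
def sign_extend_24(value: int) -> int:
    """Interpret a 24-bit field as a two's-complement number."""
    value &= 0xFFFFFF
    return value - (1 << 24) if value & 0x800000 else value


def branch_target(instr_address: int, offset_field: int) -> int:
    """Target = updated PC (instruction address + 8) + word offset * 4."""
    updated_pc = instr_address + 8
    return updated_pc + (sign_extend_24(offset_field) << 2)


def offset_field_for(instr_address: int, target: int) -> int:
    """Compute the 24-bit field that reaches `target` (assumed word offset)."""
    byte_offset = target - (instr_address + 8)
    return (byte_offset >> 2) & 0xFFFFFF


field = offset_field_for(1000, 1100)    # slide example: 92 bytes = 23 words
```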

ARM Assembly Language
  Memory
  address     Operation   Addressing or
  label                   data information
              AREA        CODE                      Assembler directives
              ENTRY
              LDR         R1, N                     Statements that generate
              LDR         R2, POINTER               machine instructions
              MOV         R0, #0
  LOOP        LDR         R3, [R2], #4
              ADD         R0, R3
              SUBS        R1, #1
              BGT         LOOP
              STR         R0, SUM
              AREA        DATA                      Assembler directives
  SUM         DCD         0
  N           DCD         5
  POINTER     DCD         NUM1
  NUM1        DCD         3, 17, 27, 12, 322

ARM Subroutines
  • Example 1
      - Parameters passed through registers
      - Branch and Link instruction (BL)
  Calling program
              LDR     R1, N
              LDR     R2, POINTER
              BL      LISTADD
              STR     R0, SUM
              ...
  Subroutine
  LISTADD     STMFD   R13!, {R3, R14}       Save R3 and return address in R14 on
                                            stack, using R13 as the stack pointer.
              MOV     R0, #0
  LOOP        LDR     R3, [R2], #4
              ADD     R0, R3
              SUBS    R1, #1
              BGT     LOOP
              LDMFD   R13!, {R3, R15}       Restore R3 and load return address
                                            into PC (R15).

ARM Subroutines
  • Example 2
      - Parameters passed on stack
  (Assume top of stack is at level 1 below.)
  Calling program
              LDR     R0, POINTER
              STR     R0, [R13, #-4]!       Push NUM1 on stack.
              LDR     R0, N
              STR     R0, [R13, #-4]!       Push n on stack.
              BL      LISTADD
              LDR     R0, [R13, #4]         Move the sum into
              STR     R0, SUM               memory location SUM.
              ADD     R13, #8               Remove parameters from stack.
              ...
  Subroutine
  LISTADD     STMFD   R13!, {R0-R3, R14}    Save registers.
              LDR     R1, [R13, #20]        Load parameters
              LDR     R2, [R13, #24]        from stack.
              MOV     R0, #0
  LOOP        LDR     R3, [R2], #4
              ADD     R0, R3
              SUBS    R1, #1
              BGT     LOOP
              STR     R0, [R13, #24]        Place sum on stack.
              LDMFD   R13!, {R0-R3, R15}    Restore registers and return.
  Stack contents
  Level 3 ->  [R0]
              [R1]
              [R2]
              [R3]
              Return address
  Level 2 ->  n
              NUM1
  Level 1 ->

ARM Program Example
  • Byte sorting program
  C program
    for (j = n - 1; j > 0; j = j - 1)
      { for (k = j - 1; k >= 0; k = k - 1)
          { if (LIST[k] > LIST[j])
              { TEMP = LIST[k];
                LIST[k] = LIST[j];
                LIST[j] = TEMP;
              }
          }
      }
  Assembly program
              ADR     R4, LIST              Load list pointer register R4,
              LDR     R10, N                and initialize outer loop base
              ADD     R2, R4, R10           register R2 to LIST + n.
              ADD     R5, R4, #1            Load LIST + 1 into R5.
  OUTER       LDRB    R0, [R2, #-1]!        Load LIST(j) into R0.
              MOV     R3, R2                Initialize inner loop base register
                                            R3 to LIST + n - 1.
  INNER       LDRB    R1, [R3, #-1]!        Load LIST(k) into R1.
              CMP     R1, R0                Compare LIST(k) to LIST(j).
              STRGTB  R1, [R2]              If LIST(k) > LIST(j), swap
              STRGTB  R0, [R3]              LIST(k) and LIST(j), and
              MOVGT   R0, R1                move (new) LIST(j) into R0.
              CMP     R3, R4                If k > 0, repeat
              BNE     INNER                 inner loop.
              CMP     R2, R5                If j > 1, repeat
              BNE     OUTER                 outer loop.
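
For checking the loop structure independently of any instruction set, the C program above translates directly into the Python sketch below. The function name `sort_bytes` and the sample list are illustrative assumptions.

```python
def sort_bytes(values):
    """Sort into ascending order using the slide's two nested loops."""
    data = list(values)
    n = len(data)
    for j in range(n - 1, 0, -1):            # j = n-1 down to 1
        for k in range(j - 1, -1, -1):       # k = j-1 down to 0
            if data[k] > data[j]:
                # Swap so that position j holds the larger value.
                data[k], data[j] = data[j], data[k]
    return data


result = sort_bytes([3, 17, 27, 12, 5])      # hypothetical byte values
```

After each outer pass, position j holds the largest of the first j + 1 entries, which is why the assembly versions keep the current LIST(j) in a register.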

Freescale 68K
  • Freescale Semiconductor
      - Formerly Motorola Semiconductor
  • www.freescale.com
  • There are more than 17 billion Freescale semiconductors at work all over the planet.
      - Automobiles, computer networks, communications infrastructure, office buildings, factories, industrial equipment, tools, mobile phones, home appliances and consumer products
  • About 20 microprocessor families

68K
  • 68K Family
      - 68000: Introduced in 1979, 16-bit word length and 8/16/32-bit arithmetic, 24-bit address space (16 MB)
      - 68008: 8-bit version of the 68000 with 20-bit address space
      - 68010: Version of the 68000 supporting virtual memory and virtual machine concepts
      - 68020: Extended addressing capabilities, 32-bit, i-cache
      - 68030: Data cache in addition to the instruction cache, on-chip memory management unit
      - 68040: Floating-point arithmetic, pipelining, ...
      - "ColdFire" family added in 1994
          V1 through V5 cores

68K Example
  • ColdFire V5 Core [figure]

68K Register Structure
  • 8 32-bit data registers (D0-D7)
  • 8 32-bit address registers (A0-A7)
  • A7 is stack pointer
      - Separate supervisor and user pointers
  • Users cannot execute privileged instructions
  • Status register
  Register layout
    D0-D7   Data registers (long word b31-b0, word b15-b0, byte b7-b0)
    A0-A6   Address registers
    A7      Stack pointers: user stack pointer and supervisor stack pointer
    PC      Program counter
    SR      Status register (b15-b0)
              b15    T - Trace mode select
              b13    S - Supervisor mode select
              b10-8  I - Interrupt mask
              b4     X - Extend
              b3     N - Negative
              b2     Z - Zero
              b1     V - Overflow
              b0     C - Carry

68K Instruction Format
  Bits:    15-12            11-9   8   7-6    5-0
  Field:   OP code (1101)   dst    0   size   src
  • Three operand sizes: Byte, Word, Long Word
  • All addressing modes supported (CISC)
  • One or two operands
  • See appendix C

68K Addressing Modes
  Name                Assembler syntax                   Addressing function
  Immediate           #Value                             Operand = Value
  Absolute Short      Value                              EA = Sign Extended WValue
  Absolute Long       Value                              EA = Value
  Register            Rn                                 EA = Rn, that is, Operand = [Rn]
  Register Indirect   (An)                               EA = [An]
  Autoincrement       (An)+                              EA = [An]; Increment An
  Autodecrement       -(An)                              Decrement An; EA = [An]
  Indexed basic       WValue(An)                         EA = WValue + [An]
  Indexed full        BValue(An, Rk.S)                   EA = BValue + [An] + [Rk]
  Relative basic      WValue(PC) or Label                EA = WValue + [PC]
  Relative full       BValue(PC, Rk.S) or Label(Rk)      EA = BValue + [PC] + [Rk]
  where:
    EA     = effective address
    Value  = a number given either explicitly or represented by a label
    BValue = an 8-bit Value
    WValue = a 16-bit Value
    An     = an address register
    Rn     = an address or a data register
    S      = a size indicator

68K Instructions
  • Format (see appendix C)
      - Opcode src, dst
      - Opcode src
  • Arithmetic examples
      - ABCD, ADDA, ADDI, ADDQ, ADDX
      - DIVS, DIVU, MULS, MULU
      - SBCD, SUBA, SUBI, SUBQ
  • Logic examples
      - AND, ANDI, EORI
      - NBCD, NEGX, NOP, NOT
      - OR, ORI, SWAP

68K Instructions
  • Shift examples
      - ASL, ASR, BCHG, EXT, LSL, LSR
      - ROL, ROR, ROXL
  • Bit test and compare
      - BCLR, BSET, BTST, TAS, TST
      - CMP, CMPA, CMPI, CMPM, EXG
  • Branch examples
      - JMP, JSR, RESET, RTE, RTR, RTS, STOP, TRAPV
  • Memory load and store examples
      - LEA, PEA, LINK, UNLINK
      - MOVE, MOVEA, MOVEM, MOVEP, MOVEQ

68K Assembly Language
  Generic program
              Move        N, R1
              Move        #NUM1, R2
              Clear       R0
  LOOP        Add         (R2)+, R0
              Decrement   R1
              Branch>0    LOOP
              Move        R0, SUM
  68K program
              MOVE.L      N, D1             Initialization: put n - 1 into
              SUBQ.L      #1, D1            the counter register D1.
              MOVEA.L     #NUM1, A2
              CLR.L       D0
  LOOP        ADD.W       (A2)+, D0
              DBRA        D1, LOOP          Loop back until [D1] = -1.
              MOVE.L      D0, SUM
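
The reason the counter starts at n - 1 is that DBRA decrements first and falls through only when the counter reaches -1. The Python sketch below imitates that control flow; it ignores operand widths (the 16-bit counter and ADD.W), and the function name `dbra_sum` and the sample list are assumptions.

```python
def dbra_sum(numbers):
    """Sum a non-empty list using a DBRA-style counter that runs n-1 .. -1."""
    d1 = len(numbers) - 1        # MOVE.L N, D1 then SUBQ.L #1, D1
    a2 = 0                       # index standing in for pointer A2
    d0 = 0                       # CLR.L D0
    iterations = 0
    while True:
        d0 += numbers[a2]        # ADD.W (A2)+, D0
        a2 += 1                  # autoincrement
        iterations += 1
        d1 -= 1                  # DBRA decrements the counter...
        if d1 == -1:             # ...and stops branching at -1
            break
    return d0, iterations


total, passes = dbra_sum([3, 17, 27, 12, 322])
```

With five entries the counter takes the values 4, 3, 2, 1, 0 inside the loop, so the body executes exactly n times.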

68K Subroutines
  Calling program
              MOVE.L      #NUM1, -(A7)          Push parameters onto stack.
              MOVE.L      N, -(A7)
              BSR         LISTADD
              MOVE.L      4(A7), SUM            Save result.
              ADDI.L      #8, A7                Restore top of stack.
              ...
  Subroutine
  LISTADD     MOVEM.L     D0-D1/A2, -(A7)       Save registers D0, D1, and A2.
              MOVE.L      16(A7), D1            Initialize counter to n.
              SUBQ.L      #1, D1                Adjust count to use DBRA.
              MOVEA.L     20(A7), A2            Initialize pointer to the list.
              CLR.L       D0                    Initialize sum to 0.
  LOOP        ADD.W       (A2)+, D0             Add entry from list.
              DBRA        D1, LOOP
              MOVE.L      D0, 20(A7)            Put result on the stack.
              MOVEM.L     (A7)+, D0-D1/A2       Restore registers.
              RTS
  Stack contents
  Level 3 ->  [D0]
              [D1]
              [A2]
              Return address
  Level 2 ->  n
              NUM1
  Level 1 ->

68K Program Example
  • Byte sorting program
  C program
    for (j = n - 1; j > 0; j = j - 1)
      { for (k = j - 1; k >= 0; k = k - 1)
          { if (LIST[k] > LIST[j])
              { TEMP = LIST[k];
                LIST[k] = LIST[j];
                LIST[j] = TEMP;
              }
          }
      }
  Assembly program
              MOVEA.L     #LIST, A1             Pointer to the start of the list.
              MOVE        N, D1                 Initialize outer loop
              SUBQ        #1, D1                index j in D1.
  OUTER       MOVE        D1, D2                Initialize inner loop
              SUBQ        #1, D2                index k in D2.
              MOVE.B      (A1, D1), D3          Current maximum value in D3.
  INNER       CMP.B       D3, (A1, D2)          If LIST(k) <= [D3],
              BLE         NEXT                  do not exchange.
              MOVE.B      (A1, D2), (A1, D1)    Interchange LIST(k)
              MOVE.B      D3, (A1, D2)          and LIST(j) and load
              MOVE.B      (A1, D1), D3          new maximum into D3.
  NEXT        DBRA        D2, INNER             Decrement counters k and j
              SUBQ        #1, D1                and branch back
              BGT         OUTER                 if not finished.

IA-32
  • Intel Corporation
  • www.intel.com
  • developer.intel.com
  • Microprocessor used in PCs and Apple computers
  • Processor families
      - Desktop processors
      - Server and workstation processors
      - Internet device processors
      - Notebook processors
      - Embedded and communications processors

IA-32
  • Intel microprocessor history [figure]

IA-32 Example
  • P6 Microarchitecture [figure]

IA-32 Example
  • The centerpiece of the P6 processor microarchitecture is an out-of-order execution mechanism called dynamic execution. Dynamic execution incorporates three data processing concepts:
      - Deep branch prediction allows the processor to decode instructions beyond branches to keep the instruction pipeline full.
      - Dynamic data flow analysis requires real-time analysis of the flow of data through the processor to determine dependencies and to detect opportunities for out-of-order instruction execution.
      - Speculative execution refers to the processor's ability to execute instructions that lie beyond a conditional branch that has not yet been resolved, and ultimately to commit the results in the order of the original instruction stream.

IA-32 Register Structure
  • 8 32-bit data registers
  • 8 64-bit floating-point registers
  • 6 segment registers
  Register layout
    8 general-purpose registers   32 bits each (b31-b0)
    8 floating-point registers    64 bits each (b63-b0)
    6 segment registers           16 bits each (b15-b0)
        CS               Code segment
        SS               Stack segment
        DS, ES, FS, GS   Data segments

IA-32 Register Structure
  • 32-bit instruction pointer
  • Status register
      - Privilege level
      - Condition codes
  Register layout
    Instruction pointer   b31-b0
    Status register       b31-b0
        b13-12   IOPL - Input/Output privilege level
        b11      OF - Overflow
        b9       IF - Interrupt enable
        b8       TF - Trap
        b7       SF - Sign
        b6       ZF - Zero
        b0       CF - Carry

IA-32 Instruction Format
  Field:   Prefix         OP code        ModR/M   SIB      Displacement   Immediate
  Size:    1 to 4 bytes   1 or 2 bytes   1 byte   1 byte   1 or 4 bytes   1 or 4 bytes
  (ModR/M and SIB together specify the addressing mode.)
  • Variable instruction length (CISC)
  • See appendix D

IA-32 Addressing Modes
  Name                      Assembler syntax             Addressing function
  Immediate                 Value                        Operand = Value
  Direct                    Location                     EA = Location
  Register                  Reg                          EA = Reg, that is, Operand = [Reg]
  Register indirect         [Reg]                        EA = [Reg]
  Base with displacement    [Reg + Disp]                 EA = [Reg] + Disp
  Index with displacement   [Reg * S + Disp]             EA = [Reg] x S + Disp
  Base with index           [Reg1 + Reg2 * S]            EA = [Reg1] + [Reg2] x S
  Base with index and       [Reg1 + Reg2 * S + Disp]     EA = [Reg1] + [Reg2] x S + Disp
    displacement
  where:
    Value            = an 8- or 32-bit signed number
    Location         = a 32-bit address
    Reg, Reg1, Reg2  = one of the general purpose registers EAX, EBX, ECX, EDX, ESP, EBP,
                       ESI, EDI, with the exception that ESP cannot be used as an index register
    Disp             = an 8- or 32-bit signed number, except that in the Index with
                       displacement mode it can only be 32 bits
    S                = scale factor of 1, 2, 4, or 8
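
All of the memory-operand rows in this table are special cases of base + index x scale + displacement, which the Python sketch below computes. The function name `effective_address` and its keyword parameters are hypothetical; register contents are passed in as plain integers.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """EA = [base] + [index] * scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Base with index and displacement, using made-up register contents.
ea = effective_address(base=1000, index=3, scale=4, disp=8)
```

Leaving out arguments gives the simpler modes: only `base` is register indirect, `base` plus `disp` is base with displacement, and `index`, `scale`, `disp` is index with displacement.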

IA-32 Instructions
  • Arithmetic examples
      - ADC, ADD, CMC, DEC
      - DIV, IMUL, MUL
      - SBB, SUB
  • Logic examples
      - AND, CLC, STC
      - NEG, NOP, NOT, OR, XOR

IA-32 Instructions
  • Shift examples
      - RCL, RCR, ROL, ROR, SAL, SAR, SHL, SHR
  • Bit test and compare
      - BT, BTC, BTR, BTS, CMP, TEST
  • Branch examples
      - CALL, RET, CLI, STI, HLT, INT, IRET
      - LOOP, LOOPE
  • Memory/IO load and store examples
      - LEA, MOVSX, MOVZX
      - IN, OUT, POPAD, PUSHAD
      - XCHG

IA-32 Assembly Language
              .data                                 Assembler directives
  NUM1        DD      17, 3, 51, 242, 113
  N           DD      5
  SUM         DD      0
              .code
  MAIN:       LEA     EBX, NUM1                     Statements that generate
              SUB     EBX, 4                        machine instructions
              MOV     ECX, N
              MOV     EAX, 0
  STARTADD:   ADD     EAX, [EBX + ECX * 4]
              LOOP    STARTADD
              MOV     SUM, EAX
              END     MAIN                          Assembler directives
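
This program walks the list from the last entry to the first: EBX is set to NUM1 - 4 so that [EBX + ECX * 4] addresses entry ECX - 1, and LOOP decrements ECX and branches while it is nonzero. The Python sketch below reproduces that indexing; the function name `loop_sum` is hypothetical and the list is assumed to be non-empty.

```python
def loop_sum(num1):
    """Sum a non-empty list the way the LOOP-based program indexes it."""
    ecx = len(num1)                  # MOV ECX, N
    eax = 0                          # MOV EAX, 0
    visited = []
    while True:
        element_index = ecx - 1      # [EBX + ECX*4] with EBX = NUM1 - 4
        visited.append(element_index)
        eax += num1[element_index]   # ADD EAX, [EBX + ECX * 4]
        ecx -= 1                     # LOOP decrements ECX...
        if ecx == 0:                 # ...and branches only while it is nonzero
            break
    return eax, visited


total, order = loop_sum([17, 3, 51, 242, 113])   # data from the slide
```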

IA-32 Subroutines
  Calling program
              PUSH    OFFSET NUM1               Push parameters onto the stack.
              PUSH    N
              CALL    LISTADD                   Branch to the subroutine.
              ADD     ESP, 4                    Remove n from the stack.
              POP     SUM                       Pop the sum into SUM.
              ...
  Subroutine
  LISTADD:    PUSH    EDI                       Save EDI and use
              MOV     EDI, 0                    as index register.
              PUSH    EAX                       Save EAX and use as
              MOV     EAX, 0                    accumulator register.
              PUSH    EBX                       Save EBX and load
              MOV     EBX, [ESP+20]             address NUM1.
              PUSH    ECX                       Save ECX and
              MOV     ECX, [ESP+20]             load count n.
  STARTADD:   ADD     EAX, [EBX+EDI*4]          Add next number.
              INC     EDI                       Increment index.
              DEC     ECX                       Decrement counter.
              JG      STARTADD                  Branch back if not done.
              MOV     [ESP+24], EAX             Overwrite NUM1 in stack with sum.
              POP     ECX                       Restore registers.
              POP     EBX
              POP     EAX
              POP     EDI
              RET                               Return.
  Stack contents
  Level 3 ->  [ECX]
              [EBX]
              [EAX]
              [EDI]
              Return address
  Level 2 ->  n
              NUM1
  Level 1 ->

IA-32 Program Example
  • Byte sorting program
  C program
    for (j = n - 1; j > 0; j = j - 1)
      { for (k = j - 1; k >= 0; k = k - 1)
          { if (LIST[k] > LIST[j])
              { TEMP = LIST[k];
                LIST[k] = LIST[j];
                LIST[j] = TEMP;
              }
          }
      }
  Assembly program
              LEA     EAX, LIST                 Load list pointer base register (EAX),
              MOV     EDI, N                    and initialize outer loop index
              DEC     EDI                       register (EDI) to j = n - 1.
  OUTER:      MOV     ECX, EDI                  Initialize inner loop index
              DEC     ECX                       register (ECX) to k = j - 1.
              MOV     DL, [EAX + EDI]           Load LIST(j) into register DL.
  INNER:      CMP     [EAX + ECX], DL           Compare LIST(k) to LIST(j).
              JLE     NEXT                      If LIST(k) <= LIST(j), go to next
                                                lower k index entry;
              XCHG    [EAX + ECX], DL           otherwise, interchange LIST(k) and
              MOV     [EAX + EDI], DL           LIST(j), leaving new LIST(j) in DL.
  NEXT:       DEC     ECX                       Decrement inner loop index k.
              JGE     INNER                     Repeat or terminate inner loop.
              DEC     EDI                       Decrement outer loop index j.
              JG      OUTER                     Repeat or terminate outer loop.

The End Lecture 5
