CPE 631 Lecture 08: Virtual Memory
Aleksandar Milenković, milenka@ece.uah.edu
Electrical and Computer Engineering, University of Alabama in Huntsville

Virtual Memory: Topics
- Why virtual memory?
- Virtual to physical address translation
- Page table
- Translation Lookaside Buffer (TLB)

Another View of the Memory Hierarchy
- Figure: the hierarchy from the upper level (faster, smaller) to the lower level (slower, larger): registers <-> cache (instructions, operands), cache <-> L2 cache (blocks), L2 cache <-> memory (blocks), memory <-> disk (pages), disk <-> tape (files)
- Covered so far: registers through memory; next: virtual memory (the memory <-> disk level)

Why Virtual Memory?
- Today's computers run multiple processes, each with its own address space
  - too expensive to dedicate a full address space worth of memory to each process
- Principle of locality
  - allows caches to offer the speed of cache memory with the size of DRAM
  - DRAM can act as a "cache" for secondary storage (disk) => virtual memory
- Virtual memory divides physical memory into blocks and allocates them to different processes

Virtual Memory: Motivation
- Historically, virtual memory was invented when programs became too large for physical memory
- Allows the OS to share memory and protect programs from each other (the main reason today)
- Provides the illusion of a very large memory
  - the sum of the memory of many jobs can be greater than physical memory
  - allows each job to exceed the size of physical memory
- Allows the available physical memory to be very well utilized
- Exploits the memory hierarchy to keep average access time low

Mapping Virtual to Physical Memory
- A program with 4 pages (A, B, C, D)
- Any chunk of virtual memory can be assigned to any chunk ("page") of physical memory
- Figure: the program's virtual pages at 0, 4 KB, 8 KB, and 12 KB are scattered across physical page frames, with some pages kept on disk rather than in physical memory

Virtual Memory Terminology
- Virtual address: address used by the programmer; the CPU produces virtual addresses
- Virtual address space: the collection of such addresses
- Memory (physical or real) address: address of a word in physical memory
- Memory mapping or address translation: the process of virtual-to-physical address translation
- More terminology (virtual memory vs. cache)
  - page or segment <-> block
  - page fault or address fault <-> miss

Comparing the Two Levels of the Hierarchy

Parameter        | L1 Cache                     | Virtual Memory
Block/page size  | 16 B – 128 B                 | 4 KB – 64 KB
Hit time         | 1 – 3 cc                     | 50 – 150 cc
Miss penalty     | 8 – 150 cc                   | 1 M – 10 M cc (page fault)
  access time    | 6 – 130 cc                   | 800 K – 8 M cc
  transfer time  | 2 – 20 cc                    | 200 K – 2 M cc
Miss rate        | 0.1 – 10%                    | 0.00001 – 0.001%
Placement        | DM or N-way SA               | fully associative (the OS allows pages to be placed anywhere in main memory)
Address mapping  | 25–45-bit physical address to 14–20-bit cache address | 32–64-bit virtual address to 25–45-bit physical address
Replacement      | LRU or random (HW counter)   | LRU (SW controlled)
Write policy     | WB or WT                     | WB

Paging vs. Segmentation
- Two classes of virtual memory
  - pages: fixed-size blocks (4 KB – 64 KB)
  - segments: variable-size blocks (1 B – 64 KB/4 GB)
  - hybrid approach, paged segments: a segment is an integral number of pages
- Figure: the same code and data regions carved into fixed-size pages (paging) vs. variable-size segments (segmentation)

Paging vs. Segmentation: Pros and Cons

                        | Page                                      | Segment
Words per address       | one                                       | two (segment + offset)
Programmer visible?     | invisible to the application programmer   | may be visible to the application programmer
Replacing a block       | trivial (all blocks are the same size)    | hard (must find a contiguous, variable-size, unused portion of main memory)
Memory use inefficiency | internal fragmentation (unused portion of a page) | external fragmentation (unused pieces of main memory)
Efficient disk traffic  | yes (adjust the page size to balance access time and transfer time) | not always (small segments may transfer just a few bytes)

Virtual to Physical Address Translation
- A program operates in its virtual address space: the virtual address (instruction fetch, load, store) goes through a HW mapping to a physical address (instruction fetch, load, store) that accesses physical memory (including caches)
- Each program operates in its own virtual address space
- Each is protected from the others
- The OS can decide where each goes in memory
- A combination of HW + SW provides the virtual-to-physical mapping

Virtual Memory Mapping Function
- Virtual address: bits 31..10 hold the virtual page number, bits 9..0 the offset
- Physical address: bits 29..10 hold the physical page number, bits 9..0 the offset
- Use a table lookup ("page table") for mappings: the virtual page number is the index
- Mapping function
  - physical offset = virtual offset
  - physical page number (P.P.N., or "page frame") = PageTable[virtual page number]
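
Below is a minimal sketch in C of that mapping function, assuming the 10-bit offset (1 KB pages) implied by the bit positions above; the function and table names are illustrative only.

```c
#include <stdint.h>

#define OFFSET_BITS 10                 /* 10-bit offset => 1 KB pages, as above */
#define PAGE_SIZE   (1u << OFFSET_BITS)

/* Translate a virtual address using a flat page table indexed by VPN. */
uint32_t translate(uint32_t vaddr, const uint32_t *page_table)
{
    uint32_t vpn    = vaddr >> OFFSET_BITS;      /* virtual page number (the index) */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);   /* physical offset = virtual offset */
    uint32_t ppn    = page_table[vpn];           /* P.P.N. = PageTable[VPN] */
    return (ppn << OFFSET_BITS) | offset;
}
```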

Address Mapping: Page Table
- Virtual address = virtual page number + offset
- The Page Table Base Register and the virtual page number together index into the page table
- Each page table entry provides a valid bit, access rights, and the physical page number
- Physical address = physical page number + offset

Page Table
- A page table is an operating system structure that contains the mapping of virtual addresses to physical locations
  - there are several different ways, all up to the operating system, to keep this data around
- Each process running in the operating system has its own page table
  - the "state" of a process is its PC, all registers, plus its page table
  - the OS changes page tables by changing the contents of the Page Table Base Register

Page Table Entry (PTE) Format
- The valid bit indicates whether the page is in memory
  - the OS maps it to disk if not valid (V = 0)
- The page table contains mappings for every possible virtual page
- Each PTE holds the valid bit, access rights, and physical page number
- If valid, also check permission to use the page: access rights (A.R.) may be Read Only, Read/Write, or Executable
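
As an illustration, a PTE with those three fields could be described with a C bitfield like the sketch below; the field widths are assumptions for a 32-bit entry, not any particular machine's layout.

```c
#include <stdint.h>

enum access_rights { AR_READ_ONLY, AR_READ_WRITE, AR_EXECUTABLE };

/* One page table entry: valid bit, access rights, physical page number. */
typedef struct {
    uint32_t valid  : 1;   /* V = 0: the page is mapped to disk, not memory */
    uint32_t rights : 2;   /* one of enum access_rights                     */
    uint32_t ppn    : 20;  /* physical page number ("page frame")           */
    uint32_t unused : 9;   /* padding to fill a 32-bit entry                */
} pte_t;
```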

Virtual Memory Problem #1
- Not enough physical memory!
  - only, say, 64 MB of physical memory, but N processes, each with 4 GB of virtual memory
  - could have 1 K virtual pages per physical page!
- Spatial locality to the rescue
  - each page is 4 KB, with lots of nearby references
  - no matter how big the program is, at any time it is only accessing a few pages
  - "working set": the recently used pages

VM Problem #2: Fast Address Translation
- Page tables are stored in main memory, so every memory access logically takes at least twice as long: one access to obtain the physical address and a second access to get the data
- Observation: if there is locality in the pages of data, there must be locality in the virtual addresses of those pages => remember the last translation(s)
- Address translations are kept in a special cache called the Translation Look-Aside Buffer (TLB)
- The TLB must be on chip; its access time is comparable to the cache's

Typical TLB Format
- Fields: virtual address (tag), physical address (data), dirty, ref, valid, access rights
  - tag: portion of the virtual address
  - data: physical page number
  - dirty: since write back is used, indicates whether the page must be written to disk when replaced
  - ref: used to help approximate LRU on replacement
  - valid: the entry is valid
  - access rights: R (read permission), W (write permission)

Translation Look-Aside Buffers
- TLBs are usually small, typically 128 – 256 entries
- Like any other cache, the TLB can be fully associative, set associative, or direct mapped
- Figure: the processor sends a VA to the TLB; on a hit, the PA goes to the cache (and on a cache miss, to main memory); on a TLB miss, the translation must be performed before the cache can deliver data
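
The sketch below shows, in C, what a fully associative lookup does conceptually; a real TLB compares all tags in parallel in hardware, and the sizes and names here are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 32
#define OFFSET_BITS 13          /* e.g., 8 KB pages */

typedef struct {
    bool     valid;
    uint64_t vpn;               /* tag: virtual page number   */
    uint64_t ppn;               /* data: physical page number */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and fills *paddr; on a miss the translation
 * must come from the page table (hardware walk or OS trap). */
bool tlb_lookup(uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpn    = vaddr >> OFFSET_BITS;
    uint64_t offset = vaddr & ((1u << OFFSET_BITS) - 1);

    for (int i = 0; i < TLB_ENTRIES; i++) {      /* "send the VA to all tags" */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].ppn << OFFSET_BITS) | offset;
            return true;
        }
    }
    return false;                                /* TLB miss */
}
```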

TLB Translation Steps
- Assume a 32-entry, fully associative TLB (Alpha AXP 21064)
  1. The processor sends the virtual address to all tags
  2. If there is a hit (an entry in the TLB has that virtual page number and its valid bit is 1) and there is no access violation, then
  3. The matching tag sends the corresponding physical page number
  4. The physical page number and page offset are combined to form the full physical address

What If the Translation Is Not in the TLB?
- Option 1: hardware checks the page table and loads the new page table entry into the TLB
- Option 2: hardware traps to the OS, and it is up to the OS to decide what to do
  - while in the operating system, no translation is done (virtual memory is turned off)
  - the operating system knows which program caused the TLB fault or page fault and which virtual address was requested
  - so it looks the translation up in the page table
  - if the page is in memory, it simply adds the entry to the TLB, evicting an old entry from the TLB
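
A sketch of Option 2's control flow is shown below; pte_t follows the earlier sketch, and tlb_insert()/page_fault() are hypothetical helpers standing in for the OS routines described here.

```c
#include <stdint.h>

#define OFFSET_BITS 13

typedef struct { uint32_t valid : 1; uint32_t ppn : 28; } pte_t;

void tlb_insert(uint64_t vpn, uint64_t ppn);   /* hypothetical: add a TLB entry, evicting an old one */
void page_fault(uint64_t vaddr);               /* hypothetical: bring the page in from disk          */

/* Software TLB-miss handler: consult the page table and refill the TLB. */
void tlb_miss_handler(uint64_t vaddr, const pte_t *page_table)
{
    uint64_t vpn = vaddr >> OFFSET_BITS;
    pte_t pte = page_table[vpn];               /* look the mapping up in the page table */

    if (pte.valid)
        tlb_insert(vpn, pte.ppn);              /* page is in memory: just refill the TLB */
    else
        page_fault(vaddr);                     /* page is on disk: start a DMA transfer and
                                                  run another process in the meantime */
}
```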

What If the Data Is on Disk?
- We load the page off the disk into a free block of memory, using a DMA transfer
  - in the meantime we switch to some other process waiting to be run
- When the DMA is complete, we get an interrupt and update the process's page table
  - so when we switch back to the task, the desired data will be in memory

What If We Don't Have Enough Memory?
- We choose some other page belonging to a program and transfer it to disk if it is dirty
  - if it is clean (the disk copy is up to date), we just overwrite that data in memory
  - we choose the page to evict based on the replacement policy (e.g., LRU)
- We then update that program's page table to reflect the fact that its memory moved somewhere else

Page Replacement Algorithms
- First-In/First-Out (FIFO)
  - in response to a page fault, replace the page that has been in memory for the longest period of time
  - does not make use of the principle of locality: an old but frequently used page could be replaced
  - easy to implement (the OS maintains a history thread through the page table entries)
  - usually exhibits the worst behavior
- Least Recently Used (LRU)
  - selects the least recently used page for replacement
  - requires knowledge of past references
  - more difficult to implement, good performance

Page Replacement Algorithms (cont'd)
- Not Recently Used (an approximation of LRU)
  - a reference bit is associated with each page table entry such that
    - Ref flag = 1 if the page has been referenced in the recent past
    - Ref flag = 0 otherwise
  - if replacement is necessary, choose any page frame whose reference bit is 0
  - the OS periodically clears the reference bits
  - the reference bit is set whenever a page is accessed
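
A minimal sketch of that policy in C, with an array of reference bits standing in for the bits in the page table entries; the frame count and the linear victim scan are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_FRAMES 1024

static bool ref_bit[NUM_FRAMES];     /* set whenever the corresponding page is accessed */

/* Run periodically by the OS to age the reference information. */
void clear_reference_bits(void)
{
    for (size_t i = 0; i < NUM_FRAMES; i++)
        ref_bit[i] = false;
}

/* Pick a victim: any frame whose reference bit is 0. */
size_t choose_victim(void)
{
    for (size_t i = 0; i < NUM_FRAMES; i++)
        if (!ref_bit[i])
            return i;
    return 0;                        /* every frame was referenced recently: fall back */
}
```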

Selecting a Page Size
- Balance the forces in favor of larger pages against those in favor of smaller pages
- Larger pages
  - reduce the page table size (save space)
  - allow larger caches with fast hits
  - more efficient transfers from the disk (or possibly over the network)
  - fewer TLB entries needed, so fewer TLB misses
- Smaller pages
  - conserve space better: less wasted storage (internal fragmentation)
  - shorter startup time, especially with plenty of small processes

VM Problem #3: The Page Table Is Too Big!
- Example
  - 4 GB of virtual memory / 4 KB pages => ~1 million page table entries => 4 MB just for the page table of 1 process; 25 processes => 100 MB for page tables!
- The problem gets worse on modern 64-bit machines
- The solution is a hierarchical page table

Page Table Shrink
- Single page table: virtual address = page number (20 bits) + offset (12 bits)
- Multilevel page table: virtual address = super page number (10 bits) + page number (10 bits) + offset (12 bits)
- Only have a second-level page table for the valid entries of the super-level page table
  - if only 10% of the entries of the super page table are valid, then the total mapping size is roughly 1/10th of a single-level page table
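
The sketch below walks that 10/10/12 split in C; the types, the NULL convention for absent second-level tables, and the zero return on a fault are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *entries;     /* 1024 PTEs, or NULL if this 4 MB region is unmapped */
} second_level_t;

/* Two-level walk: super page number (10 b) + page number (10 b) + offset (12 b). */
uint32_t walk_two_level(uint32_t vaddr, second_level_t super_table[1024])
{
    uint32_t super_idx = (vaddr >> 22) & 0x3FF;   /* top 10 bits   */
    uint32_t page_idx  = (vaddr >> 12) & 0x3FF;   /* next 10 bits  */
    uint32_t offset    =  vaddr        & 0xFFF;   /* 12-bit offset */

    second_level_t *second = &super_table[super_idx];
    if (second->entries == NULL)
        return 0;                                 /* would raise a page fault */

    uint32_t ppn = second->entries[page_idx];     /* physical page number */
    return (ppn << 12) | offset;
}
```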

2-Level Page Table
- Figure: the super page table points to second-level page tables, which in turn map the code, static data, heap, and stack regions of the virtual address space onto a 64 MB physical memory

The Big Picture
- Figure (flowchart): for each virtual address, access the TLB
  - on a TLB miss, stall and try to read the PTE from the page table; on a page fault, replace a page from disk; then set the entry in the TLB
  - on a TLB hit: for a read, try to read from the cache, delivering data to the CPU on a cache hit and stalling on a cache miss; for a write, write to the cache/write buffer

The Big Picture (cont'd)
- Figure: a worked address-translation example with an 8 KB L1, a 4 MB L2, 8 KB pages, 64 B cache lines, a 64-bit virtual address, and a 41-bit physical address; the "28?" in the figure asks for the width of the physical page number
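
A worked reading of those parameters (my arithmetic, not taken from the slide): an 8 KB page implies a 13-bit page offset (2^13 = 8 K), so the virtual page number is 64 - 13 = 51 bits and the physical page number is 41 - 13 = 28 bits, which is presumably the 28 the figure asks about; likewise, the 64 B cache line implies a 6-bit block offset within each line.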

Things to Remember
- Apply the principle of locality recursively
- Manage memory-to-disk transfers by treating main memory as a cache
  - protection, included as a bonus, is now critical
  - use a page table of mappings instead of tag/data as in a cache
- Spatial locality means the working set of pages is all that must be in memory for a process to run
- Virtual-to-physical memory translation too slow?
  - add a cache of virtual-to-physical address translations, called a TLB
- Need a more compact representation to reduce the memory cost of a simple 1-level page table (especially for 32- to 64-bit addresses)

Instruction Set Principles and Examples

Outline
- What is Instruction Set Architecture?
- Classifying ISAs
- Elements of an ISA
  - programming registers
  - type and size of operands
  - addressing modes
  - types of operations
  - instruction encoding
- Role of compilers

Shift in Applications Area
- Desktop computing: emphasizes the performance of programs with integer and floating-point data types; little regard for program size or processor power
- Servers: used primarily for database, file server, and web applications; FP performance is much less important for performance than integers and strings
- Embedded applications: value cost and power, so code size is important because less memory is both cheaper and lower power
- DSPs and media processors, which can be used in embedded applications, emphasize real-time performance and often deal with infinite, continuous streams of data
  - architects of these machines traditionally identify a small number of key kernels that are critical to success; these are often supplied by the manufacturer

What Is an ISA?
- Instruction Set Architecture: the computer visible to the assembly language programmer or compiler writer
- The ISA includes
  - programming registers
  - operand access
  - type and size of operands
  - instruction set
  - addressing modes
  - instruction encoding

Classifying ISAs
- Stack architectures: operands are implicitly on the top of the stack
- Accumulator architectures: one operand is implicitly the accumulator
- General-purpose register architectures: only explicit operands, either registers or memory locations
  - register-memory: memory can be accessed as part of any instruction
  - register-register: memory is accessed only with load and store instructions

Classifying ISAs (cont'd)
- Four classes: stack, accumulator, register-memory, load-store (or register-register)
- Figure: where the operands of an ALU operation live in each class (the top of stack for a stack processor, the accumulator for an accumulator processor, registers and/or memory for register-memory and register-register processors)

Example: Code Sequence for C = A + B

Stack        Accumulator   Register-Memory     Load-Store
Push A       Load A        Load  R1, A         Load  R1, A
Push B       Add  B        Add   R3, R1, B     Load  R2, B
Add          Store C       Store C, R3         Add   R3, R1, R2
Pop C                                          Store C, R3

4 instr.,    3 instr.,     3 instr.,           4 instr.,
3 mem. ops   3 mem. ops    3 mem. ops          3 mem. ops

Development of ISAs
- Early computers used stack or accumulator architectures
  - accumulator architectures are easy to build
  - stack architectures closely match expression evaluation algorithms (without optimizations!)
- GPR architectures have dominated since 1975
  - registers are faster than memory
  - registers are easier for a compiler to use and can hold variables
    - memory traffic is reduced, and the program speeds up
    - code density is increased (registers are named with fewer bits than memory locations)

Programming Registers
- Ideally, the use of GPRs should be orthogonal, i.e., any register can be used as any operand with any instruction
- This may be difficult to implement; some CPUs compromise by limiting the use of some registers
- How many registers?
  - PDP-11: 8; some reserved (e.g., PC, SP); only a few are left, typically used for expression evaluation
  - VAX 11/780: 16; some reserved (e.g., PC, SP, FP); enough left to keep some variables in registers
  - RISC: 32; can keep many variables in registers

Operand Access
- Number of operands
  - 3: the instruction specifies a result and 2 source operands
  - 2: one of the operands is both a source and a result
- How many of the operands may be memory addresses in ALU instructions?

Number of memory addresses | Maximum number of operands | Examples
0                          | 3                          | SPARC, MIPS, HP-PA, PowerPC, Alpha, ARM, Trimedia
1                          | 2                          | Intel 80x86, Motorola 68000, TI TMS320C54
2/3                        | 2/3                        | VAX

Operand Access: Comparison

Type           | Advantages | Disadvantages
Reg-Reg (0, 3) | Simple, fixed-length instruction encoding. Simple code generation. Instructions take a similar number of clocks to execute. | Higher instruction count. Some instructions are short, and bit encoding may be wasteful.
Reg-Mem (1, 2) | Data can be accessed without loading first. The instruction format tends to be easy to decode and yields good density. | A source operand is destroyed in a binary operation. Clocks per instruction vary by operand location.
Mem-Mem (3, 3) | Most compact. | Large variation in instruction size and clocks per instruction. Memory bottleneck.

Type and Size of Operands
- How is the type of an operand designated?
  - encoded in the opcode; most often used (e.g., Add, AddU)
  - data annotated with tags that are interpreted by the hardware
- Common operand types
  - character (1 byte) {ASCII}
  - half word (16 bits) {short integers, 16-bit Java Unicode}
  - word (32 bits) {integers}
  - single-precision floating point (32 bits)
  - double-precision floating point (64 bits)
  - packed/unpacked binary-coded decimal (used infrequently)

Type and Size of Operands (cont'd)
- Distribution of data accesses by size (SPEC)
  - double word: 0% (Int), 69% (FP)
  - word: 74% (Int), 31% (FP)
  - half word: 19% (Int), 0% (FP)
  - byte: 7% (Int), 0% (FP)
- Summary: a new 32-bit architecture should support
  - 8-, 16-, and 32-bit integers; 64-bit floats
  - 64-bit integers may be needed for 64-bit addressing
  - others can be implemented in software
- Operands for media and signal processing
  - pixel: 8 b (red), 8 b (green), 8 b (blue), 8 b (transparency of the pixel)
  - fixed point (DSPs): "cheap" floating point
  - vertex (graphics operations): x, y, z, w

Addressing Modes
- Addressing mode: how a computer system specifies the address of an operand
  - constants
  - registers
  - memory locations
  - I/O addresses
- Memory addressing
  - since 1980 almost every machine uses addresses at the level of 1 byte, so:
    - how do byte addresses map onto 32-bit words?
    - can a word be placed on any byte boundary?

Interpreting Memory Addresses
- Big endian
  - the address of the most significant byte = the word address (xx00 = the big end of the word)
  - IBM 360/370, MIPS, SPARC, HP-PA
- Little endian
  - the address of the least significant byte = the word address (xx00 = the little end of the word)
  - Intel 80x86, DEC VAX, DEC Alpha
- Alignment: require that objects fall on addresses that are multiples of their size

Interpreting Memory Addresses (cont'd)
- Figure: the bytes 0x00, 0x01, 0x02, 0x03 of a word stored at addresses a through a+3, with the most significant byte at the word address under big endian and the least significant byte at the word address under little endian; a second diagram contrasts aligned words (starting at a, a+4, a+8, a+C) with misaligned ones
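
The little program below demonstrates the same idea on whatever machine it runs on; it simply stores the word 0x00010203 and inspects the byte placed at the lowest address (a common trick, not taken from the slides).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t word = 0x00010203;
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof word);   /* view the word as individual bytes */

    /* Big endian stores the most significant byte (0x00) at the lowest
     * address; little endian stores the least significant byte (0x03) there. */
    printf("byte at lowest address: 0x%02x => %s endian\n",
           bytes[0], bytes[0] == 0x00 ? "big" : "little");
    return 0;
}
```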

Addressing Modes: Examples

Addr. mode    | Example             | Meaning                                                          | When used
Register      | ADD R4, R3          | Regs[R4] <- Regs[R4] + Regs[R3]                                  | a value is in a register
Immediate     | ADD R4, #3          | Regs[R4] <- Regs[R4] + 3                                         | for constants
Displacement  | ADD R4, 100(R1)     | Regs[R4] <- Regs[R4] + Mem[100 + Regs[R1]]                       | local variables
Reg. indirect | ADD R4, (R1)        | Regs[R4] <- Regs[R4] + Mem[Regs[R1]]                             | accessing via a pointer
Indexed       | ADD R4, (R1+R2)     | Regs[R4] <- Regs[R4] + Mem[Regs[R1] + Regs[R2]]                  | array addressing (base + offset)
Direct        | ADD R4, (1001)      | Regs[R4] <- Regs[R4] + Mem[1001]                                 | addressing static data
Mem. indirect | ADD R4, @(R3)       | Regs[R4] <- Regs[R4] + Mem[Mem[Regs[R3]]]                        | if R3 holds the address of a pointer p, this yields *p
Autoincrement | ADD R4, (R3)+       | Regs[R4] <- Regs[R4] + Mem[Regs[R3]]; Regs[R3] <- Regs[R3] + d   | stepping through arrays within a loop; d is the element size
Autodecrement | ADD R4, -(R3)       | Regs[R3] <- Regs[R3] - d; Regs[R4] <- Regs[R4] + Mem[Regs[R3]]   | similar to the previous
Scaled        | ADD R4, 100(R2)[R3] | Regs[R4] <- Regs[R4] + Mem[100 + Regs[R2] + Regs[R3]*d]          | indexing arrays

Addressing Mode Usage
- 3 programs measured on a machine with all addressing modes (VAX)
  - register direct modes are not counted (about one half of the operand references)
  - PC-relative is not counted (used exclusively for branches)
- Results
  - Displacement: 42% avg (32 – 55%)
  - Immediate: 33% avg (17 – 43%)
  - Register indirect: 13% avg (3 – 24%)
  - Scaled: 7% avg (0 – 16%)
  - Memory indirect: 3% avg (1 – 6%)
  - Misc.: 2% avg (0 – 3%)
  - (braces in the original slide mark cumulative totals of 75% and 85% for the most frequent modes)

Displacement and Immediate Sizes
- Displacement
  - 1% of addresses require > 16 bits
  - 25% of addresses require > 12 bits
- Immediates
  - do they need to be supported by all operations? Fraction of each class that uses an immediate:
    - loads: 10% (Int), 45% (FP)
    - compares: 87% (Int), 77% (FP)
    - ALU operations: 58% (Int), 78% (FP)
    - all instructions: 35% (Int), 10% (FP)
  - what is the range of values?
    - 50% – 70% fit within 8 bits
    - 75% – 80% fit within 16 bits

Addressing Modes: Summary
- The data addressing modes that matter most are displacement, immediate, and register indirect
- The displacement field should be 12 to 16 bits
- The immediate field should be 8 to 16 bits

Addressing Modes for Signal Processing
- DSPs deal with continuous, infinite streams of data => circular buffers
  - modulo or circular addressing mode
- The FFT shuffles data at the start or end
  - 0 (000) => 0 (000), 1 (001) => 4 (100), 2 (010) => 2 (010), 3 (011) => 6 (110), ...
  - bit-reverse addressing mode: take the original value, bit-reverse it, and use it as the address
- The 6 most frequently used modes, also found in desktop processors, account for 95% of DSP addressing-mode usage
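
Both modes are easy to emulate in software, which is what the C sketches below do; the function names and parameters are illustrative, and dedicated DSP hardware performs these updates for free as part of the address calculation.

```c
#include <stdint.h>

/* Modulo (circular) addressing: advance an index around an N-element buffer. */
uint32_t circular_next(uint32_t index, uint32_t step, uint32_t buf_len)
{
    return (index + step) % buf_len;
}

/* Bit-reverse addressing for a 2^bits-point FFT, e.g. with bits = 3:
 * 1 (001) -> 4 (100), 3 (011) -> 6 (110). */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t rev = 0;
    for (unsigned i = 0; i < bits; i++) {
        rev = (rev << 1) | (index & 1);   /* move the lowest bit to the top */
        index >>= 1;
    }
    return rev;
}
```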

Typical Operations

Data movement      | load (from memory), store (to memory), mem-to-mem move, reg-to-reg move, input (from an I/O device), output (to an I/O device), push (to stack), pop (from stack)
Arithmetic         | integer (binary + decimal): add, subtract, multiply, divide
Shift              | shift left/right, rotate left/right
Logical            | not, and, or, xor, clear, set
Control            | unconditional/conditional jump
Subroutine linkage | call/return
System             | OS call, virtual memory management
Synchronization    | test-and-set
Floating point     | FP add, subtract, multiply, divide, compare, sqrt
String             | string move, compare, search
Graphics           | pixel and vertex operations, compression/decompression

Top Ten 8086 Instructions

Rank | Instruction        | % of total execution
1    | load               | 22%
2    | conditional branch | 20%
3    | compare            | 16%
4    | store              | 12%
5    | add                | 8%
6    | and                | 6%
7    | sub                | 5%
8    | move reg-reg       | 4%
9    | call               | 1%
10   | return             | 1%
     | Total              | 96%

- Simple instructions dominate instruction frequency => support them

Operations for Media and Signal Processing
- Multimedia processing and the limits of human perception
  - use narrower data words (64-bit FP is not needed) => wide ALUs operate on several data items at the same time
  - partitioned add: e.g., perform four 16-bit adds on a 64-bit ALU
  - SIMD (Single Instruction, Multiple Data) or vector instructions (see Appendix F and Figure 2.17, page 110)
- DSP processors
  - algorithms often need saturating arithmetic: if a result is too large to be represented, it is set to the largest representable number
  - often need several rounding modes
  - MAC (multiply and accumulate) instructions
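
The sketch below shows what saturating 16-bit addition means in C; a real DSP or SIMD unit does this (and the partitioned, multi-lane version) in a single instruction, so the code is only illustrative.

```c
#include <stdint.h>

/* 16-bit saturating add: clamp instead of wrapping around on overflow. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* compute in a wider type */
    if (sum > INT16_MAX) return INT16_MAX;   /* clamp to the largest representable value  */
    if (sum < INT16_MIN) return INT16_MIN;   /* clamp to the smallest representable value */
    return (int16_t)sum;
}

/* A partitioned (SIMD) add would apply this to four 16-bit lanes packed
 * into a single 64-bit word in one operation. */
```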

Instructions for Control Flow
- Control flow instructions
  - conditional branches (75% int, 82% FP)
  - call/return (19% int, 8% FP)
  - jumps (6% int, 10% FP)
- Addressing modes for control flow
  - PC-relative
  - for returns and indirect jumps the target is not known at compile time => specify a register that contains the target address

Instructions for Control Flow (cont'd)
- Methods for branch evaluation
  - condition code, CC (ARM, 80x86, PowerPC): tests special bits set by ALU instructions
  - condition register (Alpha, MIPS): tests an arbitrary register
  - compare and branch (PA-RISC, VAX): the compare is part of the branch
- Procedure invocation options
  - do the control transfer and possibly some state saving
    - at least the return address must be saved (in a link register)
  - the compiler generates loads and stores to save the rest of the state
  - caller saving vs. callee saving

Encoding an Instruction Set
- The instruction set architect must choose how to represent instructions in machine code
  - the operation is specified in one field called the opcode
  - each operand is specified by a separate address specifier (which tells which addressing mode is used)
- Balance among
  - many registers and addressing modes add to richness
  - many registers and addressing modes increase code size
  - lengths of code objects should "match" the architecture, e.g., 16 or 32 bits

Basic Variations in Encoding
- a) Variable (e.g., VAX): operation & number of operands, then address specifier 1, address field 1, ..., address specifier n, address field n
- b) Fixed (e.g., DLX, MIPS, PowerPC, ...): operation, address field 1, address field 2, address field 3
- c) Hybrid (e.g., IBM 360/370, Intel 80x86): a few fixed formats, such as
  - operation, address specifier 1, address field 1
  - operation, address specifier 1, address specifier 2, address field
  - operation, address specifier, address field 1, address field 2
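
As an illustration of the fixed style (b), the sketch below packs a hypothetical 32-bit register-register format with shifts and masks; the 6-bit opcode and 5-bit register fields echo the MIPS/DLX flavor but are assumptions chosen for the example, not an exact real encoding.

```c
#include <stdint.h>

/* Pack: operation | address field 1 | address field 2 | address field 3. */
uint32_t encode_rrr(uint32_t opcode, uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    return ((opcode & 0x3F) << 26) |   /* operation (6 bits)        */
           ((rs1    & 0x1F) << 21) |   /* address field 1 (5 bits)  */
           ((rs2    & 0x1F) << 16) |   /* address field 2 (5 bits)  */
           ((rd     & 0x1F) << 11);    /* address field 3 (5 bits)  */
}

/* In a variable encoding (a), each operand would instead carry its own
 * address-specifier byte, so instruction lengths would differ. */
```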

Summary of Instruction Formats
- If code size is most important, use variable-length instructions
- If performance is most important, use fixed-length instructions
- Reduced code size in RISCs
  - a hybrid version with both 16-bit and 32-bit instructions
    - the narrow instructions support fewer operations, smaller address and immediate fields, fewer registers, and a 2-address format
    - e.g., ARM Thumb, MIPS16 (Appendix C)
  - IBM: compressed code is kept in main memory, ROMs, and on disk; the caches keep decompressed code

Role of Compilers
- Structure of recent compilers
  1. Front end: transforms the language into a common intermediate form (language dependent, machine independent)
  2. High-level optimizations: e.g., loop transformations, procedure inlining, ... (somewhat language dependent, machine independent)
  3. Global optimizer: global and local optimizations, register allocation (small language dependencies, some machine dependencies)
  4. Code generator: instruction selection, machine-dependent optimizations (language independent, highly machine dependent)

Compiler Optimizations
1. High-level optimizations: done on the source
2. Local optimizations: optimize code within a basic block
3. Global optimizations: extend local optimizations across branches (loops)
4. Register allocation: associates registers with operands
5. Processor-dependent optimizations: take advantage of specific architectural knowledge