Introduction to Computer Organization and Architecture Lecture 9


Introduction to Computer Organization and Architecture, Lecture 9
By Juthawut Chantharamalee
http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

Outline
- Virtual Memory
  - Basics
  - Address Translation
  - Cache vs. VM
  - Paging
  - Replacement
  - TLBs
  - Segmentation
  - Page Tables

The Full Memory Hierarchy
Capacity, access time, and cost per level, with the staging/transfer unit between levels:

Level         Capacity     Access Time             Cost                       Transfer Unit (managed by)
Registers     100s bytes   <10 ns                                             1-8 bytes (programmer/compiler)
Cache         KBs          10-100 ns               1-0.1 cents/bit            8-128-byte blocks (cache controller)
Main memory   MBs          200-500 ns              0.0001-0.00001 cents/bit   4K-16K-byte pages (OS)
Disk          GBs          10 ms (10,000,000 ns)   10^-5 - 10^-6 cents/bit    Mbyte files (user/operator)
Tape          infinite     sec-min                 10^-8 cents/bit

Upper levels are faster; lower levels are larger and cheaper per bit.

Virtual Memory
Some facts of computer life…
- Computers run lots of processes simultaneously
- There is no full address space of memory for each process
- Smaller amounts of physical memory must be shared among many processes
Virtual memory is the answer!
- It divides physical memory into blocks and assigns them to different processes

Virtual Memory
- Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
- VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.
- The compiler assigns data to a “virtual” address. The VA is translated to a real/physical address somewhere in memory. (This allows any program to run anywhere; where it runs is determined by the particular machine and OS.)

VM Benefit
VM provides the following benefits:
- Allows multiple programs to share the same physical memory
- Allows programmers to write code as though they have a very large amount of main memory
- Automatically handles bringing in data from disk

Virtual Memory Basics
- Programs reference “virtual” addresses in a non-existent memory
  - These are then translated into real “physical” addresses
  - The virtual address space may be bigger than the physical address space
- Physical memory is divided into blocks, called pages
  - Anywhere from 512 bytes to 16 MB (4 KB is typical)
- Virtual-to-physical translation is done by indexed table lookup
  - Another cache (the TLB) holds recent translations
- All of this is invisible to the programmer
  - It looks to your application like you have a lot of memory!

VM: Page Mapping
[Figure: Process 1's and Process 2's virtual address spaces are mapped page by page onto page frames in physical memory, with some pages residing on disk.]

VM: Address Translation
[Figure: A virtual address is split into a 20-bit virtual page number and a 12-bit page offset (log2 of the page size). The virtual page number indexes a per-process page table, located by a page table base register; each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and a physical page number. The physical page number is concatenated with the unchanged page offset to form the address sent to physical memory.]

Example of Virtual Memory
- Relieves the problem of making a program that is too large to fit in physical memory, well… fit!
- Allows a program to run in any location in physical memory (called relocation)
  - Really useful, as you might want to run the same program on lots of machines
[Figure: The logical program is contiguous in virtual address space and consists of 4 pages, A, B, C, D, at virtual addresses 0, 4K, 8K, and 12K. Of the 4 pages, 3 (C, A, B) are in physical main memory and 1 (D) is located on disk.]

Cache Terms vs. VM Terms
So, some definitions/“analogies”:
- A “page” or “segment” of memory is analogous to a “block” in a cache
- A “page fault” or “address fault” is analogous to a cache miss
  - So, if we go to main (“real”/physical) memory and our data isn’t there, we need to get it from disk…

More Definitions and Cache Comparisons
These are more definitions than analogies…
- With VM, the CPU produces “virtual addresses” that are translated by a combination of HW and SW to “physical addresses”
- The “physical addresses” access main memory
- The process described above is called “memory mapping” or “address translation”

Cache vs. VM Comparisons (1/2)

Parameter                First-level cache       Virtual memory
Block (page) size        16-128 bytes            4096-65,536 bytes
Hit time                 1-2 clock cycles        40-100 clock cycles
Miss penalty             8-100 clock cycles      700,000-6,000,000 clock cycles
  (access time)          (6-60 clock cycles)     (500,000-4,000,000 clock cycles)
  (transfer time)        (2-40 clock cycles)     (200,000-2,000,000 clock cycles)
Miss rate                0.5-10%                 0.00001-0.001%
Data memory size         0.016-1 MB              4 MB-4 GB

Cache vs. VM Comparisons (2/2)
- Replacement policy:
  - Replacement on cache misses is primarily controlled by hardware
  - Replacement with VM (i.e., which page do I replace?) is usually controlled by the OS
    - Because of the bigger miss penalty, we want to make the right choice
- Sizes:
  - The size of the processor address determines the size of VM
  - Cache size is independent of the processor address size

Virtual Memory
Timing’s tough with virtual memory:

  AMAT = Tmem + (1 - h) * Tdisk
       = 100 ns + (1 - h) * 25,000,000 ns

h (the hit rate) has to be incredibly (almost unattainably) close to perfect for this to work out.
So: VM is a “cache”, but an odd one.
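
To see concretely how unforgiving this arithmetic is, the minimal sketch below evaluates the AMAT formula for a few hit rates. The 100 ns memory time comes from the slide; the 25 ms disk time is an assumption in line with the millisecond-scale disk times quoted earlier.

```c
#include <stdio.h>

int main(void) {
    const double t_mem  = 100.0;        /* main-memory access time, ns (from the slide) */
    const double t_disk = 25000000.0;   /* disk access time, ns (25 ms, assumed) */
    const double rates[] = { 0.90, 0.99, 0.9999, 0.999999 };

    for (int i = 0; i < 4; i++) {
        /* AMAT = Tmem + (1 - h) * Tdisk */
        double amat = t_mem + (1.0 - rates[i]) * t_disk;
        printf("h = %.6f  ->  AMAT = %12.1f ns\n", rates[i], amat);
    }
    return 0;
}
```

Even at h = 0.99 the AMAT is roughly 250,000 ns, about 2,500 times slower than a plain memory access, which is why the hit rate must be nearly perfect.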

Paging Hardware
[Figure: The CPU issues a 32-bit virtual address, split into a page number and an offset. A page table maps the page number to a frame number; the frame number combined with the offset addresses physical memory.]
How big is a page? How big is the page table?

Address Translation in a Paging System
[Figure: A virtual address consists of a page number and an offset. A page table pointer register locates the running program's page table in main memory; the page number indexes the table to obtain a frame number, which is combined with the unchanged offset to form the physical address of a location within a page frame in main memory.]

How Big Is a Page Table?
Suppose:
- a 32-bit architecture
- a page size of 4 kilobytes
Therefore the 32-bit address divides into a 20-bit page number (2^20 pages) and a 12-bit offset (2^12 bytes per page).
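
A quick back-of-the-envelope check of these numbers: 32 - 12 = 20 bits of page number means 2^20 entries in a flat page table. The 4-byte entry size below is an assumption the slides never state, though it matches the later claim that 1024 processes would need 4 GB of page tables.

```c
#include <stdio.h>

int main(void) {
    const unsigned offset_bits = 12;                  /* log2(4096) */
    const unsigned vpn_bits    = 32 - offset_bits;    /* 20 bits of page number */
    const unsigned long entries   = 1UL << vpn_bits;  /* 2^20 = 1,048,576 PTEs */
    const unsigned long pte_bytes = 4;                /* assumed 4-byte entries */

    printf("page-number bits: %u\n", vpn_bits);
    printf("page-table entries: %lu\n", entries);
    printf("flat table size: %lu MB per process\n",
           (entries * pte_bytes) >> 20);              /* prints 4 MB */
    return 0;
}
```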

Test Yourself
A processor asks for the contents of virtual memory address 0x10020. The paging scheme in use breaks this into a VPN of 0x10 and an offset of 0x020. PTR (a CPU register that holds the address of the page table) has a value of 0x100, indicating that this process’s page table starts at location 0x100. The machine uses word addressing and the page table entries are each one word long.

Memory reference: PTR = 0x100, VPN = 0x010, OFFSET = 0x020

Test Yourself

ADDR      CONTENTS
0x00000   0x00000
0x00100   0x00010
0x00110   0x00022
0x00120   0x00045
0x00130   0x00078
0x00145   0x00010
0x10000   0x03333
0x10020   0x04444
0x22000   0x01111
0x22020   0x02222
0x45000   0x05555
0x45020   0x06666

Memory reference: PTR = 0x100, VPN = 0x010, OFFSET = 0x020

What is the physical address calculated?
1. 0x10020
2. 0x22020
3. 0x45000
4. 0x45020
5. none of the above

Test Yourself
(Same memory image and register values as on the previous slide.)
- What is the physical address calculated?
- What is the contents of this address returned to the processor?
- How many memory accesses in total were required to obtain the contents of the desired address?
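
The sketch below walks the exercise mechanically using the memory image above. The mem_read helper and its linear search are our own scaffolding, not part of the exercise; the two calls to it show why a one-level scheme costs two memory accesses per reference.

```c
#include <stdio.h>

/* The sparse memory image from the slide, stored as (addr, contents) pairs. */
struct cell { unsigned addr, val; };
static const struct cell memory[] = {
    {0x00000, 0x00000}, {0x00100, 0x00010}, {0x00110, 0x00022},
    {0x00120, 0x00045}, {0x00130, 0x00078}, {0x00145, 0x00010},
    {0x10000, 0x03333}, {0x10020, 0x04444}, {0x22000, 0x01111},
    {0x22020, 0x02222}, {0x45000, 0x05555}, {0x45020, 0x06666},
};

static unsigned mem_read(unsigned addr) {      /* one simulated memory access */
    for (unsigned i = 0; i < sizeof memory / sizeof memory[0]; i++)
        if (memory[i].addr == addr) return memory[i].val;
    return 0;
}

int main(void) {
    unsigned ptr = 0x100, vpn = 0x010, offset = 0x020;

    unsigned pte   = mem_read(ptr + vpn);      /* access 1: the page table entry */
    unsigned paddr = (pte << 12) | offset;     /* frame number concatenated with offset */
    unsigned data  = mem_read(paddr);          /* access 2: the data itself */

    printf("PTE = 0x%05x, physical = 0x%05x, data = 0x%05x (2 accesses)\n",
           pte, paddr, data);
    return 0;
}
```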

Another Example
[Figure: A 16-byte logical memory holds bytes a through p at addresses 0 through 15, divided into four 4-byte pages. The page table maps page 0 (00) to frame 5 (101), page 1 (01) to frame 6 (110), page 2 (10) to frame 1 (001), and page 3 (11) to frame 2 (010). In the 32-byte physical memory, frame 1 (addresses 4-7) holds i j k l, frame 2 (addresses 8-11) holds m n o p, frame 5 (addresses 20-23) holds a b c d, and frame 6 (addresses 24-27) holds e f g h.]

Replacement Policies

Block Replacement
Which block should be replaced on a virtual memory miss?
- Again, we’ll stick with the strategy that it’s a good thing to eliminate page faults
- Therefore, we want to replace the LRU block
  - Many machines use a “use” or “reference” bit
  - It is periodically reset
  - It gives the OS an estimate of which pages are referenced

Writing a Block
What happens on a write?
- We don’t even want to think about a write-through policy!
  - The time with accesses, VM, hard disk, etc. is so great that this is not practical
- Instead, a write-back policy is used, with a dirty bit to tell if a block has been written

Mechanism vs. Policy
- Mechanism:
  - paging hardware
  - trap on page fault
- Policy:
  - fetch policy: when should we bring in the pages of a process?
    1. load all pages at the start of the process
    2. load only on demand: “demand paging”
  - replacement policy: which page should we evict given a shortage of frames?

Replacement Policy
Given a full physical memory, which page should we evict?
What policy?
- Random
- FIFO: first-in, first-out
- LRU: least recently used
- MRU: most recently used
- OPT: optimal (evict the page that will not be used for the longest time in the future)

Replacement Policy Simulation
Example sequence of page numbers: 0 1 2 3 42 2 37 1 2 3
- FIFO?
- LRU?
- OPT?
- How do you keep track of LRU info? (another data structure question; see the simulation sketch below)
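
Below is a small simulator (our own sketch, not from the slides) that answers the FIFO and LRU questions for this reference string with 3 page frames. OPT needs knowledge of the future, so it is left as the exercise intends.

```c
#include <stdio.h>

#define NFRAMES 3

static int find(const int *frames, int n, int page) {
    for (int i = 0; i < n; i++)
        if (frames[i] == page) return i;
    return -1;
}

/* lru != 0 selects LRU; otherwise FIFO. Returns the number of page faults. */
static int simulate(const int *refs, int nrefs, int lru) {
    int frames[NFRAMES], age[NFRAMES], faults = 0, inserts = 0;
    for (int i = 0; i < NFRAMES; i++) { frames[i] = -1; age[i] = -1; }

    for (int t = 0; t < nrefs; t++) {
        int hit = find(frames, NFRAMES, refs[t]);
        if (hit >= 0) {
            if (lru) age[hit] = t;               /* refresh recency on a hit */
            continue;
        }
        faults++;
        int victim = find(frames, NFRAMES, -1);  /* fill empty frames first */
        if (victim < 0) {
            if (lru) {                           /* evict the least recently used */
                victim = 0;
                for (int i = 1; i < NFRAMES; i++)
                    if (age[i] < age[victim]) victim = i;
            } else {                             /* FIFO: evict oldest insertion */
                victim = inserts % NFRAMES;
            }
        }
        inserts++;
        frames[victim] = refs[t];
        age[victim] = t;
    }
    return faults;
}

int main(void) {
    const int refs[] = { 0, 1, 2, 3, 42, 2, 37, 1, 2, 3 };
    const int n = sizeof refs / sizeof refs[0];
    printf("FIFO faults: %d of %d references\n", simulate(refs, n, 0), n);
    printf("LRU  faults: %d of %d references\n", simulate(refs, n, 1), n);
    return 0;
}
```

Running it shows FIFO takes 9 faults and LRU takes 8 on this string, and the age array is one simple answer to the data structure question.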

Page Tables and Lookups…
1. It’s slow! We’ve turned every access to memory into two accesses to memory.
   - Solution: add a specialized “cache” called a “translation lookaside buffer” (TLB) inside the processor
2. It’s still huge!
   - Even worse: we’re ultimately going to have a page table for every process. Suppose 1024 processes: that’s 4 GB of page tables!

Paging/VM (1/3)
[Figure: The CPU issues virtual address 356; the operating system’s page table maps it to location 42 in physical memory, while an invalid entry (i) marks a page that lives on disk.]

Paging/VM (2/3)
[Figure: As before, but the page table itself now resides in physical memory.]
Place the page table in physical memory.
However: this doubles the time per memory access!!

Paging/VM (3/3)
[Figure: As before, with a small cache next to the CPU holding recent translations.]
A special-purpose cache for translations.
Historically called the TLB: Translation Lookaside Buffer.

Translation Cache
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set-associative organizations.
Note: 128-256 entries times 4 KB-16 KB per entry spans only 512 KB-4 MB; the L2 cache is often bigger than the “span” of the TLB.
[Figure: Translation with a TLB. The CPU presents a virtual address to the TLB; a hit yields the physical address, which goes to the cache, and a cache hit delivers data to the CPU. A TLB miss requires a translation via the page table, and a cache miss goes to main memory.]

Translation Cache
A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.

A TLB entry: | Virtual Page # (tag) | Physical Frame # | Dirty | Ref | Valid | Access |

It is really just a cache (a special-purpose cache) on the page table mappings.
TLB access time is comparable to cache access time (much less than main memory access time).
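
A minimal sketch of such a translation cache, with the entry fields listed above (VPN tag, frame number, dirty/ref/valid bits) and a fully associative lookup. The 64-entry size and the helper names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 64          /* small, as TLBs are (an assumed size) */

struct tlb_entry {
    uint32_t vpn;               /* virtual page number: the tag */
    uint32_t pfn;               /* physical frame number */
    bool valid, dirty, ref;     /* the bits shown in the entry above */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Fully associative lookup: compare the VPN against every entry's tag. */
static bool tlb_lookup(uint32_t vpn, uint32_t *pfn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].ref = true;  /* remember the use for replacement decisions */
            *pfn = tlb[i].pfn;
            return true;        /* hit: no page-table walk needed */
        }
    }
    return false;               /* miss: walk the page table, then refill */
}

int main(void) {
    tlb[0] = (struct tlb_entry){ .vpn = 0x10, .pfn = 0x22, .valid = true };
    uint32_t pfn;
    if (tlb_lookup(0x10, &pfn))
        printf("hit: frame 0x%x\n", pfn);   /* prints frame 0x22 */
    return 0;
}
```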

An Example of a TLB
[Figure: A 43-bit virtual address arrives as a 30-bit page frame (page) address plus a 13-bit page offset (step 1). Each of the 32 TLB entries holds a valid bit <1>, read/write permission bits <2>, a 30-bit tag, and a 21-bit physical address; read/write policies and permissions are checked during the match (step 2). A 32:1 mux selects the matching entry’s 21-bit physical page number, the high-order 21 bits of the address (step 3), which is concatenated with the low-order 13 bits, the page offset, to form a 34-bit physical address (step 4).]

The “Big Picture” and TLBs
- Address translation is usually on the critical path…
  - …which determines the clock cycle time of the microprocessor
  - Even in the simplest cache, TLB values must be read and compared
- The TLB is usually smaller and faster than the cache address-tag memory
  - This way multiple TLB reads don’t increase the cache hit time
- TLB accesses are usually pipelined because they’re so important!

The “Big Picture” and TLBs
[Flowchart: A virtual address first accesses the TLB. On a TLB miss, try to read the page table; a page fault means the page must be brought in from disk, otherwise the translation is set in the TLB (a TLB miss stall). On a TLB hit, a read tries the cache: a cache hit delivers data to the CPU, while a cache miss stalls until the block arrives. A write goes to the cache/buffer memory.]

Pages Are Cached in a Virtual Memory System
We can ask the same four questions we did about caches.
- Q1: Block placement
  - Choice: lower miss rates and complex placement, or vice versa
  - The miss penalty is huge
  - So choose a low miss rate ==> place a page anywhere in physical memory
  - Similar to a fully associative cache model
- Q2: Block addressing: use an additional data structure
  - Fixed-size pages: use a page table
    - virtual page number ==> physical page number, then concatenate the offset
    - a tag bit indicates presence in main memory

Normal Page Tables
- Size is the number of virtual pages
- Purpose is to hold the translation of VPN to PPN
  - Permits ease of page relocation
  - Make sure to keep tags to indicate that a page is mapped
- Potential problem:
  - Consider a 32-bit virtual address and 4 KB pages: 4 GB / 4 KB = 1 M words required just for the page table!
  - We might have to page in the page table…
    - Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces!
    - We might use multi-level page tables (see the sketch below)
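
As an illustration of the multi-level idea, the sketch below splits a 32-bit virtual address the classic two-level way, 10 + 10 + 12 bits, so second-level tables need only exist for regions of the address space actually in use. The exact split is our assumption, not something the slide specifies.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t va = 0x12345678;                  /* arbitrary example address */
    uint32_t dir_idx = (va >> 22) & 0x3FF;     /* top 10 bits: page-directory index */
    uint32_t tbl_idx = (va >> 12) & 0x3FF;     /* next 10 bits: second-level index */
    uint32_t offset  =  va        & 0xFFF;     /* low 12 bits: page offset */

    printf("dir = 0x%03x, table = 0x%03x, offset = 0x%03x\n",
           dir_idx, tbl_idx, offset);
    return 0;
}
```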

Inverted Page Tables
- Similar to a set-associative mechanism
- Make the page table reflect the number of physical pages (not virtual)
- Use a hash mechanism
  - virtual page number ==> hashed page number, an index into the inverted page table
  - Compare the virtual page number with the tag to make sure it is the one you want
  - If yes:
    - check that the page is in memory; OK if yes, page fault if not
  - If the tag does not match, it is a miss:
    - go to the full page table on disk to get the new entry
    - this implies 2 disk accesses in the worst case
    - it trades an increased worst-case penalty for a decrease in the capacity-induced miss rate, since there is now more room for real pages with a smaller page table
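
A simplified sketch of the hashed lookup described above, under two stated assumptions: the table index doubles as the frame number, and collisions are resolved by linear probing (real designs typically chain colliding entries instead).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NFRAMES 256            /* one table entry per physical frame */

struct ipt_entry {
    uint32_t vpn;              /* tag: which virtual page occupies this frame */
    bool valid;
};

static struct ipt_entry ipt[NFRAMES];

static uint32_t hash_vpn(uint32_t vpn) {
    return (vpn * 2654435761u) % NFRAMES;      /* simple multiplicative hash */
}

/* Returns the frame number, or -1 on a miss (go to the full table on disk). */
static int ipt_lookup(uint32_t vpn) {
    uint32_t idx = hash_vpn(vpn);
    for (int probes = 0; probes < NFRAMES; probes++) {
        if (!ipt[idx].valid) return -1;            /* empty slot: page not resident */
        if (ipt[idx].vpn == vpn) return (int)idx;  /* tag matches: frame found */
        idx = (idx + 1) % NFRAMES;                 /* collision: probe next slot */
    }
    return -1;
}

int main(void) {
    uint32_t vpn = 0x10;
    ipt[hash_vpn(vpn)] = (struct ipt_entry){ .vpn = vpn, .valid = true };
    printf("vpn 0x%x -> frame %d\n", vpn, ipt_lookup(vpn));
    return 0;
}
```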

Inverted Page Table
[Figure: The page number is hashed to index the inverted page table; each entry holds a valid bit, a page-number tag, and a frame number. On a match (V = OK), the frame number is concatenated with the offset to form the physical address.]
Only store entries for pages in physical memory.

Address Translation Reality
- The translation process using page tables takes too long!
- Use a cache to hold recent translations: the Translation Lookaside Buffer
  - Typically 8-1024 entries
  - Block size is the same as a page table entry (1 or 2 words)
  - Only holds translations for pages in memory
  - 1-cycle hit time
  - Highly or fully associative
  - Miss rate < 1%
  - A miss goes to main memory (where the whole page table lives)
  - Must be purged on a process switch

Back to the 4 Questions
- Q3: Block replacement (pages in physical memory)
  - LRU is best
    - So use it to minimize the horrible miss penalty
  - However, real LRU is expensive
    - The page table contains a use tag
    - On access, the use tag is set
    - The OS checks them every so often, records what it sees, and resets them all
    - On a miss, the OS decides who has been used the least (one common approximation is sketched below)
  - Basic strategy: the miss penalty is so huge, you can spend a few OS cycles to help reduce the miss rate
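
One common way to turn periodically reset use bits into an LRU estimate is aging, sketched below. The 8-bit history and the page count are arbitrary choices of ours; the slides do not commit to this particular algorithm.

```c
#include <stdint.h>
#include <stdio.h>

#define NPAGES 8

static uint8_t history[NPAGES];   /* aging counters, one per resident page */
static uint8_t ref_bit[NPAGES];   /* set by hardware when the page is used */

/* Run periodically by the OS: fold each use bit into the history, then reset it. */
static void age_tick(void) {
    for (int i = 0; i < NPAGES; i++) {
        history[i] = (uint8_t)((history[i] >> 1) | (ref_bit[i] << 7));
        ref_bit[i] = 0;                          /* the periodic reset from the slide */
    }
}

/* Approximate LRU: the smallest history value is the coldest page. */
static int pick_victim(void) {
    int victim = 0;
    for (int i = 1; i < NPAGES; i++)
        if (history[i] < history[victim]) victim = i;
    return victim;
}

int main(void) {
    ref_bit[3] = 1; age_tick();                  /* page 3 was touched recently */
    ref_bit[3] = 1; ref_bit[5] = 1; age_tick();
    printf("victim: page %d\n", pick_victim());  /* a never-used page wins */
    return 0;
}
```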

Last Question
- Q4: Write policy
  - Always write-back
    - Due to the access time of the disk
    - So you need to keep tags to show when pages are dirty, and write them back to disk when they’re swapped out
  - Anything else is pretty silly
  - Remember: the disk is SLOW!

Page Sizes
An architectural choice.
- Large pages are good:
  - they reduce page table size
  - they amortize the long disk access
  - if spatial locality is good, then the hit rate will improve
- Large pages are bad:
  - more internal fragmentation
    - if everything is random, each structure’s last page is only half full
    - half of bigger is still bigger
    - if there are 3 structures per process (text, heap, and control stack), then 1.5 pages are wasted per process
  - start-up time takes longer
    - since at least 1 page of each type is required prior to start
    - the transfer-time penalty aspect is higher

More on TLBs
- The TLB must be on chip
  - otherwise it is worthless
  - small TLBs are worthless anyway
  - large TLBs are expensive
    - high associativity is likely
- ==> The price of CPUs is going up!
  - OK as long as performance goes up faster

Selecting a Page Size
- Reasons for a larger page size:
  - Page table size is inversely proportional to the page size; memory is therefore saved
  - A fast cache hit time is easy when the cache size < page size (VA caches); a bigger page makes this feasible as cache size grows
  - Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
  - The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses
- Reasons for a smaller page size:
  - Want to avoid internal fragmentation: don’t waste storage; data must be contiguous within a page
  - Quicker process start-up for small processes: don’t need to bring in more memory than needed

Memory Protection
- With multiprogramming, a computer is shared by several programs or processes running concurrently
  - Need to provide protection
  - Need to allow sharing
- Mechanisms for providing protection (a base-and-bound sketch follows below):
  - Provide base and bound registers: Base <= Address <= Bound
  - Provide both user and supervisor (operating system) modes
  - Provide CPU state that the user can read, but cannot write
    - base and bound registers, user/supervisor bit, exception bits
  - Provide a method to go from user to supervisor mode and vice versa
    - system call: user to supervisor
    - system return: supervisor to user
  - Provide permissions for each page or segment in memory
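
A minimal sketch of the base-and-bound check from the list above. The register values are arbitrary examples; in real hardware this comparison happens on every user-mode access and a violation traps to the OS.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Base and bound registers, loadable only in supervisor mode. */
static uint32_t base  = 0x4000;
static uint32_t bound = 0x7FFF;

/* Every user-mode access must satisfy Base <= Address <= Bound. */
static bool access_ok(uint32_t addr) {
    return addr >= base && addr <= bound;
}

int main(void) {
    printf("0x5000: %s\n", access_ok(0x5000) ? "ok" : "trap to OS");
    printf("0x9000: %s\n", access_ok(0x9000) ? "ok" : "trap to OS");
    return 0;
}
```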

Pitfall: Address Space Too Small
- One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address
  - the address size limits the size of virtual memory
  - it is difficult to change, since many components depend on it (e.g., PC, registers, effective-address calculations)
- As program size increases, larger and larger address sizes are needed:
  - 8 bits: Intel 8080 (1975)
  - 16 bits: Intel 8086 (1978)
  - 24 bits: Intel 80286 (1982)
  - 32 bits: Intel 80386 (1985)
  - 64 bits: Intel Merced (1998)

Virtual Memory Summary
- Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk)
- The large miss penalty of virtual memory leads to different strategies from cache:
  - fully associative placement, TLB + page table, LRU, write-back
- Designs:
  - paged: fixed-size blocks
  - segmented: variable-size blocks
  - hybrid: segmented paging or multiple page sizes
- Avoid a small address size

Summary 2: Typical Choices

Option                  TLB                    L1 Cache           L2 Cache           VM (page)
Block size              4-8 bytes (1 PTE)      4-32 bytes         32-256 bytes       4K-16K bytes
Hit time                1 cycle                1-2 cycles         6-15 cycles        10-100 cycles
Miss penalty            10-30 cycles           8-66 cycles        30-200 cycles      700K-6M cycles
Local miss rate         0.1-2%                 0.5-20%            13-15%             0.00001-0.001%
Size                    32 B-8 KB              1-128 KB           256 KB-16 MB
Backing store           L1 cache               L2 cache           DRAM               Disks
Q1: block placement     Fully or set assoc.    DM                 DM or SA           Fully associative
Q2: block ID            Tag/block              Tag/block          Tag/block          Table
Q3: block replacement   Random (not last)      N.A. (for DM)      Random (if SA)     LRU/LFU
Q4: writes              Flush on PTE write     Through or back    Write-back         Write-back

The End
Lecture 9