Embedded Real-Time Systems Lecture: Virtual Memory
Dimitris Metafas, Ph.D.

Virtual Memory Introduction: How Does a Program Start Running?
• The OS (loader) copies a program from permanent storage into RAM
  – On PCs and workstations, the “operating system” copies the program (bits) from disk
• The CPU’s program counter is then set to the starting address of the program, and the program begins execution
[Figure: RAM spanning addresses 0x0000 to 0x3fff, holding the program bits]

If the program is too big …
• Some machines won't let you run the program
  – Original DOS
  – Why this limitation?
[Figure: a program larger than RAM (addresses 0x0000 to 0x3fff)]

Solution: Virtual Memory
• What is virtual memory?
  – Technique that allows execution of a program that may not completely reside in memory (RAM)
  – Allows the computer to “fake” a program into believing that its memory space is larger than physical RAM
• Key concept: use DRAM as a cache for the disk
  – Address space of a process can exceed main memory size
  – Sum of the address spaces of multiple processes can exceed main memory size
• Why is VM important?
  – Cheap: no longer have to buy lots of RAM
  – Removes the burden of memory resource management from the programmer

But Wait, There Is More…
• The virtual memory concept also
  – Simplifies memory management by providing each process with the same uniform address space
  – Protects one process from corrupting the data of another process or corrupting its own read-only (text) sections
• In embedded systems, virtual memory is typically used to provide memory protection and a uniform address space for processes
[Figure: process address space, from high to low addresses]
  Kernel virtual memory (starts at 0xc0000000)
  User stack (created at runtime)
  Memory-mapped region for shared libraries (starts at 0x40000000)
  Run-time heap (created at runtime by malloc)
  Read/write segment (.data, .bss)
  Read-only segment (.init, .text, .rodata) (starts at 0x08048000)
  Unused (from address 0)
Source: Bryant & O’Hallaron

How Does VM Work?
• On program startup
  – OS copies the program into RAM
  – If there is not enough RAM, the OS stops copying the program and starts running it with only a portion of the program loaded in RAM
  – When the program touches a part of the program not in physical memory (RAM), the OS catches the memory abort (called a page fault) and copies that part of the program from disk into RAM
  – In order to copy some of the program from disk to RAM, the OS must evict parts of the program already in RAM
    • OS copies the evicted parts of the program back to disk
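The startup, page fault, and eviction sequence above can be sketched as a toy simulation. This is only a sketch under stated assumptions: the `DemandPager` class is hypothetical, and the first-in-first-out eviction policy is an assumption, since the slide does not say which resident part the OS chooses to evict.

```python
from collections import OrderedDict


class DemandPager:
    """Toy model of demand paging (hypothetical helper, FIFO eviction assumed)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # virtual page -> physical frame, in load order
        self.page_faults = 0

    def touch(self, vpage):
        """Access a virtual page; load it from 'disk' if it is not resident."""
        if vpage in self.resident:
            return self.resident[vpage]
        # Page fault: the page must be brought into RAM.
        self.page_faults += 1
        if len(self.resident) < self.num_frames:
            frame = len(self.resident)  # a free frame is still available
        else:
            # Evict the oldest resident page and reuse its frame.
            _, frame = self.resident.popitem(last=False)
        self.resident[vpage] = frame
        return frame


pager = DemandPager(num_frames=4)
for vpage in [0, 1, 2, 3, 4, 0]:
    pager.touch(vpage)
print(pager.page_faults, dict(pager.resident))
```

With four frames, the first four pages load without eviction; touching page 4 evicts page 0, and touching page 0 again evicts page 1.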

Physical versus Virtual Addressing
• Two memory “spaces”
  – Virtual memory space: what the program “sees”
  – Physical memory space: what the program runs in (size of RAM)
• Virtual memory requires
  – Dedicated hardware on the CPU chip called the Memory Management Unit (MMU)
  – Cooperation between CPU hardware & operating system
[Figure: with physical addressing, the CPU sends physical address (PA) 4 directly to main memory, which returns the data word. With virtual addressing, the CPU sends virtual address (VA) 4100 to the MMU on the CPU chip, which translates it to PA 4 before main memory returns the data word.]
Source: Bryant & O’Hallaron

Example: Virtual and Physical Address Spaces
Virtual Address Space
  0x00  add  r1, r2, r3
  0x04  sub  r2, r3, r4
  0x08  lw   r2, 0x04
  0x0C  mult r3, r4, r5
  0x10  bne  0x00
  0x14  add  r10, r1, r2
  0x18  sub  r3, r4, r1
  0x1C  sw   r5, 0x0c
Physical Address Space
  0x00  add  r1, r2, r3
  0x04  sub  r2, r3, r4
  0x08  lw   r2, 0x04
  0x0C  mult r3, r4, r5

Example (cont'd): Need VA to PA Mappings
  V  VA    PA
  1  0x00  0x00
  1  0x04  0x04
  1  0x08  0x08
  1  0x0c  0x0C
  0  0x10  Disk
  0  0x14  Disk
  0  0x18  Disk
  0  0x1c  Disk
Physical memory
  0x00  add  r1, r2, r3
  0x04  sub  r2, r3, r4
  0x08  lw   r2, 0x04
  0x0C  mult r3, r4, r5
On disk: the instructions at VA 0x10 (bne), 0x14 (add), 0x18 (sub), and 0x1C (sw)
• The MMU uses this mapping to do the address translation

Example (cont’d): After Handling a Page Fault
  V  VA    PA
  0  0x00  Disk
  1  0x04  0x04
  1  0x08  0x08
  1  0x0c  0x0C
  1  0x10  0x00
  0  0x14  Disk
  0  0x18  Disk
  0  0x1c  Disk
Physical memory
  0x00  bne  0x00
  0x04  sub  r2, r3, r4
  0x08  lw   r2, 0x04
  0x0C  mult r3, r4, r5
On disk: the instructions at VA 0x00 (add), 0x14 (add), 0x18 (sub), and 0x1C (sw)

Example (cont'd): After a Second Page Fault
  V  VA    PA
  1  0x00  0x04
  0  0x04  Disk
  1  0x08  0x08
  1  0x0c  0x0C
  1  0x10  0x00
  0  0x14  Disk
  0  0x18  Disk
  0  0x1c  Disk
Physical memory
  0x00  bne  0x00
  0x04  add  r1, r2, r3
  0x08  lw   r2, 0x04
  0x0C  mult r3, r4, r5
On disk: the instructions at VA 0x04 (sub), 0x14 (add), 0x18 (sub), and 0x1C (sw)

Basic VM Algorithm
• Program asks for a virtual address
• MMU translates the virtual address (VA) to a physical address (PA)
• Computer reads the contents at PA from RAM and returns them to the program
[Figure: the processor (running program) issues a virtual address; the VA->PA translation selects a location in RAM; RAM returns instructions (or data)]

Page Tables
• The table that holds the VA -> PA translations is called the page table
• In our current scheme, each word is translated from a virtual address to a physical address
  – Where is the page table located?
  – How big is the page table (for our scheme, assuming 32-bit addresses)?
[Figure: the processor (running program) issues a virtual address; the VA->PA translation selects a location in RAM; RAM returns instructions (or data)]

Real Page Tables
• Instead of the fine-grained VM where any virtual word can map to any RAM word location, partition memory into chunks called pages
  – Typical page sizes are 1, 4, 8, or 64 KBytes, or 1 MByte
  – With 32-bit addresses and assuming a 4 KByte page size, the 20 MSBs determine the page number
• Map each virtual page to a physical page
  – Within a page, the virtual offset == physical offset
• This reduces the number of VA -> PA translation entries
  – Only one translation per page
  – For a 4 KByte page, that's one VA -> PA translation for every 1,024 words
[Figure: a 32-bit virtual address is split into a Virtual Page Number (bits 31..12) and a Page Offset (bits 11..0); translation replaces the Virtual Page Number with a Physical Page # (bits 24..12 in the figure), and the Page Offset (bits 11..0) is unchanged]
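The split into page number and page offset can be sketched with shifts and masks. This is a minimal sketch assuming 32-bit addresses and 4 KByte pages as on the slide; the function names `split_va` and `join_pa` and the sample addresses are hypothetical.

```python
PAGE_SIZE = 4096                          # 4 KByte pages, as assumed on the slide
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # log2(4096) = 12 untranslated bits


def split_va(va):
    """Return (virtual page number, page offset) for a virtual address."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)


def join_pa(ppn, offset):
    """Build a physical address from a physical page number and the same offset."""
    return (ppn << OFFSET_BITS) | offset


vpn, offset = split_va(0x12345678)
print(hex(vpn), hex(offset))        # the 20 MSBs and the 12 LSBs
print(hex(join_pa(0xABC, offset)))  # offset carried over unchanged
```

Only the page number goes through the translation table; the offset is copied as-is, which is why one table entry covers a whole page.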

Virtual Pages & Physical Page Frames
[Figure: a virtual address space from 0x0000 to 0x7fff and a physical address space from 0x0000 to 0x3fff, each divided into 4 KB chunks (0x000-0xfff = 4 KB); a virtual page maps onto a physical page frame]
• Every address within a virtual page maps to the same location within a physical page frame
• In other words, the bottom log2(page size in bytes) bits are not translated

Example
• Page size: 4 bytes
• Virtual memory: 32 bytes (virtual pages 0 to 7, addresses 0 to 31)
• Physical memory: 16 bytes (addresses 0 to 15)
• Page table
  Virtual page   Physical page
  0              disk
  1              2
  2              3
  3              disk
  4              disk
  5              0
  6              1
  7              disk
• Virtual address 20 maps onto ____?
• Virtual address 25 maps onto ____?
• Virtual address 6 maps onto ____?
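One way to work through the three questions is to run the translation by hand or in code. The sketch below uses the page table from the example; the `translate` function and the `PageFault` exception are hypothetical names introduced for illustration.

```python
PAGE_SIZE = 4  # bytes, as in the example

# Page table from the example: index = virtual page, value = physical page.
# None stands for "disk" (the page is not resident in physical memory).
PAGE_TABLE = [None, 2, 3, None, None, 0, 1, None]


class PageFault(Exception):
    """Raised when the virtual page is not resident (hypothetical helper)."""


def translate(va):
    """Translate a virtual address using the example's single-level table."""
    vpn, offset = divmod(va, PAGE_SIZE)
    ppn = PAGE_TABLE[vpn]
    if ppn is None:
        raise PageFault(f"virtual page {vpn} is on disk")
    return ppn * PAGE_SIZE + offset


for va in (20, 25, 6):
    print(f"VA {va} -> PA {translate(va)}")
# VA 20 -> PA 0
# VA 25 -> PA 5
# VA 6 -> PA 10
```

For instance, VA 25 is offset 1 in virtual page 6, which maps to physical page 1, giving PA 1 * 4 + 1 = 5. An address in a page marked "disk", such as VA 0, raises `PageFault` instead.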

Page Table
• Who fills/sets up and manages the page table?
• How does the MMU know where the page table is located in memory?
  – A register, typically called the page table base register (PTBR), stores the starting address of the table
• The virtual page number can be used as an index into this table to determine the corresponding physical page number
• The entries in this table are called page table entries (PTEs) and store the mapping to the physical page number
• Each PTE is typically 32 bits (assuming 32-bit virtual addresses)
  – Assuming a 4 KByte page size, 20 bits of the PTE are needed to store the physical page number
  – What about the remaining 12 bits?
  – These remaining bits can store other information, e.g. access permissions (whether this physical page stores code/data etc.)

Page Table Entries
• A real page table entry (PTE) contains a number of fields
  – Physical page number
  – Valid/invalid bit
  – Access control bits (e.g., writeable bit)
  – Status bits (e.g., accessed and dirty bits)
  – MMU control bits (e.g., cacheable bits)
[Figure: sample page table entry with the Physical Page Number (PPN) in bits 31..12, followed in the low-order bits by the V, Access Control, Status, and MMU fields]
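Packing these fields into a 32-bit word can be sketched as follows. The physical page number in bits 31..12 follows the slide; the positions of the individual flag bits are assumptions chosen for illustration, not the layout of any particular MMU.

```python
PPN_SHIFT = 12  # physical page number occupies bits 31..12

# Hypothetical flag positions within the low 12 bits (illustration only).
FLAG_VALID = 1 << 0
FLAG_WRITABLE = 1 << 1   # access control
FLAG_ACCESSED = 1 << 2   # status
FLAG_DIRTY = 1 << 3      # status
FLAG_CACHEABLE = 1 << 4  # MMU control


def make_pte(ppn, flags):
    """Pack a 20-bit physical page number and flag bits into a 32-bit PTE."""
    return ((ppn << PPN_SHIFT) | flags) & 0xFFFFFFFF


def pte_ppn(pte):
    """Extract the physical page number from a PTE."""
    return pte >> PPN_SHIFT


def pte_has(pte, flag):
    """Check whether a flag bit is set in a PTE."""
    return bool(pte & flag)


pte = make_pte(0xABCDE, FLAG_VALID | FLAG_WRITABLE)
print(hex(pte), hex(pte_ppn(pte)), pte_has(pte, FLAG_DIRTY))
```

Because the page offset never needs translating, the low 12 bits of each entry are free to hold this bookkeeping.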

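Example: Single-Level Page Table
[Figure: bits 31..12 of the virtual address (value = x) index the page table; entry x holds y, which selects the data page frame; bits 11..0 select a byte within the frame]
• Page table: 2^20 entries of 32 bits each
  – Size of page table = 2^20 * 32 bits = 4 Mbytes
• Data page frame: 2^12 entries of 8 bits each
  – Size of page = 2^12 * 8 bits = 4 Kbytes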

Single-Level Page Table
• Assumptions
  – 32-bit virtual addresses
  – 4 Kbyte page size = 2^12 bytes
  – 32-bit address space
• How many virtual page numbers?
  – 2^32 / 2^12 = 2^20 = 1,048,576 virtual page numbers = number of entries in the page table
• If each page table entry occupies 4 bytes, how much memory is needed to store the page table?
  – 2^20 entries * 4 bytes = 2^22 bytes = 4 Mbytes
• How do we reduce the amount of main memory needed to store the page table?
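The arithmetic above can be checked with a few lines of code. This is a sketch; the function name `single_level_table_bytes` is hypothetical. The last call applies the same formula to the earlier word-granularity scheme, assuming 4-byte words and 4-byte entries, which the slides do not state explicitly.

```python
def single_level_table_bytes(va_bits, page_bytes, pte_bytes):
    """Bytes needed for a single-level page table covering the whole address space."""
    num_entries = 2 ** va_bits // page_bytes
    return num_entries * pte_bytes


entries = 2 ** 32 // 4096
size_4k_pages = single_level_table_bytes(va_bits=32, page_bytes=4096, pte_bytes=4)
# Assumption: one translation per 4-byte word, 4 bytes per entry.
size_per_word = single_level_table_bytes(va_bits=32, page_bytes=4, pte_bytes=4)

print(entries)                   # number of virtual pages
print(size_4k_pages // 2 ** 20)  # table size in Mbytes
print(size_per_word // 2 ** 30)  # table size in Gbytes for per-word translation
```

Under those assumptions, translating every word separately would need a table as large as the whole 4 GByte address space, which is why pages are used.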

Two-Level Page Tables
[Figure: the virtual address is split into bits 31..22, bits 21..12, and bits 11..0, which index the page directory, the page table, and the data page frame respectively]

Two-Level Page Table Scheme
[Figure: two-level page table scheme]
Source: Silberschatz, Galvin & Gagne

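Example: Two-Level Page Table
[Figure: bits 31..22 of the virtual address (value = x) index the page directory; entry x selects a page table; bits 21..12 (value = y) index that page table; entry y selects the data page frame; bits 11..0 (value = z) select a byte within the frame]
• Page directory: 2^10 entries of 32 bits each
  – Size of page directory = 2^10 * 32 bits = 4 Kbytes
• Page table: 2^10 entries of 32 bits each
  – Size of page table = 2^10 * 32 bits = 4 Kbytes
• Data page frame: 2^12 entries of 8 bits each
  – Size of page = 2^12 * 8 bits = 4 Kbytes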

Two-Level Page Table
• Assumptions
  – 2^10 entries in the page directory (= max number of page tables)
  – 2^10 entries in each page table
  – 32 bits allocated for each page directory entry
  – 32 bits allocated for each page table entry
• How much memory is needed?
  – Page table size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
  – Page directory size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
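A two-level walk under these assumptions can be sketched as below. The dictionaries stand in for the page directory and page tables, and the names `split_two_level`, `walk`, and `tables_bytes`, as well as the sample mapping, are hypothetical. The sketch also shows why the scheme saves memory: only the page tables that are actually in use need to exist.

```python
ENTRY_BYTES = 4               # 32 bits per directory or table entry
ENTRIES_PER_TABLE = 1 << 10   # 2^10 entries in the directory and in each table


def split_two_level(va):
    """Split a 32-bit VA into (directory index, table index, page offset)."""
    dir_index = (va >> 22) & 0x3FF    # bits 31..22
    table_index = (va >> 12) & 0x3FF  # bits 21..12
    offset = va & 0xFFF               # bits 11..0
    return dir_index, table_index, offset


def walk(page_directory, va):
    """Return the physical address for va, or None if it is not mapped."""
    dir_index, table_index, offset = split_two_level(va)
    page_table = page_directory.get(dir_index)
    if page_table is None:
        return None
    ppn = page_table.get(table_index)
    if ppn is None:
        return None
    return (ppn << 12) | offset


def tables_bytes(num_page_tables):
    """Memory for one page directory plus the page tables actually allocated."""
    return (1 + num_page_tables) * ENTRIES_PER_TABLE * ENTRY_BYTES


# Hypothetical mapping: directory entry 72 -> a table whose entry 837 -> frame 0xABC.
page_directory = {72: {837: 0xABC}}
pa = walk(page_directory, 0x12345678)
print(split_two_level(0x12345678), hex(pa))
print(tables_bytes(1), tables_bytes(1024))
```

With a single page table in use, the structures take 8 Kbytes instead of 4 Mbytes. If all 1,024 page tables were allocated, the total would be slightly larger than the single-level table, so the saving comes from sparse address spaces.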

So Where Are We?
• Virtual memory simplifies memory management by providing each process with the same uniform address space
[Figure: Task 1 and Task 2 each see the same layout (Kernel, Stack, Heap, Data, Code), mapped onto different parts of physical memory]
• Protects one process from corrupting the data of another process or corrupting its own read-only (text) sections
  – The OS can ensure that the VAs generated by a process map only to the PAs of that process (unless it explicitly shares code or data)
  – Page table entries have access bits that can be used to disallow writes to text or read-only data sections, kernel code/data, or shared libraries

Address Translation
• The page table is typically stored in memory
• Steps involved in accessing a page in memory (assuming a single-level page table)
  – MMU obtains the address from the CPU and partitions it into the virtual page number (VPN) and the virtual page offset (VPO)
  – MMU generates the address of the page table entry (PTE) and requests the contents from memory
  – Memory returns the PTE to the MMU
  – MMU constructs the physical address and sends it to memory
  – Memory returns the requested data word to the CPU
Source: Silberschatz, Galvin & Gagne

Speeding Up Address Translation with TLB
• Where is the page table stored?
  – In memory, so accessing any memory location requires two memory accesses: one for the PTE and one for the data
• Use a “Translation Lookaside Buffer” (TLB) to speed up address translation
  – TLB is a small, fast lookup hardware cache in the MMU
  – TLB contains a few page table entries
  – When a virtual address is generated, it is first presented to the TLB
  – If the page number is found in the TLB (known as a TLB hit), its frame number is immediately available and is used to access main memory
  – If the page number is not found in the TLB (known as a TLB miss), a memory reference to the page table is made
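The hit and miss paths can be sketched with a small software model. This is an illustration only: a real TLB is hardware, the `TLB` class here is hypothetical, and the first-in-first-out replacement of entries is an assumption that the slide does not specify.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB caching a few VPN -> PPN translations (FIFO replacement assumed)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # stands in for the in-memory page table
        self.entries = OrderedDict()  # VPN -> PPN, in insertion order
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.hits += 1            # TLB hit: no page table access needed
            return self.entries[vpn]
        self.misses += 1              # TLB miss: extra memory reference
        ppn = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the oldest cached entry
        self.entries[vpn] = ppn
        return ppn


page_table = {0: 7, 1: 3, 2: 9}  # hypothetical VPN -> PPN mappings
tlb = TLB(capacity=2, page_table=page_table)
for vpn in [0, 0, 1, 0, 2, 0]:
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)
```

Each hit saves the memory access that would otherwise fetch the PTE, so programs that reuse a small set of pages see most of the benefit.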

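Translation Lookaside Buffer
[Figure: (a) TLB hit: 1. processor sends the VA to the translation hardware; 2. the VPN is sent to the TLB; 3. the TLB returns the PTE; 4. the PA is sent to cache/memory; 5. the data is returned to the processor. (b) TLB miss: 1. processor sends the VA; 2. the VPN is sent to the TLB; 3. the PTE address (PTEA) is sent to cache/memory; 4. the PTE is returned to the TLB; 5. the PA is sent to cache/memory; 6. the data is returned to the processor]
Source: Bryant & O’Hallaron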

Interaction With Caches
• Two possibilities for instruction and data cache (not the TLB) locations
• Logical/virtual cache: operates on virtual addresses
  – Advantages/disadvantages?
  – Example: instruction/data caches on the XScale processor
• Physical cache: operates on physical addresses
  – Advantages/disadvantages?
  – Example: instruction/data caches on the Intel Pentium processor
[Figure: logical cache: the processor sends the VA to the cache, and the MMU translates VA to PA between the cache and memory. Physical cache: the processor sends the VA to the MMU, and the resulting PA goes to the cache and then to memory]

Context Switching With Virtual Memory
• Each task in the system has its own page table
  – PTBR needs to be part of the context
[Figure: Tasks 1, 2, and 3 each have their own virtual memory and page table, all mapping into one physical memory; while Task 1 is running its page table is in use, and after the switch Task 2's page table is in use]

More On Context Switching
• TLB entries need to be invalidated
  – Since the new process uses the same virtual addresses (mapped to different physical addresses), the old TLB entries are no longer valid
• Need to clean the virtual cache (if any)
  – Cleaning a virtual cache may involve more than invalidating the cache items
  – On a write, cache memory can follow one of two policies
    • Write-through caches: on a cache hit for a write operation, both the cache and main memory are updated
    • Write-back caches: on a cache hit for a write operation, only the cache is updated
      – Memory is updated only when the cache item is evicted
  – For a write-back logical cache, cleaning the cache involves writing back all the modified cache locations
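Why cleaning a write-back cache is more than invalidation can be sketched with a toy model. The `WriteBackCache` class is hypothetical, it ignores cache size and eviction, and it allocates a line on a write miss, which is an assumption beyond the slide (the slide only describes write hits).

```python
class WriteBackCache:
    """Toy write-back cache keyed by address (no size limit, illustration only)."""

    def __init__(self, memory):
        self.memory = memory  # dict standing in for main memory
        self.lines = {}       # address -> [value, dirty flag]

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]

    def write(self, addr, value):
        # Write-back policy: update only the cache and mark the line dirty.
        # Assumption: a write miss allocates a line in the cache.
        self.lines[addr] = [value, True]

    def clean_and_invalidate(self):
        """Write all modified lines back to memory, then empty the cache."""
        written = 0
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[addr] = value
                written += 1
        self.lines.clear()
        return written


memory = {0x10: 1, 0x14: 2}
cache = WriteBackCache(memory)
cache.read(0x10)
cache.write(0x14, 99)
stale = memory[0x14]                 # memory still holds the old value
written = cache.clean_and_invalidate()
print(stale, written, memory[0x14])
```

Simply discarding the cache contents at a context switch would lose the value 99; a write-through cache would not have this problem because memory is always up to date.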

ARM’s Memory Protection Unit
• ARM processors contain a coprocessor called the system control coprocessor
  – Controls operation of the on-chip cache and the memory management unit
• Some ARM processors only have a memory protection unit (MPU)
  – Allows physical memory to be divided into 8 or fewer sections
  – Can specify the starting address and size of each section
    • Maximum size: 4 GBytes
    • Minimum size: 4 KBytes
  – Access permission can be specified for each section
    • Sections can have privileged-mode-only access, privileged-mode full access with user-mode read-only access, or full access
  – Provides no address translation
• Many ARM processors (including the XScale processor) have an MMU
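The idea of protection without translation can be sketched as a permission check over a handful of sections. This is a simplified model, not ARM's actual register interface: the `Section` type, the three permission names, the first-match rule for overlapping sections, and the sample memory map are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical permission levels mirroring the three cases on the slide.
PRIV_ONLY = "privileged only"
USER_READ_ONLY = "privileged full, user read-only"
FULL_ACCESS = "full access"


@dataclass(frozen=True)
class Section:
    base: int        # starting physical address
    size: int        # size in bytes
    permission: str  # one of the three levels above


def mpu_allows(sections, addr, write, privileged):
    """Check an access against the sections; addresses are never translated."""
    for section in sections:  # assumption: the first matching section decides
        if section.base <= addr < section.base + section.size:
            if privileged or section.permission == FULL_ACCESS:
                return True
            if section.permission == USER_READ_ONLY:
                return not write
            return False  # PRIV_ONLY and a user-mode access
    return False  # address not covered by any section


sections = [
    Section(base=0x00000000, size=0x10000, permission=PRIV_ONLY),
    Section(base=0x00010000, size=0x10000, permission=USER_READ_ONLY),
    Section(base=0x00020000, size=0x1000, permission=FULL_ACCESS),
]
print(mpu_allows(sections, 0x00010004, write=True, privileged=False))
print(mpu_allows(sections, 0x00010004, write=False, privileged=False))
```

The address passed in is the address used to reach memory, which is the key difference from an MMU.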

ARM MMU
• Presents a 4 GB virtual address space
• Memory granularity: 4 options supported
  – 1 MB sections
  – Large pages (64 KBytes): access control within a large page on 16 KByte subpages
  – Small pages (4 KBytes): access control within a small page on 1 KByte subpages
  – Tiny pages (1 KByte)
[Figure: large page table entry, from bit 31 down to bit 0: Base Physical Address (bits 31..16), 0000 (bits 15..12), AP3, AP2, AP1, AP0, C (bit 3), B (bit 2), 0 1 (bits 1..0)]

ARM MMU (contd.)
• Puts the processor in Abort Mode when a virtual address is not mapped or a permission check fails
• A page table base register (called the translation table base register in ARM jargon) stores the base address of the page table
  – Useful for context switching of processes