COMP 3500 Introduction to Operating Systems Paging Translation

  • Slides: 41
Download presentation
COMP 3500 Introduction to Operating Systems Paging: Translation Look-aside Buffers (TLB) Dr. Xiao Qin

COMP 3500 Introduction to Operating Systems Paging: Translation Look-aside Buffers (TLB) Dr. Xiao Qin Auburn University http: //www. eng. auburn. edu/~xqin@auburn. edu Slides are adopted and modified from materials developed by Drs. Silberschatz, Galvin, and Gagne

Review: Logical-to-Physical Address Translations 2

Review: Logical-to-Physical Address Translations 2

Review: Two registers to support paging • Where should we keep page tables? •

Review: Two registers to support paging • Where should we keep page tables? • Where does the Page-table base register (PTBR) point at? • The Page-table length register (PTLR) indicates size of the page table.

Page-table Base Register (Page Table Pointer) 4

Page-table Base Register (Page Table Pointer) 4

Ex 1: Two-Level Page Table 4 KB root page table 1. How many root

Ex 1: Two-Level Page Table 4 KB root page table 1. How many root page table entries? User Page Table 2. How many user page table entries? 3. How large is the user 5 address space? Assume (1) we have 4 -byte page table entry. (2) page size is 4 KB

Ex 2: Address Translation in the Two-Level Paging System 6

Ex 2: Address Translation in the Two-Level Paging System 6

Ex 3: Design virtual address format for a two-level paging system • Suppose you

Ex 3: Design virtual address format for a two-level paging system • Suppose you design a two-level page translation scheme where page size is 16 MB and page table entry size is 16 bytes. • What is the format of a 64 -bit virtual address?

Ex 4: Memory Accesses in the Paging Scheme • To load an instruction or

Ex 4: Memory Accesses in the Paging Scheme • To load an instruction or data from main memory, how many memory accesses are required in the paging scheme? – two memory accesses – One for the page table and one for the data / instruction • What is the problem with respect to memory access? – The two memory access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffers (TLBs)

Paging Hardware with Translation Look-aside Buffers (TLB) Q 2: Why TLBs are typically small

Paging Hardware with Translation Look-aside Buffers (TLB) Q 2: Why TLBs are typically small (64 to 1, 024 entries)? 9

Parallel Searching the TLB • How to search TLB? – Associative memory: parallel search

Parallel Searching the TLB • How to search TLB? – Associative memory: parallel search • Address translation (p, d) – If p is in associative register, get frame # out – Otherwise get frame # from page table in memory

Address Space ID in TLBs • Q 1: Why some TLBs store address-space identifiers

Address Space ID in TLBs • Q 1: Why some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies each process? To provide address-space protection for that process

Q 2: What happens on a TLB miss? Replacement policies must Value (? )

Q 2: What happens on a TLB miss? Replacement policies must Value (? ) is loaded into the TLB for faster access next time be considered 12

Effective Memory Access Time • Associative Lookup = Ttlb time unit • TLB hit

Effective Memory Access Time • Associative Lookup = Ttlb time unit • TLB hit ratio = h – percentage of times that a page number is found in the associative registers; T_hit = T_tlb + T_mem T_miss = T_tlb + T_mem + T_tlb_update + T_mem access page table access data T_miss = T_tlb + T_mem Effective Access Time = h*T_hit + (1 -h)*T_miss

Ex 5: Effective Access Time • Consider a single-level paging scheme. The TLB has

Ex 5: Effective Access Time • Consider a single-level paging scheme. The TLB has 32 entries. The TLB access time is 10 ns; memory access time is 200 ns. 1. How long does it take to access data in memory if there is a TLB hit? 2. How long does it take to access data in memory if there is a TLB miss? 3. What is the effective memory-access time if we have a TLB hit ratio of 80%? 4. What is the minimal hit ratio that guarantees the effective access time of at most 220 ns?

Summary • Page-table Base Register • Two-Level Page Table • Address Translation in a

Summary • Page-table Base Register • Two-Level Page Table • Address Translation in a Two-Level Paying System • Translation Look-aside Buffers (TLBs) • Effective Memory Access Time

Memory Protection • Memory protection implemented by associating protection bit with each frame to

Memory Protection • Memory protection implemented by associating protection bit with each frame to indicate if read-only or readwrite access is allowed – Can also add more bits to indicate page execute-only, and so on • Valid-invalid bit attached to each entry in the page table: – “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page – “invalid” indicates that the page is not in

Valid (v) or Invalid (i) Bit In A Page Table 17

Valid (v) or Invalid (i) Bit In A Page Table 17

Shared Pages • Shared code – One copy of read-only (reentrant) code shared among

Shared Pages • Shared code – One copy of read-only (reentrant) code shared among processes (i. e. , text editors, compilers, window systems) – Similar to multiple threads sharing the same process space – Also useful for interprocess communication if sharing of read-write pages is allowed • Private code and data – Each process keeps a separate copy of the code and data – The pages for the private code and data

Shared Pages Example 19

Shared Pages Example 19

Structure of the Page Table • Memory structures for paging can get huge using

Structure of the Page Table • Memory structures for paging can get huge using straight-forward methods – Consider a 32 -bit logical address space as on modern computers – Page size of 4 KB (212) – Page table would have 1 million entries (232 / 212) – If each entry is 4 bytes -> 4 MB of physical address space / memory for page table alone • That amount of memory used to cost a lot • Don’t want to allocate that contiguously in main memory

Hierarchical Page Tables • Break up the logical address space into multiple page tables

Hierarchical Page Tables • Break up the logical address space into multiple page tables • A simple technique is a two-level page table • We then page the page table

Two-Level Page-Table Scheme 22

Two-Level Page-Table Scheme 22

Two-Level Paging Example • A logical address (on 32 -bit machine with 1 K

Two-Level Paging Example • A logical address (on 32 -bit machine with 1 K page size) is divided into: – a page number consisting of 22 bits – a page offset consisting of 10 bits • Since the page table is paged, the page number is further divided into: – a 12 -bit page number – a 10 -bit page offset • Thus, a logical address is as follows: • where p 1 is an index into the outer page

Address-Translation Scheme

Address-Translation Scheme

64 -bit Logical Address Space • Even two-level paging scheme not sufficient • If

64 -bit Logical Address Space • Even two-level paging scheme not sufficient • If page size is 4 KB (212) – Then page table has 252 entries – If two level scheme, inner page tables could be 210 4 -byte entries – Address would look like – Outer page table has 242 entries or 244 bytes – One solution is to add a 2 nd outer page table – But in the following example the 2 nd outer page table is still 234 bytes in size

Three-level Paging Scheme

Three-level Paging Scheme

Hashed Page Tables • Common in address spaces > 32 bits • The virtual

Hashed Page Tables • Common in address spaces > 32 bits • The virtual page number is hashed into a page table – This page table contains a chain of elements hashing to the same location • Each element contains (1) the virtual page number (2) the value of the mapped page frame (3) a pointer to the next element • Virtual page numbers are compared in this chain searching for a match – If a match is found, the corresponding physical frame is extracted

Hashed Page Table 28

Hashed Page Table 28

Inverted Page Table • Rather than each process having a page table and keeping

Inverted Page Table • Rather than each process having a page table and keeping track of all possible logical pages, track all physical pages • One entry for each real page of memory • Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page • Decreases memory needed to store each page table, but increases time

Inverted Page Table Architecture 30

Inverted Page Table Architecture 30

Oracle SPARC Solaris • Consider modern, 64 -bit operating system example with tightly integrated

Oracle SPARC Solaris • Consider modern, 64 -bit operating system example with tightly integrated HW – Goals are efficiency, low overhead • Based on hashing, but more complex • Two hash tables – One kernel and one for all user processes – Each maps memory addresses from virtual to physical memory – Each entry represents a contiguous area of mapped virtual memory, • More efficient than having a separate hash-table entry for each page – Each entry has base address and span

Oracle SPARC Solaris (Cont. ) • TLB holds translation table entries (TTEs) for fast

Oracle SPARC Solaris (Cont. ) • TLB holds translation table entries (TTEs) for fast hardware lookups – A cache of TTEs reside in a translation storage buffer (TSB) • Includes an entry per recently accessed page • Virtual address reference causes TLB search – If miss, hardware walks the in-memory TSB looking for the TTE corresponding to the address • If match found, the CPU copies the TSB entry into the TLB and translation completes • If no match found, kernel interrupted to search

Example: The Intel 32 and 64 -bit Architectures • Dominant industry chips • Pentium

Example: The Intel 32 and 64 -bit Architectures • Dominant industry chips • Pentium CPUs are 32 -bit and called IA-32 architecture • Current Intel CPUs are 64 -bit and called IA 64 architecture • Many variations in the chips, cover the main ideas here

Example: The Intel IA-32 Architecture • Supports both segmentation and segmentation with paging –

Example: The Intel IA-32 Architecture • Supports both segmentation and segmentation with paging – Each segment can be 4 GB – Up to 16 K segments per process – Divided into two partitions • First partition of up to 8 K segments are private to process (kept in local descriptor table (LDT)) • Second partition of up to 8 K segments shared among all processes (kept in global descriptor table (GDT))

Example: The Intel IA-32 Architecture (Cont. ) • CPU generates logical address – Selector

Example: The Intel IA-32 Architecture (Cont. ) • CPU generates logical address – Selector given to segmentation unit • Which produces linear addresses – Linear address given to paging unit • Which generates physical address in main memory • Paging units form equivalent of MMU • Pages sizes can be 4 KB or 4 MB

Logical to Physical Address Translation in IA-32 36

Logical to Physical Address Translation in IA-32 36

Intel IA-32 Segmentation 37

Intel IA-32 Segmentation 37

Intel IA-32 Paging Architecture 38

Intel IA-32 Paging Architecture 38

n 39 Intel IA-32 Page Address Extensions 32 -bit address limits led Intel to

n 39 Intel IA-32 Page Address Extensions 32 -bit address limits led Intel to create page address extension (PAE), allowing 32 -bit apps access to more than 4 GB of memory space l Paging went to a 3 -level scheme l Top two bits refer to a page directory pointer table l Page-directory and page-table entries moved to 64 -bits in size l Net effect is increasing address space to 36 bits – 64 GB of physical memory

Intel x 86 -64 n Current generation Intel x 86 architecture n 64 bits

Intel x 86 -64 n Current generation Intel x 86 architecture n 64 bits is ginormous (> 16 exabytes) n In practice only implement 48 bit addressing n 40 l Page sizes of 4 KB, 2 MB, 1 GB l Four levels of paging hierarchy Can also use PAE so virtual addresses are 48 bits and physical addresses are 52 bits

Example: ARM Architecture 41 n Dominant mobile platform chip (Apple i. OS and Google

Example: ARM Architecture 41 n Dominant mobile platform chip (Apple i. OS and Google Android devices for example) n Modern, energy efficient, 32 -bit CPU n 4 KB and 16 KB pages n 1 MB and 16 MB pages (termed sections) n One-level paging for sections, twolevel for smaller pages n Two levels of TLBs l Outer level has two micro TLBs (one data, one instruction) l Inner is single main TLB l First inner is checked, on miss outers are checked, and on miss page table walk performed by CPU