Windows Internals
- David Solomon (daves@solsem.com), David Solomon Expert Seminars, www.solsem.com
- Mark Russinovich (mark@sysinternals.com), Winternals, www.winternals.com, www.sysinternals.com

About the Speaker: David Solomon
- 1982-1992: VMS operating systems development at Digital
- 1992-present: Researching, writing, and teaching Windows operating system internals
- Frequent speaker at technical conferences (Microsoft TechEd, IT Forum, PDCs, …)
- Microsoft Most Valuable Professional (1993, 2005)
- Books
  - Windows Internals, 4th edition (PDF version ships with Server 2003 Resource Kit)
  - Inside Windows 2000, 3rd edition
  - Inside Windows NT, 2nd edition
  - Windows NT for OpenVMS Professionals
- Live Classes: 2-5 day classes on Windows Internals, Advanced Troubleshooting
- Video Training: 12 hour interactive internals tutorial, licensed by MS for internal use

About the Speaker: Mark Russinovich
- Co-author of Inside Windows 2000, 3rd Edition and Windows Internals, 4th edition with David Solomon
- Senior Contributing Editor to Windows IT Pro Magazine
  - Co-authors Windows Power Tools column
- Author of tools on www.sysinternals.com
- Microsoft Most Valuable Professional (MVP)
- Co-founder and chief software architect of Winternals Software (www.winternals.com)
- Ph.D. in Computer Engineering

Acknowledgements
- Special thanks to:
  - Dave Cutler for initially granting David access to the source code in 1993 and reviewing the book and presentations
  - Rob Short & Jim Allchin for continuing to be our “executive sponsors”
- Also thanks to many others in the Windows team (past & present) for their support and assistance: Landy Wang, Neil Clift, Jim Allchin, Mark Lucovsky, Brian Andrews, Richard Ward, Steve Wood, Tom Miller, Gary Kimura, Darryl Havens, Lou Perazzoli

Purpose of Tutorial
- Give Windows developers a foundational understanding of the system’s kernel architecture
  - Design better for performance & scalability
  - Debug problems more effectively
  - Understand system performance issues
- We’re covering a small but important set of core topics: the “plumbing in the boiler room”

System Architecture
User mode
- System processes: Service Control Mgr., LSASS, WinLogon, Session Manager
- Services: SvcHost.exe, WinMgt.exe, SpoolSv.exe, Services.exe
- Environment subsystems: POSIX, OS/2, Windows
- Applications: Task Manager, Explorer, User Application
- Subsystem DLLs
- NTDLL.DLL
Kernel mode
- System threads
- System Service Dispatcher (kernel mode callable interfaces)
- Local Procedure Call, Configuration Mgr (registry), Processes & Threads, Virtual Memory, Security Reference Monitor, Power Mgr., Object Mgr., File System Cache, Plug and Play Mgr., I/O Mgr
- Device & File Sys. Drivers
- Windows USER, GDI; Graphics Drivers
- Kernel
- Hardware Abstraction Layer (HAL)
Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)

Tools Used To Dig In
- Many tools are available to dig into Windows OS internals without requiring source code
  - Helps to see internals behavior “in action”
  - Many of these tools are used in labs in the video and the book
- Several sources of tools
  - Support Tools (on Windows OS CD-ROM in \support\tools)
  - Resource Kit Tools
  - Sysinternals tools (www.sysinternals.com)
  - Windows Debugging Tools

Live Kernel Debugging
- Useful for investigating internal system state not available from other tools
  - Previously required 2 computers (host and target)
  - Target would be halted while host debugger in use
- XP & later support live local kernel debugging
  - Technically requires the system to be booted /DEBUG to work correctly
  - But not all commands work

LiveKd
- LiveKd makes more commands work on a live system
  - Works on NT 4, Windows 2000, Windows XP, Server 2003, and Vista
  - Was originally shipped on the Inside Windows 2000 book CD-ROM; now free on Sysinternals
- Tricks standard Microsoft kernel debuggers into thinking they are looking at a crash dump
- Does not guarantee a consistent view of system memory
  - Thus can loop or fail with an access violation
  - Just quit and restart

Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals

System Architecture
- Process Execution Environment
- Kernel Architecture
- Interrupt Handling
- Object Manager
- System Threads
- Process-based code
- Summary

Processes And Threads
What is a process?
- Represents an instance of a running program
  - You create a process to run a program
  - Starting an application creates a process
- Process defined by:
  - Address space
  - Resources (e.g., open handles)
  - Security profile (token)
- System call: primary argument to CreateProcess is image file name (or command line)
Diagram labels: per-process address space, thread, system-wide address space

Processes And Threads
What is a thread?
- An execution context within a process
- Unit of scheduling (threads run, processes don’t run)
- All threads in a process share the same per-process address space
  - Services are provided so that threads can synchronize access to shared resources (critical sections, mutexes, events, semaphores)
- All threads in the system are scheduled as peers to all others, without regard to their “parent” process
- System call: primary argument to CreateThread is a function entry point address
- Linux: no threads per se
  - Tasks can act like Windows threads by sharing handle table, PID and address space
Diagram labels: per-process address space, threads, system-wide address space

Processes And Threads
- Every process starts with one thread
  - First thread executes the program’s “main” function
  - Can create other threads in the same process
  - Can create additional processes
- Why divide an application into multiple threads?
  - Perceived user responsiveness, parallel/background execution
    - Example: Word background print; you can continue to edit during print
  - Take advantage of multiple processors
    - On an MP system with n CPUs, n threads can literally run at the same time
    - Question: Given a single threaded application, will adding a second processor make it run faster?
- Does add complexity
  - Synchronization
  - Scaling well is a different question…
    - Number of runnable threads versus number of CPUs
    - Having too many runnable threads causes excess context switching

32-bit x86 Address Space
32 bits = 4 GB
- Default: 2 GB user process space, 2 GB system space
- 3 GB user space option: 3 GB user process space, 1 GB system space
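The two layouts on this slide can be sketched as a simple address classifier. This is a minimal model that assumes the split is a single boundary at 2 GB (default) or 3 GB; the function names are illustrative and it ignores any reserved regions near the boundary.

```python
GB = 1 << 30  # bytes in one gigabyte


def user_space_limit(three_gb: bool = False) -> int:
    """Return the first address above user process space on 32-bit x86."""
    return (3 if three_gb else 2) * GB


def is_user_address(addr: int, three_gb: bool = False) -> bool:
    """True if addr is in user process space, False if it is in system space."""
    if not 0 <= addr < 4 * GB:
        raise ValueError("not a 32-bit address")
    return addr < user_space_limit(three_gb)
```

With the default split, 0x7FFFFFFF is the last user address; with the 3 GB option, the boundary moves up to 0xC0000000.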

64-bit Address Spaces
- 64 bits = 17,179,869,184 GB
- x64 today supports 48 bits virtual = 262,144 GB
- IA-64 today supports 50 bits virtual = 1,048,576 GB

                      x64              Itanium
User process space    8192 GB (8 TB)   7152 GB (7 TB)
System space          6657 GB          6144 GB
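The gigabyte figures on this slide follow directly from the number of virtual address bits. A short check, using a hypothetical helper name:

```python
GB = 2 ** 30  # bytes in one gigabyte


def address_space_gb(bits: int) -> int:
    """Size in GB of an address space with the given number of address bits."""
    return 2 ** bits // GB


for bits in (32, 48, 50, 64):
    print(f"{bits} bits = {address_space_gb(bits):,} GB")
```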

Memory Protection Model
- No user process can touch another user process address space (without first opening a handle to the process, which means passing through NT security)
  - Separate process page tables prevent this
  - “Current” page table changed on context switch from a thread in one process to a thread in another process
- No user process can touch kernel memory
  - Page protection in process page tables prevents this
  - OS pages only accessible from “kernel mode” (x86: Ring 0, Itanium: Privilege Level 0)
  - Threads change from user to kernel mode and back (via a secure interface) to execute kernel code
  - Does not affect scheduling (not a context switch)

Process Explorer (Sysinternals)
- “Super Task Manager”
- Shows full image path, command line, environment variables, parent process, thread details, security access token, open handles, loaded DLLs & mapped files

Windows Kernel Evolution
- Basic kernel architecture has remained stable while the system has evolved
  - Windows 2000: major changes in I/O subsystem (plug & play, power management, WDM), but rest similar to NT 4
  - Windows XP & Server 2003: modest upgrades as compared to the changes from NT 4 to Windows 2000
- Internal version numbers confirm this:
  - Windows 2000 was 5.0
  - Windows XP is 5.1
  - Windows Server 2003 is 5.2
  - Windows Vista is 6.0

Kernel Architecture
- Is Windows NT/2000/XP/2003 a microkernel-based OS?
  - No, not using the academic definition (OS components and drivers run in their own private address spaces, layered on a primitive microkernel)
  - All kernel components live in a common shared address space
    - Therefore no protection between OS and drivers
- But it does have some attributes of a microkernel OS
  - OS personalities running in user space as separate processes
  - Kernel-mode components don't reach into one another’s data structures
    - Use formal interfaces to pass parameters and access and/or modify data structures
  - Therefore the term “modified microkernel”
- Why not pure microkernel?
  - Performance: separate address spaces would mean context switching to call basic OS services
  - Linux has the same monolithic kernel architecture
    - So do most Unix systems, VMS, …

Example: Invoking a Win32 Kernel API
User mode (U)
1. Windows application: call WriteFile(…)
2. WriteFile in Kernel32.dll (Win32-specific): call NtWriteFile, then return to caller
3. NtWriteFile in NtDll (used by all subsystems): Int 2E or SYSENTER or SYSCALL, then return to caller
   (software interrupt crosses from user to kernel mode)
Kernel mode (K)
4. KiSystemService in NtosKrnl.exe: call NtWriteFile, then dismiss interrupt
5. NtWriteFile in NtosKrnl.exe: do the operation, then return to caller

API Differences
- Windows DLLs versus NtDll
  - Windows “kernel” APIs exported by Kernel32.dll are different from the “native API” in NtDll
    - Different entry point names
    - Arguments are different (but similar)
  - Routines in Kernel32.dll rearrange (“marshal”) the arguments and call routines in NtDll
  - NtDll uses the change-mode mechanism (INT 2E, SYSCALL) to invoke services in NtosKrnl.exe in kernel mode
- NtDll versus NtosKrnl.exe
  - 1400 exported symbols (285 start with “Nt”)
  - Entry point names, arguments, etc., are the same between NtDll and NtosKrnl.exe
    - I.e., a user-mode routine in the native API can also be called from kernel mode
  - The DDK describes many “Zw” routines such as ZwReadFile, callable from kernel mode; this is the same location in memory as NtReadFile from user mode
    - Kernel mode could also call NtReadFile directly
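The layering described here (Kernel32.dll marshals arguments, NtDll changes mode, the kernel dispatches to the service) can be illustrated with a toy model. Everything below is an assumption made for illustration: the Python function names only stand in for the real routines, and the service number 0x112 is hypothetical.

```python
# Toy model of the user-to-kernel call path; not real Windows structures.
SERVICE_TABLE = {}  # service number -> kernel-mode routine


def kernel_service(number):
    """Register a function as the kernel routine for a service number."""
    def register(fn):
        SERVICE_TABLE[number] = fn
        return fn
    return register


@kernel_service(0x112)  # hypothetical service number
def nt_write_file_kernel(handle, data):
    """Stands in for NtWriteFile in NtosKrnl.exe: do the operation."""
    return len(data)  # pretend every byte was written


def system_service_dispatch(number, *args):
    """Stands in for KiSystemService: look up the service and call it."""
    return SERVICE_TABLE[number](*args)


def nt_write_file_stub(handle, data):
    """Stands in for NtWriteFile in NtDll: pass the service number across."""
    return system_service_dispatch(0x112, handle, data)


def write_file(handle, text):
    """Stands in for WriteFile in Kernel32.dll: rearrange args, call NtDll."""
    return nt_write_file_stub(handle, text.encode("utf-8"))
```

The point of the sketch is the shape of the path: only the NtDll-level stub knows the service number, and the dispatcher is shared by every service.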

Symmetric Multiprocessing (SMP)
- No master processor
  - All the processors share just one memory space
  - Interrupts can be serviced on any processor
  - Any processor can cause another processor to reschedule what it’s running
- Windows Server 2003 supports NUMA (non-uniform memory architecture) systems
Diagram labels: CPUs, L2 cache, memory, I/O

New MP Configurations
- Hyperthreading support
  - CPU fools OS into thinking there are multiple CPUs
    - Example: each CPU in a dual Xeon with hyperthreading presents 2 logical processors (4 in total)
  - XP & Windows Server 2003 are hyperthreading aware
    - Logical processors don’t count against physical CPU limits, e.g. XP Home will use 2 logical processors; XP Pro will use 4
    - Scheduling algorithms take into account logical vs physical processors
- Dual core
  - Processor licensing is per-socket
- NUMA (non-uniform memory architecture)
  - Groups of physical processors (called “nodes”) that have “local memory”
  - Still an SMP system (e.g. any processor can access all of memory)
  - But node-local memory is faster
  - Scheduling algorithms take this into account
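The licensing rule on this slide (limits apply to physical CPUs, not logical processors) amounts to a one-line calculation. A minimal sketch, assuming each physical CPU presents the same number of logical processors; the function and parameter names are illustrative.

```python
def usable_logical_processors(physical_cpus: int,
                              logical_per_cpu: int,
                              licensed_physical: int) -> int:
    """Logical processors the OS will use under a per-physical-CPU license."""
    used_physical = min(physical_cpus, licensed_physical)
    return used_physical * logical_per_cpu


# XP Home (licensed for 1 physical CPU) on one hyperthreaded CPU
print(usable_logical_processors(1, 2, licensed_physical=1))
# XP Pro (licensed for 2 physical CPUs) on a dual hyperthreaded Xeon
print(usable_logical_processors(2, 2, licensed_physical=2))
```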

Kernel Synchronization
- Kernel synchronization primitives
  - Spinlocks
  - Queued Spinlocks
  - Pushlocks
  - Executive Resources
  - Fast Mutexes, Guarded Mutexes
  - Kernel Dispatcher Mutexes & Semaphores
- Scalability improvements
  - Elimination of locks
  - Locks held for shorter durations
  - Scheduling database now per-CPU

Increased System Memory Limits
- Key system memory limits raised in XP and 2003
- Windows 2000 limit of 200 GB of mapped file data eliminated
  - Previously limited size of files that could be backed up
- Variable system PTEs can now describe 1.3 GB (960 MB contiguous)
  - Windows 2000 limit was 660 MB (220 MB contiguous)
  - Max device driver size was 220 MB, now 960 MB
- Registry limit of 376 MB removed
  - Was a limit on number of terminal server users
  - No longer in paged pool; now a memory-mapped file
  - No registry quota any more
  - SYSTEM hive limited to 200 MB or ¼ of RAM, whichever is lower (max was 12 MB)

Increased Limits in 64-bit Windows

                      IA64      x64       x86
User Address Space    7152 GB   8192 GB   2-3 GB
Page file limit       16 TB     16 TB     4095 MB (PAE: 16 TB)
Max page file space                       ~64 GB
System PTE Space                          1.2 GB
System Cache                              960 MB
Paged pool                                470-650 MB
Non-paged pool                            256 MB

64-bit values shown for the last five rows: 256 TB, 128 GB, 128 GB

Many Packages…
1. Windows XP Home Edition
   - Licensed for 1 CPU die, 4 GB RAM
2. Windows 2000 & XP Professional
   - Desktop version (but also is a fully functional server system)
   - Licensed for 2 CPU dies, 4 GB RAM (128 GB for 64-bit edition on x64)
3. Windows Server 2003, Web Server
   - Reduced functionality Standard Server (no domain controller)
   - Licensed for 2 CPU dies, 2 GB RAM
4. Windows Server 2003, Standard Edition (formerly Windows 2000 Server)
   - Adds server and networking features (Active Directory-based domains, host-based mirroring and RAID 5, NetWare gateway, DHCP server, WINS, DNS, …)
   - Licensed for 4 CPU dies, 4 GB RAM (128 GB on x64)
5. Windows Server 2003, Enterprise Edition (formerly Windows 2000 Advanced Server)
   - 3 GB per-process address space option, Clusters (8 nodes)
   - 32-bit: 8 CPU dies, 32 GB RAM; 64-bit: 64 GB
6. Windows 2000 Datacenter Server & Windows 2003 Server, Datacenter Edition
   - 32-bit: 32 processors, 64 GB RAM; 64-bit: 64 processors & 1024 GB RAM
NOTE: this is not an exhaustive list
- XP: Tablet PC Edition, Media Center Edition, Starter Edition, N Edition
- Server: Small Business Server, Storage Server, …

… One OS Kernel
- Windows XP & 2003 for x64 (5.2) and all Windows 2000 versions have identical core operating system executables
  - NTOSKRNL.EXE, HAL.DLL, xxxDRIVER.SYS, etc.
  - XP & Server 2003 have different kernel versions (5.1 vs 5.2)
- Registry indicates system type (set at install time)
  - HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ProductOptions
  - ProductType: WinNT = Workstation, ServerNT = Server not a domain controller, LanmanNT = Server that is a Domain Controller
  - ProductSuite: indicates type of Server (Advanced, Datacenter, or for Windows NT 4.0: Enterprise Edition, Terminal Server, …)
- Code in the operating system tests these values and behaves slightly differently in a few places
  - Licensing limits (number of processors, number of inbound network connections, etc.)
  - Boot-time calculations (mostly in the memory manager)
  - Default length of time slice
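The ProductType values listed on this slide map to system types as follows. This sketch only decodes a string that has already been read; it does not touch the registry, and the helper name is an assumption made for illustration.

```python
# ProductType strings and their meanings, as listed on the slide.
PRODUCT_TYPES = {
    "winnt": "Workstation",
    "servernt": "Server, not a domain controller",
    "lanmannt": "Server that is a domain controller",
}


def describe_product_type(value: str) -> str:
    """Translate a ProductType registry string into a description."""
    return PRODUCT_TYPES.get(value.strip().lower(), "Unknown product type")
```

The comparison is made case-insensitively so that spelling variants such as "LanmanNT" and "LanManNT" decode the same way.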

NTOSKRNL.EXE
- Core operating system image
  - Contains Executive and Kernel
- Four retail variations:
  - NTOSKRNL.EXE: Uniprocessor
  - NTKRNLMP.EXE: Multiprocessor
  - 32-bit Windows PAE versions (for DEP & >4 GB RAM):
    - NTKRNLPA.EXE: Uniprocessor w/extended addressing support
    - NTKRPAMP.EXE: Multiprocessor w/extended addressing support
- Vista: no uniprocessor kernel

Debug Version: “Checked Build”
- Special debug version of system called “Checked Build”
  - Provided with MSDN
  - Primarily for driver testing, but can be useful for catching timing bugs in multithreaded applications
- Built from same source files as “free build” (a.k.a. “retail build”)
  - “DBG” compile-time symbol defined, which enables:
    - Error tests for “can’t happen” conditions in kernel mode (ASSERTs)
    - Validity checks on arguments passed from one kernel mode routine to another

        #ifdef DBG
        if (something that should never happen has happened)
            KeBugCheckEx(…)
        #endif

- Multiprocessor kernel (of course, runs on UP systems)
- Can capture kernel debugger output with Dbgview from Sysinternals.com
- See Knowledge Base article 314743 (HOWTO: Enable Verbose Debug Tracing in Various Drivers and Subsystems)

System Architecture
User mode
- System processes: Service Control Mgr., LSASS, WinLogon, Session Manager
- Services: SvcHost.exe, WinMgt.exe, SpoolSv.exe, Services.exe
- Environment subsystems: POSIX, OS/2, Windows
- Applications: Task Manager, Explorer, User Application
- Subsystem DLLs
- NTDLL.DLL
Kernel mode
- System threads
- System Service Dispatcher (kernel mode callable interfaces)
- Local Procedure Call, Configuration Mgr (registry), Processes & Threads, Virtual Memory, Security Reference Monitor, Power Mgr., Object Mgr., File System Cache, Plug and Play Mgr., I/O Mgr
- Device & File Sys. Drivers
- Windows USER, GDI; Graphics Drivers
- Kernel
- Hardware Abstraction Layer (HAL)
Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)
Original copyright by Microsoft Corporation. Used by permission.

Executive
- Upper layer of the operating system
- Provides “generic operating system” functions (“services”)
  - Process Manager
  - Object Manager
  - Cache Manager
  - LPC (local procedure call) Facility
  - Configuration Manager
  - Memory Manager
  - Security Reference Monitor
  - I/O Manager
  - Power Manager
  - Plug-and-Play Manager
- Almost completely portable C code
- Runs in kernel (“privileged”, ring 0) mode
- Most interfaces to executive services not documented

Kernel
- Lower layers of the operating system
  - Implements processor-dependent functions (x86 versus Itanium, etc.)
  - Also implements many processor-independent functions that are closely associated with processor-dependent functions
- Main services
  - Thread waiting, scheduling, and context switching
  - Exception and interrupt dispatching
  - Operating system synchronization primitives (different for MP versus UP)
  - A few of these are exposed to user mode
- Not a classic “microkernel”: shares address space with the rest of the kernel-mode components

HAL: Hardware Abstraction Layer
- Responsible for a small part of “hardware abstraction”
  - Components on the motherboard not handled by drivers
  - System timers, cache coherency, and flushing
  - SMP support, hardware interrupt priorities
- Subroutine library for the kernel and device drivers
  - Isolates OS & drivers from platform-specific details
  - Presents uniform model of I/O hardware interface to drivers
- Reduced role in Windows 2000
  - Bus support moved to bus drivers
  - Majority of HALs are vendor-independent

Digging Into NTOSKRNL.EXE
- Exported symbols
  - Functions and global variables Microsoft wants visible outside the image (e.g., used by device drivers)
  - About 1500 symbols exported, of which about 400 are documented in the DDK
  - Ways to list:
    - Dependency Walker (File->Save As)
    - Visual C++: “link /dump /exports ntoskrnl.exe”
- Global symbols
  - Over 9000 global symbols in XP/2003 (Windows NT 4.0 was 4700)
  - Many variables contain values related to performance and memory policies
  - Ways to list:
    - Visual C++: “dumpbin /symbols /all ntoskrnl.exe” (names only)
    - Kernel debugger: “x nt!*”
  - Module name of NTOSKRNL is “NT”

Naming Convention For Internal NTOSKRNL Routines
Two- or three-letter component code at the beginning of the function name
- Executive
  - Ex: General executive routine
  - Exp: Executive private (not exported)
  - Cc: Cache manager
  - Mm: Memory management
  - Rtl: Run-Time Library
  - FsRtl: File System Run-Time Lib
  - Ob: Object management
  - Io: I/O subsystem
  - Se: Security
  - Ps: Process structure
  - Lsa: Security Authentication
  - Zw: File access, etc.
- Kernel
  - Ke: Kernel
  - Ki: Kernel internal (not available outside the kernel)
- HAL
  - Hal: Hardware Abstraction Layer
  - READ_, WRITE_: I/O port and register access
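The prefix convention lends itself to a small lookup helper, for example when scanning a symbol list. The sketch below is a heuristic, not a tool from the deck: it assumes a prefix is always followed by an uppercase letter (so ExpXxx is matched as Exp rather than Ex), and the description for Ke is filled in as "Kernel".

```python
# Component prefixes from the slide; the Ke description is an assumption.
PREFIXES = {
    "Ex": "General executive routine",
    "Exp": "Executive private (not exported)",
    "Cc": "Cache manager",
    "Mm": "Memory management",
    "Rtl": "Run-Time Library",
    "FsRtl": "File System Run-Time Library",
    "Ob": "Object management",
    "Io": "I/O subsystem",
    "Se": "Security",
    "Ps": "Process structure",
    "Lsa": "Security Authentication",
    "Zw": "File access, etc.",
    "Ke": "Kernel",
    "Ki": "Kernel internal",
    "Hal": "Hardware Abstraction Layer",
}


def component_of(routine_name: str):
    """Return the component description for a routine name, or None."""
    # Try longer prefixes first so FsRtl wins over shorter candidates.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        rest = routine_name[len(prefix):]
        if routine_name.startswith(prefix) and rest[:1].isupper():
            return PREFIXES[prefix]
    return None
```

For instance, component_of("KiSystemService") yields the kernel-internal entry, while a name such as NtWriteFile returns None because Nt is not in the slide's table.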

Interrupt Dispatching
- An interrupt arrives while user or kernel mode code is running; the processor switches to kernel mode
  - Note: no thread or process context switch!
- Interrupt dispatch routine
  1. Disable interrupts
  2. Record machine state (trap frame) to allow resume
  3. Mask equal- and lower-IRQL interrupts
  4. Find and call appropriate ISR
  5. Dismiss interrupt
  6. Restore machine state (including mode and enabled interrupts)
- Interrupt service routine
  1. Tell the device to stop interrupting
  2. Interrogate device state, start next operation on device, etc.
  3. Request a DPC
  4. Return to caller

Interrupt Precedence Via IRQLs
- IRQL = Interrupt Request Level
  - The “precedence” of the interrupt with respect to other interrupts
  - Different interrupt sources have different IRQLs
  - Not the same as IRQ
- IRQL is also a state of the processor
  - Servicing an interrupt raises processor IRQL to that interrupt’s IRQL
  - This masks subsequent interrupts at equal and lower IRQLs
- User mode is limited to IRQL 0
- No waits or page faults at IRQL >= DISPATCH_LEVEL

IRQL   Name                      Kind
31     High                      Hardware interrupt
30     Power fail                Hardware interrupt
29     Interprocessor Interrupt  Hardware interrupt
28     Clock                     Hardware interrupt
...    Device n ... Device 1     Hardware interrupts
2      Dispatch/DPC              Deferrable software interrupt
1      APC                       Deferrable software interrupt
0      Passive                   Normal thread execution
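The masking rule (servicing an interrupt raises the processor's IRQL and holds back equal and lower interrupts) can be demonstrated with a small simulation. This is a sketch under simplifying assumptions: a single processor, pending interrupts delivered highest-first once the IRQL drops, and class and method names invented for the example.

```python
class Processor:
    """Toy single-CPU model of IRQL-based interrupt masking."""

    def __init__(self):
        self.irql = 0        # passive level
        self.pending = []    # masked interrupts waiting for IRQL to drop
        self.serviced = []   # IRQLs in the order their ISRs ran

    def request(self, irql, isr=None):
        """An interrupt at the given IRQL arrives."""
        if irql <= self.irql:
            self.pending.append((irql, isr))  # masked for now
            return False
        self._service(irql, isr)
        return True

    def _service(self, irql, isr):
        previous = self.irql
        self.irql = irql               # raise to the interrupt's IRQL
        self.serviced.append(irql)
        if isr is not None:
            isr(self)                  # ISR body may see more interrupts
        self.irql = previous           # lower IRQL again
        self._deliver_pending()

    def _deliver_pending(self):
        """Deliver any pending interrupts that are no longer masked."""
        while True:
            ready = [p for p in self.pending if p[0] > self.irql]
            if not ready:
                return
            nxt = max(ready, key=lambda p: p[0])
            self.pending.remove(nxt)
            self._service(*nxt)


def device_isr(cpu):
    cpu.request(3)    # lower-IRQL device interrupt: masked until later
    cpu.request(28)   # clock interrupt: higher IRQL, serviced at once


cpu = Processor()
cpu.request(5, device_isr)
print(cpu.serviced)
```

In this run the clock interrupt preempts the device ISR, and the lower-IRQL device interrupt is only serviced after the first ISR finishes and the IRQL drops back to passive.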

Deferred Procedure Calls (DPCs)
- Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Driver (usually ISR) queues request
  - One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC level”) when all higher-IRQL work (interrupts) is completed
- Used heavily for driver “after interrupt” functions
  - Also used for quantum end and timer expiration
Diagram labels: queue head, DPC object
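A minimal model of the deferral described above: the ISR runs at device IRQL, queues a DPC on the current CPU, and the queued routine runs at dispatch level once the IRQL is about to fall below it. The class, the device IRQL of 5, and the device name are assumptions made for illustration.

```python
from collections import deque

DISPATCH_LEVEL = 2


class Cpu:
    """Toy CPU with its own DPC queue (one queue per CPU)."""

    def __init__(self):
        self.irql = 0
        self.dpc_queue = deque()
        self.log = []

    def queue_dpc(self, routine, *args):
        self.dpc_queue.append((routine, args))

    def lower_irql(self, new_irql):
        """Lower IRQL, draining DPCs first if dropping below dispatch level."""
        if new_irql < DISPATCH_LEVEL and self.dpc_queue:
            self.irql = DISPATCH_LEVEL
            while self.dpc_queue:
                routine, args = self.dpc_queue.popleft()
                routine(self, *args)
        self.irql = new_irql


def finish_io(cpu, device):
    """DPC routine: the deferred 'after interrupt' work."""
    cpu.log.append(("dpc", device, cpu.irql))


def device_isr(cpu, device_irql=5):
    """ISR: do the minimum at device IRQL, defer the rest to a DPC."""
    cpu.irql = device_irql
    cpu.log.append(("isr", cpu.irql))
    cpu.queue_dpc(finish_io, "disk0")  # hypothetical device name
    cpu.lower_irql(0)


cpu = Cpu()
device_isr(cpu)
print(cpu.log)
```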

IRQLs on 64-bit Systems
IRQL scale shown on the slide: 15, 14, 13, 12, …, 4, 3, 2, 1, 0
- x64 (highest to lowest): High/Profile/Power, Interprocessor Interrupt, Clock, Synch (Srv 2003), Device n … Device 1, Dispatch/DPC, APC, Passive/Low
- IA64 (highest to lowest, as listed): Clock, Synch (MP only), Device n … Device 1, Correctable Machine Check, Dispatch/DPC & Synch (UP only), APC, Passive/Low

Object Manager
- Executive component for managing system-defined “objects”
  - Objects are data structures with optional names
  - “Objects” managed here include Windows Kernel objects, but not Windows User or GDI objects
  - Object manager implements user-mode handles and the process handle table
- Object manager is not used for all OS data structures
  - Generally, only those types that need to be shared, named, or exported to user mode
  - Some data structures are called “objects” but are not managed by the object manager (e.g., “DPC objects”)

Object Manager
- In part, a heap manager…
  - Allocates memory for data structure from system-wide, kernel space heaps (pageable or nonpageable)
- …with a few extra functions
  - Assigns name to data structure (optional)
  - Allows lookup by name
  - Objects can be protected by ACL-based security
  - Provides uniform naming, sharing, and protection scheme
    - Simplifies C2 security certification by centralizing all object protection in one place
  - Maintains counts of handles and references (stored pointers in kernel space) to each object
    - Object cannot be freed back to the heap until all handles and references are gone
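The retention rule in the last bullet can be sketched as two counters that must both reach zero. This is a simplified model that keeps the handle count and the reference (pointer) count fully separate; the class and method names are illustrative, not object manager routines.

```python
class KernelObject:
    """Toy object that is freed only when no handles or references remain."""

    def __init__(self, name=None):
        self.name = name          # optional name
        self.handle_count = 0     # open handles in process handle tables
        self.pointer_count = 0    # stored pointers held by kernel code
        self.freed = False

    def open_handle(self):
        self.handle_count += 1

    def close_handle(self):
        self.handle_count -= 1
        self._free_if_unused()

    def reference(self):
        self.pointer_count += 1

    def dereference(self):
        self.pointer_count -= 1
        self._free_if_unused()

    def _free_if_unused(self):
        if self.handle_count == 0 and self.pointer_count == 0:
            self.freed = True     # memory would go back to the heap here
```

Closing the last handle is not enough on its own: an object still referenced by a kernel-mode pointer stays allocated until that reference is dropped as well.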

Handles And Security
- Process handle table
  - Is unique for each process
  - But is in system address space, hence cannot be modified from user mode
  - Hence, is trusted
- Security checks are made when handle table entry is created
  - I.e., at CreateXxx time
  - Handle table entry indicates the “validated” access rights to the object
    - Read, Write, Delete, Terminate, etc.
  - No need to revalidate on each request
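The "check once at open time, trust the handle table afterwards" design can be modelled in a few lines. The access bits, the ACL represented as a plain dictionary, and the handle numbering below are all assumptions for illustration, not Windows data structures.

```python
READ, WRITE, DELETE = 0x1, 0x2, 0x4   # illustrative access bits


class HandleTable:
    """Toy per-process handle table storing validated access rights."""

    def __init__(self):
        self._entries = {}   # handle -> granted access mask
        self._next = 4

    def open(self, acl, user, desired):
        """Security check happens here, when the entry is created."""
        allowed = acl.get(user, 0)
        if desired & ~allowed:
            raise PermissionError("access denied")
        handle = self._next
        self._next += 4
        self._entries[handle] = desired
        return handle

    def check(self, handle, needed):
        """Later requests consult only the granted mask in the table."""
        return (self._entries[handle] & needed) == needed
```

Because the table lives where user-mode code cannot modify it, the stored mask can be trusted and the ACL does not need to be evaluated again on each request.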

Examining Handles: MS Tools
- Two tools:
  - XP & 2003: openfiles /query command
  - Resource Kit “oh” (Open Handles) tool
- Both of these require a special NT “global flag” registry bit to be set
  - Requires reboot to take effect
  - See HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\GlobalFlag
  - Can view this bitmask with the GFLAGS tool
  - Uses 8 bytes extra for each open handle

Examining Open Handles: Sysinternals Tools
- Process Explorer (GUI version) or Handle (character cell version) from www.sysinternals.com
- Uses a device driver to walk the handle table, so doesn’t need Global Flag set

Viewing Open Handles
- Handle View
  - By default, shows named objects
  - Click on Options->Show Unnamed Objects
- Uses:
  - Solve file locked errors
    - Can search to determine what process is holding a file or directory open
    - Can even close an open file (be careful!)
  - Understand resources used by an application
  - Detect handle leaks using refresh difference highlighting
  - View the state of synchronization objects (mutexes, semaphores, events)

Viewing Handles With Kernel Debugger
- If looking at a dump, use !handle in Kernel Debugger (see help for options)

    lkd> !handle 0 f 9e8 file
    Searching for Process with Cid == 9e8
    Searching for handles of type file
    PROCESS 82ce72d0  SessionId: 0  Cid: 09e8  Peb: 7ffdf000  ParentCid: 06e
        DirBase: 06602000  ObjectTable: e1c879c8  HandleCount: 430.
        Image: POWERPNT.EXE
    …
    0280: Object: 82c5e230  GrantedAccess: 00120089
    Object: 82c5e230  Type: (82fdde70) File
        ObjectHeader: 82c5e218
        HandleCount: 1  PointerCount: 1
        Directory Object: 0000  Name: slidesntintnew4-systemarchitecture.ppt {HarddiskVolume1}
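The GrantedAccess value in the !handle output is a bitmask of access rights. As a hedged aid to reading it, the sketch below decodes a file-object mask using the standard and file-specific access right values from the Windows SDK headers; the helper name is an assumption, and bits not in the table are reported in hex.

```python
# Standard and file-specific access rights (values as in the Windows SDK).
FILE_ACCESS_BITS = {
    0x00000001: "FILE_READ_DATA",
    0x00000002: "FILE_WRITE_DATA",
    0x00000004: "FILE_APPEND_DATA",
    0x00000008: "FILE_READ_EA",
    0x00000010: "FILE_WRITE_EA",
    0x00000020: "FILE_EXECUTE",
    0x00000080: "FILE_READ_ATTRIBUTES",
    0x00000100: "FILE_WRITE_ATTRIBUTES",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}


def decode_granted_access(mask: int) -> list:
    """Return the names of the access rights set in a file handle's mask."""
    names = [name for bit, name in sorted(FILE_ACCESS_BITS.items())
             if mask & bit]
    unknown = mask & ~sum(FILE_ACCESS_BITS)
    if unknown:
        names.append(hex(unknown))
    return names


print(decode_granted_access(0x00120089))
```

The mask 00120089 from the slide is a read-only combination: read data, extended attributes and attributes, plus READ_CONTROL and SYNCHRONIZE.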

Object Manager Namespace
- System and session-wide internal namespace
- View with Winobj from www.sysinternals.com

Object Manager Namespace
- Hierarchical directory structure (based on file system model)
- System-wide (not per-process)
  - With Terminal Services, Windows objects are per-session by default
    - Vista: console no longer is session 0
  - Can override this with “Global\” prefix on object names
- Volatile (not preserved across boots)
- Namespace can be extended by secondary object managers (e.g., file system)
  - Hook mechanism to call external parse routine (method)
- Supports case sensitive or case blind
- Supports symbolic links (used to implement drive letters, etc.)
- Lookup done on object creation or access by name
  - Not on access by handle
- Not all objects managed by the object manager are named
  - E.g., file objects are not named
  - Unnamed objects are not visible in WinObj
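Lookup by name through a hierarchical namespace with symbolic links can be sketched with nested dictionaries. This toy model is case blind only, has no parse-routine hook, and uses illustrative directory and object names; it is not how the object manager stores its directories.

```python
class SymbolicLink:
    """A namespace entry that redirects lookup to another path."""

    def __init__(self, target):
        self.target = target


class Namespace:
    """Toy hierarchical, case-blind namespace with symbolic links."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _split(path):
        return [part.lower() for part in path.split("\\") if part]

    def insert(self, path, obj):
        *dirs, leaf = self._split(path)
        node = self.root
        for name in dirs:
            node = node.setdefault(name, {})   # create directories as needed
        node[leaf] = obj

    def lookup(self, path, max_links=8):
        parts = self._split(path)
        node = self.root
        for index, name in enumerate(parts):
            node = node[name]                  # KeyError if the name is absent
            if isinstance(node, SymbolicLink):
                if max_links == 0:
                    raise RuntimeError("too many symbolic links")
                rest = parts[index + 1:]
                target = "\\".join([node.target, *rest])
                return self.lookup(target, max_links - 1)
        return node


ns = Namespace()
ns.insert("\\Device\\HarddiskVolume1", "volume object")
ns.insert("\\GLOBAL??\\C:", SymbolicLink("\\Device\\HarddiskVolume1"))
print(ns.lookup("\\global??\\c:"))
```

The drive-letter entry resolves through the symbolic link to the device object, which mirrors how the slide describes drive letters being implemented.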

System Threads
- Functions in OS and some drivers that need to run as real threads
  - E.g., need to run concurrently with other system activity, wait on timers, perform background “housekeeping” work
  - Always run in kernel mode
  - Preemptible (unless they raise IRQL to 2 or above)
  - For details, see DDK documentation on PsCreateSystemThread
- What process do they appear in?
  - “System” process (Windows NT 4.0: PID 2, Windows 2000: PID 8, Windows XP: PID 4)
  - In Windows 2000 and later, windowing system threads (from Win32k.sys) appear in “csrss.exe” (Windows subsystem process)

Examples Of System Threads
- Memory Manager
  - Modified Page Writer for mapped files
  - Modified Page Writer for paging files
  - Balance Set Manager
  - Swapper (kernel stack, working sets)
  - Zero page thread (thread 0, priority 0)
- Security Reference Monitor
  - Command Server Thread
- Network
  - Redirector and Server Worker Threads
- Threads created by drivers for their exclusive use
  - Examples: Floppy driver, parallel port driver
- Pool of Executive Worker Threads
  - Used by drivers, file systems, …
  - Accessed via ExQueueWorkItem

Identifying System Threads
If System threads are consuming CPU time, you need to find out what code is running, since it could be any one of a variety of components: pieces of the OS (Ntoskrnl.exe), file server worker threads (Srv.sys), or other drivers. To really understand what’s going on, you must find which driver a thread “belongs to”. 57

Identifying System Threads
Process Explorer: double-click on the System process, go to the Threads tab, and sort by CPU. To view the call stack, you must use a kernel debugger. Note: several threads run between clock ticks (or at high IRQL) and thus don’t appear to run; watch the context switch count. 58

System Architecture Process Execution Environment Kernel Architecture Interrupt Handling Object Manager System Threads Process-based code Summary 59

Process-Based Code
OS components that run in separate executables (.exes), in their own processes. Started by the system; not tied to a user logon. Three types: environment subsystems (already described); system startup processes (note: “system startup processes” is not an official Microsoft-defined name); Windows Services. Let’s examine the system process “tree”: use Tlist /T or Process Explorer. 60

Process-Based NT Code: System Startup Processes
The first two processes aren’t real processes: they are not running a user-mode .EXE, they have no user-mode address space, and different utilities report them with different names. Data structures for these processes (and their initial threads) are “pre-created” in NtosKrnl.exe and loaded along with the code.
(Idle), process ID 0: part of the loaded system image; home for idle thread(s) (not a real process nor real threads); called “System Process” in many displays.
(System), process ID 2 (8 in Windows 2000; 4 in XP): part of the loaded system image; home for kernel-defined threads (not a real process). Thread 0 (routine name Phase1Initialization) launches the first “real” process, running smss.exe, and then becomes the zero page thread. 61

Process-Based NT Code: System Startup Processes
smss.exe: Session Manager. The first “created” process. Takes parameters from HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager. Launches required subsystems (csrss) and then winlogon.
csrss.exe: Windows subsystem.
winlogon.exe: Logon process. Launches services.exe and lsass.exe; presents first login prompt. When someone logs in, launches apps in \Software\Microsoft\Windows NT\WinLogon\Userinit.
services.exe: Service Controller; also home for many NT-supplied services. Starts processes for services not part of services.exe (driven by \Registry\Machine\System\CurrentControlSet\Services).
lsass.exe: Local Security Authentication Server.
userinit.exe: Started after logon; starts Explorer.exe (see \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell) and exits (hence Explorer appears to be an orphan).
explorer.exe: It and its children are the creators of all interactive apps.
62

Logon Process
Winlogon sends the username/password to Lsass: either on the local system for a local logon, or to the Netlogon service on a domain controller for a network logon. Windows XP enhancement: Winlogon doesn’t wait for the Workstation service to start if: the account doesn’t depend on a roaming profile; domain policy that affects logon hasn’t changed since the last logon. Winlogon creates a process to run HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit (by default: Userinit.exe), which runs the logon script, restores drive-letter mappings, and starts the shell. Userinit creates a process to run HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell (by default: Explorer.exe). There are other places in the Registry that control programs that start at logon. 63

Processes Started at Logon
Autoruns (Sysinternals) displays the order of processes configured to start at logon time. Can also use the XP built-in tool called “System Configuration Utility” (Msconfig, in \Windows\pchealth\helpctr\binaries). To run it, click Start->Help, then “Use Tools…”, then System Configuration Utility. It only shows what is defined to start, whereas Autoruns shows all the places where things CAN be defined to start. 64

Windows Services
“Service” is an overloaded generic term. Here: a process created and managed by the Service Control Manager (Services.exe). E.g., Solitaire can be configured as a service, but is killed shortly after starting. Similar in concept to Unix daemon processes. Typically configured to start at boot time (if started while logged on, they survive logoff). Typically do not interact with the desktop. Note: prior to Windows 2000, this was one way to start a process on a remote machine (now you can do it with WMI). 65

Life Of A Service
Install time: the setup application tells the Service Controller about the service (CreateService). System boot/initialization: the SCM reads the registry and starts services as directed. Management/maintenance: Control Panel can start and stop services and change startup parameters. (Diagram components: Setup Application, CreateService, Registry, Service Controller/Manager (Services.exe), Service Processes, Control Panel.) 66

Viewing Service Processes Process Explorer can highlight Service Processes Click on Options->Highlight Services 67

Svchost Mechanism
Windows 2000 introduced the generic Svchost.exe, which groups services into fewer processes. Improves system startup time and conserves system virtual memory. Which services go in which processes is not user-configurable, and 3rd parties cannot add services to Svchost.exe processes. Windows XP/2003 have more Svchost processes due to two new, less privileged accounts for built-in services: LOCAL SERVICE and NETWORK SERVICE. These have fewer rights than the SYSTEM account, which reduces the possibility of damage if the system is compromised. On XP/2003 there are (at least) four Svchost processes: SYSTEM, SYSTEM (2nd instance, for RPC), LOCAL SERVICE, NETWORK SERVICE. 68

Mapping Services To Service Processes Tlist /S (Debugging Tools) or Tasklist /svc (XP/2003) list internal name of services inside service processes Process Explorer shows more: external display name and description 69

System Architecture Process Execution Environment Kernel Architecture Interrupt Handling Object Manager System Threads Process-based code Summary 70

Four Contexts For Executing Code
Full process and thread context: user applications, Windows Services, environment subsystem processes, system startup processes.
Thread context but no “real” process: threads in the “System” process.
Routines called by other threads/processes: subsystem DLLs, executive system services (NtReadFile, etc.), GDI32 and User32 APIs implemented in Win32K.sys (and graphics drivers).
No process or thread context (“arbitrary thread context”): interrupt dispatching, device drivers.
71

System Architecture
(Diagram.) User mode: System Processes (Service Control Mgr., LSASS, WinLogon, Session Manager); Services (SvcHost.exe, WinMgt.exe, SpoolSv.exe, Services.exe); Applications (Task Manager, Explorer, User Application); Environment Subsystems (POSIX, OS/2, Windows); Subsystem DLLs; NTDLL.DLL. Kernel mode: System Threads; System Service Dispatcher (kernel mode callable interfaces); I/O Mgr; File System Cache; Object Mgr.; Plug and Play Mgr.; Power Mgr.; Security Reference Monitor; Virtual Memory; Processes & Threads; Configuration Mgr (registry); Local Procedure Call; Windows USER, GDI; Device & File Sys. Drivers; Graphics Drivers; Kernel; Hardware Abstraction Layer (HAL). Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.). Original copyright by Microsoft Corporation. Used by permission. 72

Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals
73

Processes And Threads Data Structures Priority Spectrum Scheduling Decisions Priority Adjustments Multiprocessor Considerations 74

Processes And Threads
Each process has its own: virtual address space (including program global storage, heap storage, threads’ stacks), so processes cannot corrupt each other’s address space by mistake; working set (physical memory “owned” by the process); access token (includes security identifiers); handle table for Windows kernel objects. These are common to all threads in the process, but separate and protected between processes.
Each thread has its own: user-mode stack (automatic storage, call frames, etc.); kernel-mode stack; scheduling state (Wait, Ready, Running, etc.) and priority; current access mode (user mode or kernel mode); saved CPU state if it isn’t running; access token (optional; overrides the process’s if present). 75

Process And Thread Identifiers
Every process and every thread has an identifier. Generically: “client ID” (debugger shows as “CID”); a.k.a. “process ID” and “thread ID”, respectively. Process IDs and thread IDs are in the same “number space”. These identify the requesting process or thread to its subsystem “server” process, in API calls that need the server’s help. Visible in PerfMon, Task Manager (for processes), Process Viewer (for processes), kernel debugger, etc. IDs are unique among all existing processes and threads, but might be reused as soon as a process or thread is deleted. 76

Jobs
A job is a kernel object to manage groups of processes; it sets limits on a process or group of processes. Quotas and restrictions: Quotas: total CPU time, number of active processes, per-process CPU time, memory usage. Run-time restrictions: priority of all the processes in the job; processors that threads in the job can run on. Security restrictions: limit what processes can do (not acquiring administrative privileges; not accessing windows outside the job; no reading/writing the clipboard). Scheduling class: number from 0-9 (5 is default); affects length of thread timeslice (or quantum, t.b.d.). E.g., can be used to achieve “class scheduling” (partition CPU). (Diagram: a Job containing Processes.) 77

Jobs
How do processes become part of a job? A job object has to be created, then processes are explicitly added. Processes created by processes in a job are automatically part of the job. Unless restricted, processes can “break away” from a job. Only Datacenter Server has a built-in tool to take advantage of jobs: “Process Control Manager”, which allows creating definitions for jobs and associating processes with them. Uses of jobs in the OS: Add/Remove Programs (“ARP Job”); WMI provider; the RUNAS service (SecLogon) uses jobs to terminate processes at logout (SU from the NT 4 ResKit didn’t do this). 78

Demo: WMI Jobs are used by WMI Example: run Psinfo (Sysinternals) and pause output 79

Processes And Threads Internal Data Structures
(Diagram: a Process Object with its Access Token, Virtual Address Space Descriptors (VADs), Handle Table, and Threads; a Thread may have its own Access Token.) See kernel debugger commands: dt (see next slide), !process, !thread, !token, !handle, !object. 80

Dumping Structures With Kernel Debugger
!process and !thread show a subset of the information in a process and thread block. The “dt” (“Display Type”) command can format all the fields. Syntax: “dt StructureName address -r”. “dt nt!_*” displays all OS structures known to dt. Process/thread-related structures: nt!_EPROCESS, nt!_ETHREAD. 81

Process Block Layout
    lkd> dt nt!_EPROCESS
       +0x000 Pcb                : _KPROCESS
       +0x06c ProcessLock        : _EX_PUSH_LOCK
       +0x070 CreateTime         : _LARGE_INTEGER
       +0x078 ExitTime           : _LARGE_INTEGER
       +0x080 RundownProtect     : _EX_RUNDOWN_REF
       +0x084 UniqueProcessId    : Ptr32 Void
       +0x088 ActiveProcessLinks : _LIST_ENTRY
       +0x090 QuotaUsage         : [3] Uint4B
       +0x09c QuotaPeak          : [3] Uint4B
       +0x0a8 CommitCharge       : Uint4B
       +0x0ac PeakVirtualSize    : Uint4B
       +0x0b0 VirtualSize        : Uint4B
       ...
NOTE: Add “-r” to recurse through substructures
82

Thread Block (!strct ethread)
    lkd> dt nt!_ETHREAD
       +0x000 Tcb              : _KTHREAD
       +0x1c0 CreateTime       : _LARGE_INTEGER
       +0x1c0 NestedFaultCount : Pos 0, 2 Bits
       +0x1c0 ApcNeeded        : Pos 2, 1 Bit
       +0x1c8 ExitTime         : _LARGE_INTEGER
       +0x1c8 LpcReplyChain    : _LIST_ENTRY
       +0x1c8 KeyedWaitChain   : _LIST_ENTRY
       +0x1d0 ExitStatus       : Int4B
       +0x1d0 OfsChain         : Ptr32 Void
       +0x1d4 PostBlockList    : _LIST_ENTRY
       +0x1dc TerminationPort  : Ptr32 _TERMINATION_PORT
       +0x1dc ReaperLink       : Ptr32 _ETHREAD
NOTE: Add “-r” to recurse through substructures
83

Processes And Threads Data Structures Priority Spectrum Scheduling Decisions Priority Adjustments Multiprocessor Considerations 84

Scheduling Priorities
Realtime levels (16-31):
    31  Realtime Time Critical
    24  Realtime
    16  Realtime Idle
Dynamic levels (1-15):
    15  (top of the dynamic range)
    13  High
    10  Above Normal
     8  Normal
     6  Below Normal
     4  Idle
     1  Dynamic Idle
System level:
     0  System Idle
85

Thread Scheduling
Priority driven, preemptive. UP: the highest priority thread always runs. MP: one of the highest priority runnable threads will be running somewhere. Event-driven; no guaranteed execution period before preemption. No attempt to share processor(s) “fairly” among processes, only among threads. Time-sliced, round-robin within a priority level. O(1) (no scan of all threads); Linux 2.4 is O(N) (2.6 is O(1)). 86

Thread Scheduling
The “code that does scheduling” is not a thread, i.e., there is no always-instantiated routine called “the scheduler”. Scheduling routines are called whenever events occur that change the state of a thread: interval timer interrupts (for quantum end); interval timer interrupts (for timed wait completion); other hardware interrupts (for I/O wait completion); one thread changes the state of a waitable object upon which other thread(s) are waiting; a thread waits on one or more dispatcher objects; a thread priority is changed. 87

Scheduling Scenarios: Preemption
Preemption is strictly event-driven: it does not wait for the next clock tick, and there is no guaranteed execution period before preemption. Threads in kernel mode may be preempted (unless they raise IRQL to >= 2). A preempted thread goes back to the head of its ready queue; also, if in the real-time priority range, its quantum is reset. (Diagram: ready queues for priorities 13-18, showing the Running thread, Ready threads, and a thread arriving from the Wait state.) 88

Scheduling Scenarios: Ready After Wait Resolution
If the newly-ready thread is not of higher priority than the running thread, it is put at the tail of the ready queue for its current priority. If in the real-time priority range, its quantum is reset. (Diagram: ready queues for priorities 13-18, showing the Running thread, Ready threads, and a thread arriving from the Wait state.) 89

Scheduling Scenarios: Voluntary Switch
When the running thread gives up the CPU, schedule the thread at the head of the next non-empty “ready” queue. (Diagram: ready queues for priorities 13-18, with the Running thread moving to the Waiting state.) 90

Scheduling Scenarios: Quantum End
When the running thread exhausts its CPU quantum, it goes to the end of its ready queue. Applies to all threads (even in kernel mode, if IRQL < 2). Quantums can be disabled for a thread by a kernel function. Default quantum on Professional is 2 clock ticks, 12 on Server; the standard clock tick is 10 msec, but might be 15 msec on some MP Pentium systems. If there are no other ready threads at that priority, the same thread continues running (it just gets a new quantum). If running at boosted priority, priority decays at quantum end (described later). (Diagram: ready queues for priorities 13-18, showing the Running and Ready threads.) 91

Quantum Stretching
Resulting quantum: “Maximum” = 6 ticks; (middle) = 4 ticks; “None” = 2 ticks. Quantum stretching does not happen on NT Server; the quantum on NT Server is 12 ticks. (Diagram: Running and Ready threads at priority 8.) 92

Quantum Selection
As of Windows 2000, can choose short quantums on Server (e.g., for terminal servers). (Screenshots: the Windows 2000 and Windows XP settings.) 93

Controlling Quantum
If a process is a member of a job, its quantum can be adjusted by setting the “Scheduling Class”. Only applies if the process is above the Idle priority class. Only applies if the system is running with fixed quantums (the default on Servers). Values are 0-9; 5 is the default.
    Scheduling class:   0   1   2   3   4   5   6   7   8   9
    Quantum units:      6  12  18  24  30  36  42  48  54  60
94
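
The table above follows a simple linear pattern, which can be sketched in C. The helper name `quantum_units_for_class` and the enum constants are hypothetical, introduced only for illustration; the formula is read off the slide's table and is not taken from the Windows source.

```c
#include <assert.h>

/* Job scheduling classes run from 0 to 9; 5 is the default (per the slide). */
enum { MIN_SCHED_CLASS = 0, MAX_SCHED_CLASS = 9, DEFAULT_SCHED_CLASS = 5 };

/* Hypothetical helper: quantum units for a job scheduling class.
 * Matches the slide's table: class 0 -> 6 units, ..., class 9 -> 60 units. */
int quantum_units_for_class(int sched_class)
{
    assert(sched_class >= MIN_SCHED_CLASS && sched_class <= MAX_SCHED_CLASS);
    return 6 * (sched_class + 1);
}
```

With this sketch, the default class 5 yields 36 quantum units, the same value the table lists.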

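Thread Scheduling States
States: Init (0), Ready (1), Running (2), Standby (3), Terminate (4), Waiting (5), Transition (6).
Transitions shown: preempt (Standby to Ready); preemption, quantum end (Running to Ready); voluntary switch (Running to Waiting); wait resolved (Waiting to Ready); wait resolved after kernel stack made pageable (Waiting to Transition).
Ready = thread eligible to be scheduled to run. Standby = thread is selected to run on CPU.
95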

Processes And Threads Data Structures Priority Spectrum Scheduling Decisions Priority Adjustments Multiprocessor Considerations 96

Priority Adjustments
Priority boosts are applied to threads in the “dynamic” classes (1-15); there are no automatic adjustments in the “real-time” class (16 or above). Can disable with SetThreadPriorityBoost or SetProcessPriorityBoost. Five types: I/O completion; wait completion on events or semaphores; when threads in the foreground process complete a wait; when GUI threads wake up for windows input; for CPU starvation avoidance. 97

Priority Boosting
After an I/O: the boost is specified by the device driver: IoCompleteRequest(Irp, PriorityBoost). Common boost values (see NTDDK.H): 1: disk, CD-ROM, parallel, video; 2: serial, network, named pipe, mailslot; 6: keyboard or mouse; 8: sound.
After a wait on an executive event or semaphore: KeSetEvent(Event, PriorityBoost…). A boost value of 1 is used for these objects. Server 2003: the setting thread loses its boost (lock convoy issue).
After any wait on a dispatcher object by a thread in the foreground process: boost value of 2. Goal: improve responsiveness of interactive apps.
GUI threads that wake up to process windowing input (e.g., windows messages) get a boost of 2. This is added to the current, not base, priority. Goal: improve responsiveness of interactive apps. 98

Priority Boost And Decay
Behavior of these boosts: the boost is applied to the thread’s base priority and will not take you above priority 15. After a boost, you get one quantum; then the priority decays 1 level and the thread runs another quantum; then it decays another level, etc., until back to base priority. (Diagram: priority versus time, showing a boost upon wait completion, priority decay at each quantum end, preemption before quantum end, and round-robin at base priority.) 99
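
The boost-and-decay behavior described above can be sketched as a small model in C. This is an illustration only, not the dispatcher's code; the function names are hypothetical, and the model assumes a single boost applied to the base priority with no further boosts arriving during decay.

```c
#include <assert.h>

#define MAX_DYNAMIC_PRIORITY 15

/* Apply a boost to a dynamic-range thread (base 1..15); never exceed 15. */
int apply_boost(int base_priority, int boost)
{
    assert(base_priority >= 1 && base_priority <= MAX_DYNAMIC_PRIORITY);
    int boosted = base_priority + boost;
    return boosted > MAX_DYNAMIC_PRIORITY ? MAX_DYNAMIC_PRIORITY : boosted;
}

/* At quantum end the priority drops one level, but never below base. */
int decay_at_quantum_end(int current_priority, int base_priority)
{
    return current_priority > base_priority ? current_priority - 1
                                            : current_priority;
}

/* Number of quantums the thread runs above its base priority. */
int quantums_above_base(int base_priority, int boost)
{
    int priority = apply_boost(base_priority, boost);
    int quantums = 0;
    while (priority > base_priority) {
        quantums++;
        priority = decay_at_quantum_end(priority, base_priority);
    }
    return quantums;
}
```

For example, a priority 8 thread receiving a keyboard boost of 6 would run one quantum each at 14, 13, 12, 11, 10, and 9 before returning to 8, while a priority 13 thread with the same boost is capped at 15.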

CPU Starvation Avoidance
The Balance Set Manager system thread looks for “CPU starved” threads: it wakes up once per second, examines the Ready queues, and looks for threads that have been Ready for 300 clock ticks. Such threads get a big boost to priority 15 and their quantum is doubled. At quantum end, the thread returns to its previous priority (no gradual decay) and normal quantum. To minimize overhead, it scans up to 16 Ready threads per priority level each pass and boosts up to 10 Ready threads per pass. Like all priority boosts, this does not apply in the real-time range (priority 16 and above). (Diagram: threads at priority 12 (Wait), 7 (Run), and 4 (Ready).) 100
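
A minimal sketch of one starvation-avoidance pass over a single ready queue, using the limits from the slide (300 ticks, 16 threads scanned per level, 10 boosts per pass). The `ReadyThread` structure and both function names are hypothetical; the real Balance Set Manager works on kernel structures not shown in the deck.

```c
#include <stddef.h>

#define STARVATION_TICKS    300  /* Ready this long counts as CPU starved */
#define BOOST_PRIORITY       15  /* starved threads are boosted to 15     */
#define MAX_SCAN_PER_LEVEL   16  /* threads examined per priority level   */
#define MAX_BOOSTS_PER_PASS  10  /* threads boosted per pass              */

/* Hypothetical model of a thread sitting in a ready queue. */
typedef struct {
    int priority;        /* current priority (dynamic range, 1..15) */
    int saved_priority;  /* priority to restore at quantum end      */
    int ticks_ready;     /* clock ticks spent in the Ready state    */
    int quantum;         /* quantum length in arbitrary units       */
    int boosted;         /* nonzero while the starvation boost holds */
} ReadyThread;

/* Scan one priority level's queue (head first). boosts_left is the budget
 * shared across the whole pass. Returns how many threads were boosted. */
int scan_ready_queue(ReadyThread *queue, size_t count, int *boosts_left)
{
    int boosted = 0;
    for (size_t i = 0;
         i < count && i < MAX_SCAN_PER_LEVEL && *boosts_left > 0; i++) {
        ReadyThread *t = &queue[i];
        if (t->ticks_ready >= STARVATION_TICKS && !t->boosted) {
            t->saved_priority = t->priority;
            t->priority = BOOST_PRIORITY;
            t->quantum *= 2;          /* quantum is doubled */
            t->boosted = 1;
            (*boosts_left)--;
            boosted++;
        }
    }
    return boosted;
}

/* At quantum end: back to the previous priority at once (no gradual decay). */
void end_boosted_quantum(ReadyThread *t)
{
    if (t->boosted) {
        t->priority = t->saved_priority;
        t->quantum /= 2;
        t->boosted = 0;
    }
}
```

A caller modelling a full pass would initialize the budget to `MAX_BOOSTS_PER_PASS` and invoke `scan_ready_queue` once per dynamic priority level.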

Processes And Threads Data Structures Priority Spectrum Scheduling Decisions Priority Adjustments Multiprocessor Considerations 101

Multiprocessor Scheduling Fully distributed (no “master processor”) Any processor can interrupt another processor to schedule a thread Scheduling database: Pre-Windows Server 2003: single system-wide list of ready queues Windows Server 2003: per-CPU ready queues Threads can run on any CPU, unless specified otherwise Tries to keep threads on same CPU (“soft affinity”) Setting of which CPUs a thread will run on is called “hard affinity” 102

Hard Processor Affinity
Threads can run on any CPU, unless affinity is specified otherwise. Affinity is specified by a bit mask; each bit corresponds to a CPU number. Can alter with SetThreadAffinityMask or SetProcessAffinityMask, or in the job object. The thread affinity mask must be a subset of the process affinity mask, which in turn must be a subset of the active processor mask. “Hard affinity” can lead to threads getting less CPU time than they normally would; it is more applicable to large MP systems running dedicated server apps. Note: the OS may in some cases need to run your thread on CPUs other than your hard affinity setting, e.g., flushing DPCs, setting system time. 103
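
The subset rule for affinity masks comes down to a bitwise test, sketched below in portable C. The type `cpu_mask_t` and the function names are hypothetical and stand in for the Windows types; treating an empty thread mask as invalid is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t cpu_mask_t;   /* one bit per CPU, bit n = CPU number n */

/* True if every CPU in 'inner' is also present in 'outer'. */
bool is_subset(cpu_mask_t inner, cpu_mask_t outer)
{
    return (inner & ~outer) == 0;
}

/* Thread mask must be within the process mask, which must be within the
 * active processor mask. An empty thread mask is rejected (assumption). */
bool affinity_is_valid(cpu_mask_t thread, cpu_mask_t process, cpu_mask_t active)
{
    return thread != 0 && is_subset(thread, process) && is_subset(process, active);
}

/* True if the given CPU number is allowed by the mask. */
bool cpu_allowed(cpu_mask_t mask, unsigned cpu)
{
    return cpu < 32 && ((mask >> cpu) & 1u) != 0;
}
```

On a four-CPU system (active mask 0xF), a process restricted to CPUs 0 and 1 (0x3) can hold a thread pinned to CPU 0 (0x1), but not one pinned to CPU 2 (0x4).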

Hard Processor Affinity On MP systems, the process affinity mask can be examined and changed via Task Manager Can also set an image affinity mask Imagecfg tool in Windows 2000 Server Resource Kit Supplement 1 Can also set “uniprocessor only”: sets affinity mask to one processor (rotates round robin at each process creation) 104

Soft Processor Affinity
Every thread has an “ideal processor”. The system selects the ideal processor for the first thread in a process (round robin across CPUs); the next thread gets the next CPU relative to the process seed. Can override with:
    SetThreadIdealProcessor(
        HANDLE hThread,           // handle to thread
        DWORD dwIdealProcessor);  // processor number
Hard affinity changes update ideal processor settings. Used in selecting where a thread runs next (see next slides).
105

Choosing A CPU For A Ready Thread (Windows 2000)
When a thread becomes ready to run (e.g., its wait completes, or it is just beginning execution), the system needs to choose a processor for it to run on.
First, it checks whether any processors in the thread’s hard affinity mask are idle:
    If its “ideal processor” is idle, it runs there
    Else if the previous processor it ran on is idle, it runs there
    Else if the current processor is idle, it runs there
    Else it picks the highest numbered idle processor in the thread’s affinity mask
If no processors are idle:
    If the ideal processor is in the thread’s affinity mask, it selects that
    Else if the last processor is in the thread’s affinity mask, it selects that
    Else it picks the highest numbered processor in the thread’s affinity mask
Finally, it compares the priority of the new thread with the priority of the thread running on the processor it selected (if any) to determine whether or not to perform a preemption.
106
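
The selection steps on this slide can be sketched as a pure function over bit masks. All names here (`choose_cpu`, `has_cpu`, `highest_cpu`, `should_preempt`) are hypothetical; the sketch assumes at most 32 CPUs and a non-empty affinity mask, and it takes “preempt” to mean the new thread has strictly higher priority, as the earlier scheduling slides describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t cpu_mask_t;   /* one bit per CPU, bit n = CPU number n */

bool has_cpu(cpu_mask_t mask, int cpu)
{
    return cpu >= 0 && cpu < 32 && ((mask >> cpu) & 1u) != 0;
}

/* Highest numbered CPU in the mask, or -1 if the mask is empty. */
int highest_cpu(cpu_mask_t mask)
{
    for (int cpu = 31; cpu >= 0; cpu--)
        if (has_cpu(mask, cpu))
            return cpu;
    return -1;
}

/* Pick a CPU for a newly ready thread, following the slide's order. */
int choose_cpu(cpu_mask_t affinity, cpu_mask_t idle,
               int ideal, int previous, int current)
{
    assert(affinity != 0);
    cpu_mask_t idle_allowed = idle & affinity;

    if (idle_allowed != 0) {               /* some allowed CPU is idle */
        if (has_cpu(idle_allowed, ideal))    return ideal;
        if (has_cpu(idle_allowed, previous)) return previous;
        if (has_cpu(idle_allowed, current))  return current;
        return highest_cpu(idle_allowed);
    }
    /* No allowed CPU is idle: fall back on affinity alone. */
    if (has_cpu(affinity, ideal))    return ideal;
    if (has_cpu(affinity, previous)) return previous;
    return highest_cpu(affinity);
}

/* Final step: preempt only if the new thread outranks the running one. */
bool should_preempt(int new_priority, int running_priority)
{
    return new_priority > running_priority;
}
```

For instance, with CPUs 0-3 allowed and only CPU 3 idle, a thread whose ideal, previous, and current processors are 1, 2, and 0 ends up on CPU 3, the highest numbered idle processor.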

Selecting A Thread To Run On A CPU (Windows 2000)
The system needs to choose a thread to run on a specific CPU:
    At quantum end
    When a thread enters a wait state
    When a thread removes its current processor from its hard affinity mask
    When a thread exits
Win 2000: with the dispatcher lock held, starting with the first thread in the highest priority non-empty ready queue, it scans the queue for the first thread that has the current processor in its hard affinity mask and:
    Ran last on the current processor, or
    Has its ideal processor equal to the current processor, or
    Has been in its Ready queue for more than 2 quantums, or
    Has a priority >= 24
If it cannot find such a candidate, it selects the highest priority thread that can run on the current CPU (whose hard affinity includes the current CPU). Note: this may mean going to a lower priority ready queue to find a candidate.
107

Server 2003 Enhancements Idle processor selection further refined to: If a NUMA system: if there are idle CPUs in the node containing the thread’s ideal processor, reduce to that set If a hyperthreaded system: if one of the idle processors is a physical processor with all logical processors idle, reduce to that set Then try to eliminate idle CPUs that are sleeping If thread ran last on a member of the set, pick that CPU Else pick lowest numbered CPU in remaining set 108

Server 2003 Enhancements
Threads always go into the ready queue of their ideal processor. Instead of locking the dispatcher database to look for a candidate to run, the per-CPU ready queue is checked first (locks the PRCB spinlock). If a thread has been selected to run on the CPU, it does the context swap; else it begins a scan of other CPUs’ ready queues looking for a thread to run. This scan is done OUTSIDE the dispatcher lock. The dispatcher lock is still acquired to wait or unwait a thread and/or change the state of a dispatcher object. Bottom line: the dispatcher lock is now held for a MUCH shorter time. 109

Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals
110

Memory Management Core Memory Management Services Working Set Management Unassigned Memory Page Files 111

Memory Manager Features
Demand paged virtual memory: pages are read in on demand and written out when necessary (to make room for other memory needs). Provides a flat virtual address space (32-bit: 4 GB; 64-bit: 16 exabytes, theoretical). Shared memory with copy on write. Mapped files (fundamental primitive). Provides basic support for the file system cache manager. 112

Virtual Address Space Allocation
Virtual address space is sparse: address spaces contain reserved, committed, and unused regions. The unit of protection and usage is one page. Page size can vary: on x86, the default page size for applications is 4 KB; on Itanium, the default page size is 8 KB. Large pages: on a “large memory system”, large pages (4 MB on x86; 16 MB on Itanium) are used to map the OS and HAL, which disables kernel write protection. New in 2003: applications can VirtualAlloc large pages with the MEM_LARGE_PAGES flag. 113

Shared Memory
Like most modern OSs, Windows provides a way for processes to share memory. High speed IPC (used by LPC, which is used by RPC). Threads share an address space, but applications may be divided into multiple processes for stability reasons. It does this automatically for shareable pages, e.g., code pages in an .EXE. Processes can also create shared memory sections, called page file backed file mapping objects, with full Windows security. 114

Mapped Files A way to take part of a file and map it to a range of virtual addresses (Address space is 2 GB, but files can be much larger) Called “file mapping objects” in Windows API Bytes in the file then correspond one-for-one with bytes in the region of virtual address space Read from the “memory” fetches data from the file Pages are kept in physical memory as needed Changes to the memory are eventually written back to the file (can request explicit flush) Initial mapped files in a process include The executable image (EXE) One or more Dynamically Linked Libraries (DLLs) Processes can map additional files as desired (data files or additional DLLs) 115

Section Objects
Mapped files are called “file mapping objects” in the Windows API. Files may be mapped into the virtual address space (v.a.s.):
    // first, do EITHER ...
    hMapObj = CreateFileMapping(hFile, security, protection, sizeHigh, sizeLow, mapname);
    // ... OR ...
    hMapObj = OpenFileMapping(accessMode, inheritflag, mapname);
    // ... then pass the resulting handle to a mapping object (section) to ...
    lpvoid = MapViewOfFile(hMapObj, accessMode, offsetHigh, offsetLow, cbMap);
Bytes in the file then correspond one-for-one with bytes in the region of virtual address space. A read from the “memory” fetches data from the file; changes to the memory are written back to the file. Pages are kept in physical memory as needed. If desired, can map only a part of the file at a time.
116

Copy-On-Write Pages
Used for sharing between process address spaces. Pages are originally set up as shared, read-only, faulted from the common file. An access violation on a write attempt alerts the pager, which makes a copy of the page and allocates it privately to the process doing the write, backed by the paging file. So, unique copies are only needed for the pages in the shared region that are actually written (an example of “lazy evaluation”); original values of data are still shared. E.g., writeable data initialized with C initializers. 117
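
The copy-on-write idea can be illustrated with a small user-mode simulation in C. This is a toy model, not how the memory manager is implemented: the `Page` and `Mapping` types and all function names are hypothetical, and an explicit check in `write_byte` stands in for the access violation that alerts the pager.

```c
#include <stdlib.h>
#include <string.h>

#define COW_PAGE_SIZE 4096

/* Toy "physical page" with a count of mappings that reference it. */
typedef struct {
    unsigned char data[COW_PAGE_SIZE];
    int refs;
} Page;

/* Toy per-process mapping of one page. */
typedef struct {
    Page *page;
    int copy_on_write;   /* nonzero: page is shared, copy before writing */
} Mapping;

Page *page_create(unsigned char fill)
{
    Page *p = malloc(sizeof *p);
    if (p != NULL) {
        memset(p->data, fill, COW_PAGE_SIZE);
        p->refs = 0;
    }
    return p;
}

/* Map an existing page as shared, read-only until first write. */
void map_shared(Mapping *m, Page *p)
{
    m->page = p;
    m->copy_on_write = 1;
    p->refs++;
}

unsigned char read_byte(const Mapping *m, size_t offset)
{
    return m->page->data[offset];
}

/* First write through a shared mapping copies the page privately. */
int write_byte(Mapping *m, size_t offset, unsigned char value)
{
    if (m->copy_on_write) {
        Page *private_copy = malloc(sizeof *private_copy);
        if (private_copy == NULL)
            return -1;
        memcpy(private_copy->data, m->page->data, COW_PAGE_SIZE);
        private_copy->refs = 1;
        m->page->refs--;          /* this mapping leaves the shared page */
        m->page = private_copy;
        m->copy_on_write = 0;
    }
    m->page->data[offset] = value;
    return 0;
}
```

After two mappings share one page and one of them writes, only the writer holds a private copy; the other mapping still reads the original data, which mirrors the before/after diagrams on the next two slides.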

How Copy-On-Write Works: Before
[Diagram: two process address spaces sharing pages in physical memory. Page 1 and Page 2 hold original data; Page 3 is also shown.]

How Copy-On-Write Works: After
[Diagram: the same two process address spaces. Page 1 holds original data, Page 2 holds modified data, and physical memory now also contains a private copy of Page 2.]

Physical Memory
32-bit Windows supports systems with 64 GB of physical memory
But the virtual address space is still 4 GB, so how can this memory be used?
1. Although each process can only address 2 (or 3) GB, many may be in memory at the same time (e.g., 5 * 2 GB processes = 10 GB)
2. New Address Windowing Extensions allow Win32 processes to use more than 2 GB of memory
3. Files in the system cache remain in physical memory
   Although the file cache doesn’t know it, the memory manager keeps unmapped data in physical memory

Address Windowing Extensions
AWE functions allow Win32 processes to allocate large amounts of physical memory and then map “windows” into that memory
Applications: database servers can cache large databases
Up to the programmer to control
Like DOS expanded memory (EMS) with more bits…
64-bit Windows removes this need

File System Virtual Block Cache
Virtual block cache (not logical block)
  Managed in terms of blocks within files, not blocks within a partition
  Caching occurs above the file system, not below
Advantages
  Permits access to cached data without translation of file to sector
  Allows maintaining coherency between normal file I/O and memory-mapped file I/O
Intelligent read-ahead
  Predicts next read location based on history of last 2 reads
Shared by all file systems
  Local or remote
  Includes file data and file system metadata (e.g., MFT, file attributes, …)
Write-back cache
  Data held in memory and written later by the mapped page writer system thread

Cache Virtual Structure
Virtual size: 64-960 MB
  In system virtual address space, so visible to all processes
Divided into 256 KB “views”
  Cache slots are mapped to 256 KB segments of cached files
  Uses same services as Win32 memory-mapped files
But remember, this is virtual, not physical
  Relies on the memory manager to read and write actual file data via normal paging
Virtual size of the cache is not related to the amount of cached file data
  Memory manager will still “cache” unmapped file data on the standby list
  So a larger cache size just reduces the number of mappings/unmappings

Controlling The Cache
Per-file basis
  File open flags affect how the cache influences the memory manager on what data to keep in RAM
If nothing specified, automatic asynchronous read-ahead
  Predicts next read location based on history of last 2 reads
  Touches the pages to fault them in
FILE_FLAG_SEQUENTIAL_SCAN increases size of read-ahead
  And causes the cache to re-use the same cache slot (instead of filling the cache)
  Also puts unmapped pages at end of standby list
FILE_FLAG_RANDOM_ACCESS disables read-ahead
Can disable file cache completely on a per-file-open basis
  CreateFile with FILE_FLAG_NO_BUFFERING
  Requires reads/writes to be done on sector boundaries
  Buffers must be aligned in memory on sector boundaries
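The alignment rules for unbuffered I/O lend themselves to a quick check. This is a sketch only: the helper names are hypothetical, and the 512-byte default sector size is an assumption (real code would query the volume).

```python
def is_valid_unbuffered_request(offset, length, buffer_address, sector_size=512):
    """True if file offset, transfer length, and buffer address are all sector-aligned.

    sector_size=512 is an assumed default, not a value from the slides.
    """
    return (offset % sector_size == 0
            and length % sector_size == 0
            and buffer_address % sector_size == 0)

def round_up_to_sector(n, sector_size=512):
    """Round a byte count up to the next multiple of the sector size."""
    return -(-n // sector_size) * sector_size
```

A caller that wants to read 1000 bytes without buffering would round the request up with `round_up_to_sector(1000)` and issue a 1024-byte read into an aligned buffer.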

Memory Management Core Memory Management Services Working Set Management Unassigned Memory Page Files 125

Working Set
Working set: all the physical pages “owned” by a process
  Essentially, all the pages the process can reference without incurring a page fault
Working set limit: the maximum number of pages the process can own
  When the limit is reached, a page must be released for every page that’s brought in (“working set replacement”)
  Default upper limit on size for each process
  System-wide maximum calculated and stored in MmMaximumWorkingSetSize
    Approximately RAM minus 512 pages (2 MB on x86) minus min size of system working set (1.5 MB on x86)
    Interesting to view (gives you an idea how much memory you’ve “lost” to the OS)
  True upper limit: 2 GB minus 64 MB
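As a back-of-the-envelope illustration of the approximation quoted above for the system-wide maximum, here is a small sketch. The function name and the 512 MB example machine are assumptions; the 4 KB page size, 512-page reserve, and 1.5 MB minimum system working set are the x86 figures from the slide.

```python
PAGE_SIZE = 4096  # x86 page size in bytes

def approx_max_working_set_pages(ram_bytes,
                                 reserve_pages=512,
                                 min_system_ws_bytes=1536 * 1024):
    """Approximate system-wide maximum working set, in pages.

    RAM minus 512 pages minus the minimum system working set (1.5 MB on x86).
    """
    ram_pages = ram_bytes // PAGE_SIZE
    system_ws_pages = min_system_ws_bytes // PAGE_SIZE
    return ram_pages - reserve_pages - system_ws_pages
```

For a hypothetical 512 MB machine this gives 131072 - 512 - 384 = 130176 pages.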

Birth Of A Working Set
Pages are brought into memory as a result of page faults
  Prior to Windows XP, no pre-fetching at image startup
  But read-ahead is performed after a fault
    See MmCodeClusterSize, MmDataClusterSize, MmReadClusterSize
    Can see with Filemon
If the page is not in memory, the appropriate block in the associated file is read in
  Physical page is allocated
  Block is read into the physical page
  Page table entry is filled in
  Exception is dismissed
  Processor re-executes the instruction that caused the page fault (and this time, it succeeds)
The page has now been “faulted into” the process “working set”

Working Set List
[Diagram: working set list ordered from newer pages to older pages; size shown by the PerfMon Process “Working Set” counter]
A process always starts with an empty working set
  It then incurs page faults when referencing a page that isn’t in its working set
  Many page faults may be resolved from memory (to be described later)

Working Set Replacement
[Diagram: PerfMon Process “Working Set”; replaced pages go to the standby or modified page list]
When the working set max is reached (or a working set trim occurs), the process must give up pages to make room for new pages
Local page replacement policy (most Unix systems implement global replacement)
  E.g., a single process cannot take over all of physical memory
Page replacement algorithm is least recently accessed (pages are aged)
  On UP systems only in Windows 2000; done on all systems in Windows XP/2003
New VirtualAlloc flag in XP/2003: MEM_WRITE_WATCH

Working Set System Services
Min/max set on a per-process basis
  Can view with !process in the Kernel Debugger
  Can adjust with SetProcessWorkingSetSize, but this has little effect
    Limits are “soft” (many processes are larger than max)
    Memory Manager decides when to grow/shrink working sets
  New function in Server 2003: SetProcessWorkingSetSizeEx
    Supports hard working set limits
Can also self-initiate working set trimming
  Pass -1, -1 as min/max working set size (minimizing a window does this for you)

Locking Pages
Pages may be locked into the process working set
  Pages are guaranteed in physical memory (“resident”) when any thread in the process is executing
  Windows:
    status = VirtualLock(baseAddress, size);
    status = VirtualUnlock(baseAddress, size);
  Number of lockable pages is a fraction of the maximum working set size
    Changed by SetProcessWorkingSetSize
Pages can be locked into physical memory (by kernel-mode code only)
  Pages are then immune from “outswapping” as well as paging
  MmProbeAndLockPages

Process Memory Information: Task Manager, Processes tab
1. “Mem Usage” = physical memory used by process (working set size, not working set limit)
   Note: shared pages are counted in each process
2. “VM Size” = private (not shared) committed virtual space in processes == potential pagefile usage
3. “Mem Usage” in the status bar is not the total of the “Mem Usage” column (see later slide)
[Screen snapshot from: Task Manager | Processes tab]

Process Memory Information: PerfMon, Process Object
“Virtual Bytes” = committed + reserved virtual space, including shared pages
“Working Set” = working set size (not limit) (physical)
“Private Bytes” = private virtual space (same as “VM Size” from Task Manager Processes list)
Also: in the Threads object, look for threads in Transition state, which is evidence of swapping (usually caused by severe memory pressure)
[Screen snapshot from: Performance Monitor counters from Process object]

Viewing The Working Set
Working set size counts shared pages in each working set
Vadump (Resource Kit) can dump the breakdown of private, shareable, and shared pages

C:\> Vadump -o -p 3968
Module Working Set Contributions in pages
    Total   Private Shareable    Shared Module
       14         3        11         0 NOTEPAD.EXE
       46         3         0        43 ntdll.dll
       36         1         0        35 kernel32.dll
        7         2         0         5 comdlg32.dll
       17         2         0        15 SHLWAPI.dll
       44         4         0        40 msvcrt.dll

Prefetch Mechanism
File activity is traced and used to prefetch data the next time
  First 10 seconds are monitored
  Pages referenced and directories opened
Prefetch “trace file” stored in \Windows\Prefetch
  Name: <name of .EXE>-<hash of full path>.pf
Also applies to system boot
  First 2 minutes of boot process logged
  Stops 30 seconds after the user starts the shell or 60 seconds after all services are started
  Boot trace file: NTOSBOOT-B00DFAAD.pf

Prefetch Mechanism
When the application is run again, the system automatically
  Reads in directories referenced
  Reads in code and file data
    Reads are asynchronous
    But waits for all prefetch to complete
In addition, every 3 days, the system automatically defrags files involved in each application startup
Bottom line: reduces disk head seeks
  This was seen to be the major factor in slow application/system startup

Memory Management Core Memory Management Services Working Set Management Unassigned Memory Page Files 137

Managing Physical Memory
System keeps unassigned physical pages on one of several lists
  Free page list
  Modified page list
  Standby page list
  Zero page list
  Bad page list: pages that failed memory test at system startup
Lists are implemented by entries in the “PFN database”
  Maintained as FIFO lists or queues

Paging Dynamics
[Diagram: page flow between process working sets and the standby, modified, free, zero, and bad page lists. Labeled transitions: demand-zero page faults; page read from disk or kernel allocations; “soft” page faults; “global valid” faults; working set replacement; modified page writer; zero page thread; private pages at process exit.]

Standby And Modified Page Lists
Modified pages go to the modified (dirty) list
  Avoids writing pages back to disk too soon
Unmodified pages go to the standby (clean) list
They form a system-wide cache of “pages likely to be needed again”
  Pages can be faulted back into a process from the standby and modified page lists
  These are counted as page faults, but not page reads
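The difference between page faults and page reads can be made concrete with a toy model. The class below is a deliberately simplified sketch (one process, least-recently-used replacement, no modified page writer); the class name `PagingSim` and its fields are hypothetical and do not mirror actual memory manager structures.

```python
from collections import OrderedDict

class PagingSim:
    """Toy model: one working set plus standby (clean) and modified (dirty) lists."""

    def __init__(self, ws_limit):
        self.ws_limit = ws_limit
        self.working_set = OrderedDict()  # page -> dirty flag, least recent first
        self.standby = set()
        self.modified = set()
        self.page_faults = 0
        self.page_reads = 0

    def access(self, page, write=False):
        if page in self.working_set:                 # no fault at all
            self.working_set[page] = self.working_set[page] or write
            self.working_set.move_to_end(page)
            return
        self.page_faults += 1
        if page in self.standby:                     # soft fault, page was clean
            self.standby.discard(page)
            dirty = False
        elif page in self.modified:                  # soft fault, page still dirty
            self.modified.discard(page)
            dirty = True
        else:                                        # hard fault: read from disk
            self.page_reads += 1
            dirty = False
        if len(self.working_set) >= self.ws_limit:   # working set replacement
            victim, victim_dirty = self.working_set.popitem(last=False)
            (self.modified if victim_dirty else self.standby).add(victim)
        self.working_set[page] = dirty or write
```

With a limit of two pages, touching pages 1, 2 (written), 3, 1, 2 produces five page faults but only three page reads, because the last two faults are satisfied from the standby and modified lists.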

Modified Page Writer
Moves pages from the modified to the standby list, and copies their contents to disk
  I.e., this is what writes the paging file and updates mapped files (including the file system cache)
Two system threads
  One for mapped files, one for the paging file
Triggered when
  Memory is over-committed (too few free pages)
  Or the modified page threshold is reached
Does not flush the entire modified page list

Free And Zero Page Lists
Free Page List
  Used for page reads
  Private modified pages go here on process exit
  Pages contain junk (i.e., they are not zeroed)
  On most busy systems, this is empty
Zero Page List
  Used to satisfy demand-zero page faults
    References to private pages that have not been created yet
  When the free page list has 8 or more pages, a priority-zero thread is awoken to zero them
  On most busy systems, this is empty too

Memory Management Information: Task Manager, Performance tab
6. “Available” = sum of free, standby, and zero page lists (physical)
   Majority are likely standby pages
“System Cache” = size of standby list + size of system working set (file cache, paged pool, pageable OS/driver code & data)
[Screen snapshot from: Task Manager | Performance tab]

PFN Database
The only way to get the actual size of the physical memory lists is to use !memusage in the Kernel Debugger

lkd> !memusage
 loading PFN database
             Zeroed:      0 (     0 kb)
               Free:      3 (    12 kb)
            Standby:  98248 (392992 kb)
           Modified:    563 (  2252 kb)
    ModifiedNoWrite:      0 (     0 kb)
       Active/Valid:  93437 (373748 kb)
         Transition:      1 (     4 kb)
            Unknown:      0 (     0 kb)
              TOTAL: 192252 (769008 kb)

[Screen snapshot from: kernel debugger !memusage command]
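The page counts in the !memusage listing convert to kilobytes at 4 KB per page, and the same counts give the “Available” figure described for Task Manager (free + standby + zero). A small sketch, with the dictionary and function names chosen here purely for illustration:

```python
PAGE_SIZE_KB = 4  # x86 page size

# Page counts copied from the !memusage listing above.
PFN_COUNTS = {
    "Zeroed": 0,
    "Free": 3,
    "Standby": 98248,
    "Modified": 563,
    "ModifiedNoWrite": 0,
    "Active/Valid": 93437,
    "Transition": 1,
    "Unknown": 0,
}

def pages_to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB

def total_pages(counts):
    """Sum of all list sizes, matching the TOTAL line."""
    return sum(counts.values())

def available_pages(counts):
    """Free + standby + zero pages, as in Task Manager's "Available"."""
    return counts["Free"] + counts["Standby"] + counts["Zeroed"]
```

On this system nearly all available memory is standby pages, which is the typical case noted on the Task Manager slide.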

Memory Management Core Memory Management Services Working Set Management Unassigned Memory Page Files 145

Page Files
What gets sent to the paging file?
  Not code, only modified data (code can be re-read from the image file anytime)
When do pages get paged out?
  Only when necessary
  Page file space is only reserved at the time pages are written out
  Once a page is written to the paging file, the space is occupied until the memory is deleted (e.g., at process exit), even if the page is read back from disk
Can run with no paging file
  Windows NT 4/Windows 2000: a zero pagefile size actually created a 20 MB temporary page file (temppf.sys)

Sizing The Page File
Given an understanding of page file usage, how big should the total paging file space be? (Windows supports multiple paging files)
Size should depend on total private virtual memory used by applications and drivers
  Therefore, not related to RAM size (except for taking a full memory dump)
Worst case: Windows has to page all private data out to make room for code pages
  To handle this, minimum size should be the maximum of VM usage (“Commit Charge Peak”)
  Hard disk space is cheap, so why not double this
Normally, make maximum size the same as minimum
  But max size could be much larger if there will be infrequent demands for large amounts of page file space
  Performance problem: page file extension will likely be very fragmented
  Extension is deleted on reboot, thus returning to a contiguous page file

Memory Management Information: Task Manager, Performance tab
3. Total committed private virtual memory (total of “VM Size” in Processes tab + Kernel Memory Paged)
   Not all of this space has actually been used in the paging files; it is “how much would be used if it was all paged out”
4. “Commit charge limit” = sum of physical memory available for processes + current total size of paging file(s)
   Does not reflect true maximum page file sizes (expansion)
When “total” reaches “limit”, further VirtualAlloc attempts by any process will fail
[Screen snapshot from: Task Manager | Performance tab]
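The commit accounting described above amounts to simple arithmetic. The sketch below uses hypothetical helper names and made-up sizes; it models only the rule stated on the slide, not page file expansion.

```python
def commit_limit_kb(physical_available_kb, pagefile_current_sizes_kb):
    """Commit limit = physical memory available to processes + current page file sizes."""
    return physical_available_kb + sum(pagefile_current_sizes_kb)

def can_commit(commit_total_kb, limit_kb, request_kb):
    """A new commitment succeeds only while the total stays within the limit."""
    return commit_total_kb + request_kb <= limit_kb
```

For example, with 500000 KB of physical memory and two page files of 786432 KB and 262144 KB, the limit is 1548576 KB; at a commit total of 1500000 KB a 40000 KB request fits and a 50000 KB request does not.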

When Page Files Are Full
When page file space runs low
1. “System running low on virtual memory”
   First time: before pagefile expansion
   Second time: when committed bytes reach the commit limit
2. “System out of virtual memory”
   Page files are full
Look for who is consuming pagefile space
  Process memory leak: check Task Manager, Processes tab, VM Size column
    Or PerfMon “Private Bytes”, the same counter
  Paged pool leak: check paged pool size
    Run poolmon to see what object(s) are filling the pool
    Could be a result of processes not closing handles; check process “handle count” in Task Manager

Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals

Security Introduction Components Logon Protecting Objects Privileges 151

Windows Security Support
Microsoft’s goal was to achieve C2, which requires:
  Secure logon: NT provides this by requiring user name and password
  Discretionary access control: fine-grained protection over resources by user/group
  Security auditing: ability to save a trail of important security events, such as access or attempted access of a resource
  Object reuse protection: must initialize physical resources that are reused, e.g., memory, files
Certifications achieved:
  Windows NT 3.5 (workstation and server) with SP3 earned C2 in July 1995
  In March 1999 Windows NT 4 with SP3 earned an E3 rating from the UK’s ITSEC (Information Technology Security Evaluation Criteria), equivalent to C2
  In November 1999 NT4 with SP6a earned C2 in stand-alone and networked environments

Windows Security Support
Windows meets two B-level requirements:
  Trusted Path Functionality: a way to prevent trojan horses with the “secure attention sequence” (SAS), Ctrl+Alt+Del
  Trusted Facility Management: ability to assign different roles to different accounts
    Windows does this through account privileges (described later)

Common Criteria
Common Criteria (CC) is the new standard for software and OS ratings
  Consortium of US, UK, Germany, France, Canada, and the Netherlands in 1996
  Became ISO standard 15408 in 1999
  For more information, see http://www.commoncriteriaportal.org/ and http://csrc.nist.gov/cc
CC is more flexible than TCSEC trust ratings
  A Protection Profile (PP) collects security requirements
  A Security Target (ST) is a set of security requirements that can be made by reference to a PP
Windows 2000 was certified as compliant with the CC Controlled Access Protection Profile (CAPP) in October 2002
  Windows XP and Server 2003 are undergoing evaluation

Security Introduction Components Logon Protecting Objects Privileges 155

Security Components
[Diagram: Windows architecture with the security components highlighted.
User mode: Winlogon (MSGINA); LSASS (LSA Server, LSA Policy, SAM Server and SAM database, Active Directory, Netlogon, Msv1_0.dll, Kerberos.dll); Event Logger.
Kernel mode (Ntoskrnl.exe): System Service Dispatcher (kernel-mode callable interfaces), Local Procedure Call, Configuration Mgr (registry), Processes & Threads, Virtual Memory, Security Reference Monitor, Power Mgr., Object Mgr., File System Cache, Plug and Play Mgr., I/O Mgr, System Threads, Device & File Sys. Drivers, Windows USER/GDI, Graphics Drivers, Kernel, Hardware Abstraction Layer (HAL).
Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)]
Original copyright by Microsoft Corporation. Used by permission.

Security Reference Monitor
Performs object access checks, manipulates privileges, and generates audit messages
Group of functions in Ntoskrnl.exe
  Some documented in the DDK
  Exposed to user mode by Windows API calls
Demo: open Ntoskrnl.exe with Dependency Walker and view functions starting with “Se”

Demo: Viewing Security Processes
Run Process Explorer
Collapse the Explorer process tree and focus on the upper half (system processes)

Security Components: Local Security Authority
User-mode process (\Windows\System32\Lsass.exe) that implements policies (e.g., password, logon), authentication, and sending audit records to the security event log
LSASS policy database: registry key HKLM\SECURITY
[Diagram: Winlogon (MSGINA) and LSASS components, as on the Security Components slide]

LSASS Components: SAM Service
A set of subroutines (\Windows\System32\Samsrv.dll) responsible for managing the database that contains the usernames and groups defined on the local machine
SAM database: a database that contains the defined local users and groups, along with their passwords and other attributes. This database is stored in the registry under HKLM\SAM.
  Password crackers attack the local user account password hashes stored in the SAM
Demo: look at the SAM service
  Open Lsass.exe process properties and click on the Services tab
  Click Find DLL and search for Samsrv.dll

Demo: Looking at the SAM
Look at HKLM\SAM permissions
  SAM security allows only the local system account to access it
  Run Regedit
  Look at HKLM\SAM: nothing there?
  Check permissions (right click -> Permissions)
  Close Regedit
Look in HKLM\SAM
  Running Regedit in the local system account allows you to view the SAM:
    psexec -s -i -d c:\windows\regedit.exe
  or
    sc create cmdassystem type= own type= interact binpath= "cmd /c start cmd /k"
    sc start cmdassystem
  View local usernames under HKLM\SAM\SAM\Domains\Account\Users\Names
  Passwords are under the Users key above Names

LSASS Components: Active Directory
A directory service that contains a database that stores information about objects in a domain
  A domain is a collection of computers and their associated security groups that are managed as a single entity
  The Active Directory server is implemented as a service, \Windows\System32\Ntdsa.dll, that runs in the Lsass process
Authentication packages
  DLLs that run in the context of the Lsass process and that implement Windows authentication policy:
    LanMan: \Windows\System32\Msv1_0.dll
    Kerberos: \Windows\System32\Kerberos.dll
    Negotiate: uses LanMan or Kerberos, depending on which is most appropriate

LSASS Components: Net Logon service (Netlogon)
A Windows service (\Windows\System32\Netlogon.dll) that runs inside Lsass and responds to Microsoft LAN Manager 2 Windows NT (pre-Windows 2000) network logon requests
  Authentication is handled as local logons are, by sending them to Lsass for verification
  Netlogon also has a locator service built into it for locating domain controllers
[Diagram: Winlogon (MSGINA) and LSASS components, as on the Security Components slide]

Winlogon
Logon process (Winlogon)
  A user-mode process running \Windows\System32\Winlogon.exe that is responsible for responding to the SAS and for managing interactive logon sessions
Graphical Identification and Authentication (GINA)
  A user-mode DLL that runs in the Winlogon process and that Winlogon uses to obtain a user's name and password or smart card PIN
  Default is \Windows\System32\Msgina.dll
[Diagram: Winlogon (MSGINA) and LSASS components, as on the Security Components slide]

Security Introduction Components Logon Protecting Objects Privileges 165

What Makes Logon Secure?
Before anyone logs on, the visible desktop is Winlogon’s
Winlogon registers CTRL+ALT+DEL, the Secure Attention Sequence (SAS), as a standard hotkey sequence
SAS takes you to the Winlogon desktop
No application can deregister it because only the thread that registers a hotkey can deregister it
When Windows’ keyboard input processing code sees the SAS it disables keyboard hooks so that no one can intercept it

Logon
After getting security identification (account name, password), the GINA sends it to the Local Security Authority Sub System (LSASS)
LSASS calls an authentication package to verify the logon
  If the logon is local or to a legacy domain, MSV1_0 is the authenticator. User name and password are encrypted and compared against the Security Accounts Manager (SAM) database
  Cached domain logons are also handled by MSV1_0
  If the logon is to an AD domain, the authenticator is Kerberos, which communicates with the AD service on a domain controller
If there is a match, the SIDs of the corresponding user account and its groups are retrieved
Finally, LSASS retrieves account privileges from the Security database or from AD

Logon
LSASS creates a token for your logon session and Winlogon attaches it to the first process of your session
  Tokens are created with the NtCreateToken API
  Every process gets a copy of its parent’s token
  SIDs and privileges cannot be added to a token
A logon session is active as long as there is at least one token associated with the session
Lab
  Run “LogonSessions -p” (from Sysinternals) to view the active logon sessions on your system

Security Introduction Components Logon Protecting Objects Privileges 169

The Access Validation Algorithm
Access validation is a security equation that takes three inputs:
  Desired access
  Process token
    Or the thread’s token if the thread is “impersonating”
  The object’s security descriptor, which contains a Discretionary Access Control List (DACL)
The output is access allowed or access denied

Tokens
The main components of a token are:
  SID of the user
  SIDs of groups the user account belongs to
  Privileges assigned to the user (described in the next section)
[Diagram: token layout: Account SID, Group 1 SID ... Group n SID, Privilege 1 ...]

Labs: Viewing Access Tokens
Process Explorer: double-click on a process and go to the Security tab
  Examine groups list
Use RUNAS to create a CMD process running under another account (e.g., your domain account)
  Examine groups list
Viewing tokens with the Kernel Debugger
  Run !process 0 0 to find a process
  Run !process <PID> 1 to dump the process
  Get the token address and type !token -n <token address>
  Type dt _token <token address> to see all fields defined in a token

Impersonation
Lets an application adopt the security profile of another user
  Used by server applications
Impersonation is implemented at the thread level
  The process token is the “primary token” and is always accessible
  Each thread can be impersonating a different client
Can impersonate with a number of client/server networking APIs: named pipes, RPC, DCOM
[Diagram: a client process calling a server process whose server threads access an object]

Process And Thread Security Structures
[Diagram: a process with Threads 1-3. The process, its threads, and the access tokens each carry an ACL (numbered 1-6). The process access token and a thread access token each contain: user’s SID, group SIDs, privileges, owner SID, primary group SID, default ACL.]
Thread tokens (where present) completely supersede the process token (basis for “security impersonation”)

SIDs
Windows uses Security Identifiers (SIDs) to identify security principals:
  Users, groups of users, computers, domains
SIDs consist of:
  A revision level, e.g., 1
  An identifier-authority value, e.g., 5 (SECURITY_NT_AUTHORITY)
  One or more subauthority values
Who assigns SIDs?
  Setup assigns a computer a SID
  Dcpromo assigns a domain a SID
  Users and groups on the local machine are assigned SIDs that are rooted with the computer SID, with a Relative Identifier (RID) at the end
    RIDs start at 1000 (built-in account RIDs are pre-defined)
Some local users and groups have pre-defined SIDs (e.g., World = S-1-1-0)

Demo: SIDs
Example SIDs
  Domain SID: S-1-5-21-34125455-5125555-1251255
  First account: S-1-5-21-34125455-5125555-1251255-1000
  Admin account: S-1-5-21-34125455-5125555-1251255-500
  System account: S-1-5-18
Demo: run PsGetSid (Sysinternals) to view the SID of your username and of the computer
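The textual SID layout (revision, identifier authority, subauthorities, with the RID last for accounts) is easy to take apart. The helpers below are an illustrative sketch with hypothetical names; they handle only the decimal string form shown in the examples.

```python
def parse_sid(text):
    """Split 'S-R-A-S1-...-Sn' into (revision, authority, [subauthorities])."""
    parts = text.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError("not a SID string: %r" % text)
    revision = int(parts[1])
    authority = int(parts[2])
    subauthorities = [int(p) for p in parts[3:]]
    return revision, authority, subauthorities

def rid_of(text):
    """The relative identifier is the last subauthority value."""
    return parse_sid(text)[2][-1]

def account_sid(machine_or_domain_sid, rid):
    """Build an account SID by appending a RID to the computer or domain SID."""
    return "%s-%d" % (machine_or_domain_sid, rid)
```

Applied to the example values, the Admin account is the domain SID with the pre-defined RID 500, and the first ordinary account gets RID 1000.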

Security Descriptors
Descriptors are associated with objects: e.g., files, Registry keys, application-defined
Descriptors are variable length
[Diagram: security descriptor layout: Owner SID; Primary Group (defined for POSIX); DACL pointer; SACL pointer; DACL; SACL]

DACLs
DACLs consist of zero or more Access Control Entries (ACEs)
  A security descriptor with no DACL allows all access
  A security descriptor with an empty (0-entry) DACL denies everybody all access
An ACE is either “allow” or “deny”
[Diagram: ACE layout: ACE type; SID; access mask (Read, Write, Delete, ...)]

Demo: Viewing a Security Descriptor Structure
Get the address of an EPROCESS block with !process
Type !object on that address
Type “dt _OBJECT_HEADER” on the object header address to get the security descriptor address
Type !sd <security descriptor address> & -8 1

Access Check
The Security Reference Monitor (SRM) implements an explicit allow model
ACEs in the DACL are examined in order
  Does the ACE have a SID matching a SID in the token?
  If so, do any of the access bits match any remaining desired accesses?
  If so, what type of ACE is it?
    Deny: return ACCESS_DENIED
    Allow: grant the specified accesses and if there are no remaining accesses to grant, return ACCESS_ALLOWED
If we get to the end of the DACL and there are remaining desired accesses, return ACCESS_DENIED
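The steps above can be sketched in a few lines. This is a simplified model, not the SRM's implementation: the ACE tuples, the three access bits, and the function name `access_check` are assumptions made for illustration, and owner rights, privileges, and auditing are ignored. A missing DACL (`None`) allows all access and an empty DACL denies it, as stated on the DACLs slide.

```python
ALLOW, DENY = "allow", "deny"
READ, WRITE, DELETE = 0x1, 0x2, 0x4   # illustrative bits, not real access-mask values
ALL = READ | WRITE | DELETE

def access_check(token_sids, dacl, desired):
    """Return True (ACCESS_ALLOWED) or False (ACCESS_DENIED).

    token_sids: set of SIDs (user and groups) in the token
    dacl: None (no DACL) or a list of (ace_type, sid, access_mask) in ACE order
    desired: non-zero mask of requested access bits
    """
    if dacl is None:
        return True                      # no DACL: all access allowed
    remaining = desired
    for ace_type, sid, mask in dacl:
        if sid not in token_sids:
            continue                     # ACE is for someone else
        if not (mask & remaining):
            continue                     # no bits we still need
        if ace_type == DENY:
            return False                 # explicit deny of a needed bit
        remaining &= ~mask               # grant the bits this ACE allows
        if remaining == 0:
            return True
    return False                         # end of DACL with accesses still ungranted
```

For a token holding Mark, Authors, and Developers and a DACL of Deny Authors Read followed by Allow Mark All, a Write request is allowed (the deny ACE does not cover Write) while a Read request is denied.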

Access Check Example
[Diagram:
  Token: Mark, Authors, Developers, Privilege 1 ... Privilege n
  Access request: Write
  Object DACL: Deny Authors Read; Allow Mark All]

ACE Ordering. The order of ACEs is important! Low-level security APIs allow the creation of DACLs with ACEs in any order. All security editor interfaces and higher-level APIs order ACEs with denies before allows. Example: Token: Mark, Authors, Developers, Privilege 1 ... Privilege n. Access request: Read. One DACL: Deny Authors Read, then Allow Mark All. The other DACL: Allow Mark All, then Deny Authors Read. 182
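The effect of ordering can be illustrated with a small sketch that sorts a DACL so that denies come before allows, as the security editors do. It models only the deny-before-allow rule from the slide; the helper names `canonical_order` and `first_match` are hypothetical, and rights are plain strings rather than access-mask bits.

```python
def canonical_order(dacl):
    """Return the ACEs with every deny ahead of every allow.

    sorted() is stable, so ACEs of the same type keep their relative order.
    """
    return sorted(dacl, key=lambda ace: 0 if ace[0] == "deny" else 1)


def first_match(token_sids, dacl, right):
    """For a single requested right, the first ACE that names a SID in the
    token and covers that right decides the outcome; no match means deny."""
    for ace_type, sid, rights in dacl:
        if sid in token_sids and right in rights:
            return ace_type
    return "deny"


token = {"Mark", "Authors", "Developers"}
allow_first = [
    ("allow", "Mark", {"read", "write", "delete"}),
    ("deny", "Authors", {"read"}),
]

print(first_match(token, allow_first, "read"))                   # allow
print(first_match(token, canonical_order(allow_first), "read"))  # deny
```

The same two ACEs give opposite answers for Mark's Read request depending on which comes first, which is why the higher-level APIs always place denies ahead of allows.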

Demo: ACE ordering. Go to an NTFS file. Add an Everyone deny-all ACE to the file. Will the Administrator be able to look at the file? Verify your answer by checking Effective Permissions. 183

Access Special Cases. An object’s owner can always open the object with WRITE_DAC and READ_CONTROL permission. An account with the “take ownership” privilege can claim ownership of any object. An account with the backup privilege can open any file for reading. An account with the restore privilege can open any file for write access. 184

Controllable Inheritance. In NT 4.0, objects only inherit ACEs from a parent container (e.g. Registry key or directory) when they are created: no distinction is made between inherited and non-inherited ACEs, and there is no prevention of inheritance. In Windows 2000 and higher, inheritance is controllable: SetNamedSecurityInfoEx and SetSecurityInfoEx will apply new inheritable ACEs to all child objects (subkeys, files), and directly applied ACEs take precedence over inherited ACEs. 185
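The Windows 2000 behavior can be modeled as a tree walk that rebuilds each child's DACL from its directly applied ACEs followed by the ACEs inherited from its parent. This is a rough sketch only: the `Ace` and `Node` classes and the `propagate` function are hypothetical, and the `protected` flag is an assumed stand-in for whatever mechanism blocks inheritance on a given object.

```python
from dataclasses import dataclass, field


@dataclass
class Ace:
    ace_type: str              # "allow" or "deny"
    sid: str
    rights: frozenset
    inheritable: bool = False  # should children receive a copy?
    inherited: bool = False    # True when this ACE came from a parent


@dataclass
class Node:
    name: str
    dacl: list = field(default_factory=list)
    children: list = field(default_factory=list)
    protected: bool = False    # assumed flag: block inheritance from the parent


def propagate(node, from_parent=()):
    """Rebuild node's DACL as direct ACEs followed by inherited ones, then recurse."""
    direct = [ace for ace in node.dacl if not ace.inherited]
    inherited = [] if node.protected else [
        Ace(a.ace_type, a.sid, a.rights, inheritable=True, inherited=True)
        for a in from_parent
    ]
    node.dacl = direct + inherited   # direct ACEs are examined first
    passed_down = [ace for ace in node.dacl if ace.inheritable]
    for child in node.children:
        propagate(child, passed_down)


root = Node("dir", dacl=[Ace("allow", "Users", frozenset({"read"}), inheritable=True)])
report = Node("report.txt", dacl=[Ace("deny", "Guests", frozenset({"read"}))])
private = Node("private", protected=True)
root.children = [report, private]
propagate(root)
print([(a.ace_type, a.sid, a.inherited) for a in report.dacl])
```

Because the rebuilt DACL lists direct ACEs first and ACEs are examined in order, a directly applied ACE is reached before any inherited one, which is how it takes precedence. Marking each copy as inherited is also what lets a later propagation replace the old inherited ACEs without touching the direct ones.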

Security: Introduction; Components; Logon; Protecting Objects; Privileges. 186

Privileges specify which system actions a process (or thread) can perform. Privileges are associated with groups and user accounts. There are sets of pre-defined privileges associated with built-in groups (e.g. System, Administrators). Examples include: Backup/Restore; Shutdown; Debug; Take ownership. Privileges are disabled by default and must be programmatically turned on with a system call. 187
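The held-versus-enabled distinction can be shown with a toy model. On a real system the enabling step is the Win32 call AdjustTokenPrivileges; the `Token` class and `change_system_time` function below are hypothetical stand-ins that only mimic the rule stated above, with every held privilege starting out disabled.

```python
class Token:
    """Toy access token: maps each held privilege name to an enabled flag."""

    def __init__(self, held):
        # Holding a privilege is not enough: each one starts disabled.
        self._privileges = {name: False for name in held}

    def adjust(self, name, enabled):
        """Enable or disable a privilege the token already holds."""
        if name not in self._privileges:
            raise PermissionError(f"{name} is not assigned to this token")
        self._privileges[name] = enabled

    def is_enabled(self, name):
        return self._privileges.get(name, False)


def change_system_time(token, new_time):
    """Privileged operation: refuses to run unless the privilege is enabled."""
    if not token.is_enabled("SeSystemtimePrivilege"):
        raise PermissionError("SeSystemtimePrivilege is not enabled")
    return new_time  # stand-in for actually setting the clock


token = Token(["SeSystemtimePrivilege"])
token.adjust("SeSystemtimePrivilege", True)   # the explicit "turn it on" step
print(change_system_time(token, "12:00"))
```

A token cannot enable a privilege it was never assigned, so in this model `adjust` fails for a name the token does not hold, regardless of what the caller requests.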

Demo: Privileges. Run Secpol.msc and examine the full list: click on Local Policies->User Rights Assignment. Process Explorer: double click on a process, go to the security tab, and examine the privileges list. Watch changes to the privilege list: 1. Run Process Explorer – put in paused mode 2. Open Control Panel applet to change system time 3. Go back to Process Explorer & press F5 4. Examine privilege list in new process that was created 5. Notice in privilege list that system time privilege is enabled 188

Powerful Privileges. There are several privileges that give an account that has them full control of a computer. Debug: can open any process, including System processes, to inject code, modify code, or read sensitive data. Take Ownership: can access any object on the system (replace system files, change security). Restore: can replace any file. Load Driver: drivers bypass all security. Create Token: can spoof any user (locally); requires use of undocumented NT API. Trusted Computer Base (Act as Part of Operating System): can create a new logon session with arbitrary SIDs in the token. 189

Demo: Powerful Privileges. View the use of the backup privilege: (1) Make a directory. (2) Create a file in the directory. (3) Use the security editor to remove inherited security and give Everyone full access to the file. (4) Remove all access to the directory (do not propagate). (5) Start a command prompt and do a “dir” of the directory. (6) Run SysintSolomonPView and enable the Backup privilege for the command prompt. (7) Do another “dir” and note the different behavior. View the use of the Bypass Traverse Checking privilege (internally called “Change Notify”): from the same command prompt, run notepad to open the file (give the full path) in the inaccessible directory. Extra credit: disable Bypass Traverse Checking so that you get access denied trying to open the file (hint: requires use of secpol.msc and then RUNAS). 190

The End! Thanks for coming! For more information: Windows Internals, 4th edition. The 5th edition will be updated for Vista (will ship when Vista ships). We’ll stay for questions (we’re not here the rest of the week). Or, email us (see slide 1 for addresses). 191

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary. 192